serez-agentai
GPT-style transformer stack and agentic framework for Serez Code — tokenizer, causal attention, KV-cache, sampling strategies, tool calling, episodic memory, and an agent loop.
Install
sz install serez-agentai
serez-agentai depends on serez-ai for base layers (Dense, LayerNorm, Dropout, etc.). Both are installed automatically.
Tokenizer
CharTokenizer is a character-level tokenizer. Special tokens [PAD]=0, [BOS]=1, [EOS]=2, [UNK]=3 are always present.
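To make the ID layout concrete, here is a rough Python sketch of a character-level tokenizer with the same four reserved IDs. The class name `CharTokenizerSketch` and its method names are hypothetical, and it assumes new characters receive IDs in first-seen order and that `decode` drops special tokens; the library's actual internals may differ.

```python
class CharTokenizerSketch:
    """Character-level tokenizer with four reserved special tokens."""

    SPECIALS = ["[PAD]", "[BOS]", "[EOS]", "[UNK]"]  # ids 0..3

    def __init__(self):
        self.token_to_id = {tok: i for i, tok in enumerate(self.SPECIALS)}
        self.id_to_token = list(self.SPECIALS)

    def build_vocab(self, text):
        # Assumption: each unseen character gets the next free id, in first-seen order.
        for ch in text:
            if ch not in self.token_to_id:
                self.token_to_id[ch] = len(self.id_to_token)
                self.id_to_token.append(ch)

    def encode(self, text):
        unk = self.token_to_id["[UNK]"]
        body = [self.token_to_id.get(ch, unk) for ch in text]
        return [1] + body + [2]  # wrap in [BOS] ... [EOS]

    def decode(self, ids):
        # Skip ids 0..3 so only real characters remain.
        return "".join(self.id_to_token[i] for i in ids if i >= len(self.SPECIALS))

    @property
    def vocab_size(self):
        # Counts the special tokens as well as the characters.
        return len(self.id_to_token)
```

Under these assumptions, building the vocabulary from "hello world" assigns h=4, e=5, l=6, o=7, which matches the encoded IDs in the usage example.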
import "serez-agentai"
let tok = new CharTokenizer()
tok.buildVocab("hello world") // adds unique chars to vocab
let ids = tok.encode("hello") // [1, 4, 5, 6, 6, 7, 2] (BOS … EOS)
let text = tok.decode(ids) // "hello"
out tok.vocab_size // number of unique tokens
Positional encoding
Sinusoidal positional encoding — returns a [seq_len, d_model] tensor added to token embeddings before the first transformer block.
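For reference, the standard sinusoidal scheme can be sketched in a few lines of Python. This is an illustration of the usual formula, not the library's code: the function name `sinusoidal_pe` is hypothetical, and it assumes even dimensions use sine and odd dimensions use cosine with base 10000.

```python
import math

def sinusoidal_pe(max_seq, d_model):
    """Return a max_seq x d_model table as a list of rows."""
    table = []
    for pos in range(max_seq):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```

Row `pos` of the table is added element-wise to the embedding of the token at position `pos`.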
let pe = sinusoidalPE(512, 256) // max_seq=512, d_model=256
// pe.shape() → [512, 256]
GPT model
GPTModel is a decoder-only transformer (GPT-style) with causal masking, pre-norm blocks, and a linear output head.
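Causal masking means position i can attend only to positions 0 through i. The Python sketch below shows that idea on a plain matrix of attention scores; the function name is hypothetical and it says nothing about how GPTModel implements masking internally.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix, masking future positions."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Position i may only attend to positions 0..i.
        visible = scores[i][: i + 1]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Future positions receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights
```

Because future positions get zero weight, the logits at each position depend only on earlier tokens, which is what allows next-token training on a whole sequence in one forward pass.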
let model = new GPTModel(
tok.vocab_size, // vocab_size
128, // d_model
4, // n_heads
512, // d_ff (feed-forward inner dim)
6, // n_layers
256 // max_seq
)
// Forward pass: array of token IDs → logits [seq, vocab_size]
let ids = tok.encode("hello")
let logits = model.forward(ids)
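// Training step: record the forward pass, backpropagate, then apply an SGD update
// target_ids and seq are not defined in this snippet; supply the target token IDs and the sequence length
Autodiff.tape()
let logits = model.forward(ids)
let loss = Autodiff.crossEntropyLoss(logits, target_ids, seq, tok.vocab_size)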
Autodiff.backward(loss)
model.update(0.001)
Sampling & generation
| Function | Description |
|---|---|
| greedy(logits) | Picks the token with the highest logit |
| sampleTemp(logits, t) | Temperature sampling: t < 1 sharpens the distribution, t > 1 flattens it |
| topK(logits, k) | Samples from the k most probable tokens |
| topP(logits, p) | Nucleus sampling: keeps the smallest set of tokens whose cumulative probability reaches p |
| generate(model, tok, prompt, max_tokens, strategy, temp, k, p) | Full generation loop; returns the decoded string |
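The four strategies in the table can be illustrated with a short Python sketch. The snake_case names and the optional `rng` parameter are hypothetical additions for reproducibility; the code shows the textbook definitions and is not taken from the library.

```python
import math
import random

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_from(probs, candidates, rng):
    # random.choices renormalizes the weights of the surviving candidates.
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def sample_temp(logits, t, rng=random):
    return sample_from(softmax(logits, t), list(range(len(logits))), rng)

def top_k(logits, k, rng=random):
    probs = softmax(logits)
    ranked = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    return sample_from(probs, ranked[:k], rng)

def top_p(logits, p, rng=random):
    probs = softmax(logits)
    ranked = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= p:  # smallest prefix whose probability mass reaches p
            break
    return sample_from(probs, kept, rng)
```

With `k = 1`, or with a `p` smaller than the top token's probability, both truncated samplers reduce to greedy decoding.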
// Greedy generation
let response = generate(model, tok, "hello", 50, "greedy", 1.0, 40, 0.9)
// Top-p (nucleus) generation
let response = generate(model, tok, "hello", 100, "topp", 0.8, 40, 0.9)
KV-cache
Stores computed Keys and Values across generation steps to avoid recomputing past context on every new token.
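The idea can be sketched in Python as a table keyed by (layer, head) that grows by one key/value pair per generated token. The class `KVCacheSketch` and its `append` method are hypothetical; in particular, whether the library's `store` appends to or overwrites a slot is not stated here, so this sketch assumes appending.

```python
class KVCacheSketch:
    """Per-(layer, head) store of the keys and values seen so far."""

    def __init__(self, n_layers, n_heads):
        self.n_layers = n_layers
        self.n_heads = n_heads
        self.reset()

    def reset(self):
        # One empty slot per (layer, head) pair.
        self.slots = {
            (layer, head): {"k": [], "v": []}
            for layer in range(self.n_layers)
            for head in range(self.n_heads)
        }

    def append(self, layer, head, k_vec, v_vec):
        # Only the newest token's key/value is added; earlier ones are reused.
        slot = self.slots[(layer, head)]
        slot["k"].append(k_vec)
        slot["v"].append(v_vec)

    def fetch(self, layer, head):
        return self.slots[(layer, head)]
```

At each step the new token's query attends over everything in the slot, so the keys and values of earlier tokens are computed only once.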
let cache = new KVCache(6, 4) // n_layers=6, n_heads=4
cache.store(0, 0, k_tensor, v_tensor)
let entry = cache.fetch(0, 0) // {k: [...], v: [...]}
cache.reset() // clear for new context
LR schedulers
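Both schedulers ramp the learning rate up during warmup and then decay it. The Python sketch below shows the usual formulas, assuming the three constructor arguments mean (peak learning rate, warmup steps, total steps) and that the rate decays to zero; neither assumption is confirmed by this README, and the function names are hypothetical.

```python
import math

def warmup_cosine(step, peak_lr, warmup_steps, total_steps):
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear ramp from 0 to peak_lr
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

def warmup_linear(step, peak_lr, warmup_steps, total_steps):
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return peak_lr * (1.0 - progress)  # straight line from peak_lr down to 0
```

Cosine decay stays near the peak for longer and flattens out toward the end, while linear decay falls at a constant rate.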
// Warmup + cosine decay
let sched = new WarmupCosineScheduler(0.001, 100, 1000)
let lr = sched.step(t) // current learning rate at step t
// Warmup + linear decay
let sched = new WarmupLinearScheduler(0.001, 100, 1000)
let lr = sched.step(t)
Additional losses
| Function | Use for |
|---|---|
| klDivLoss(log_p, q) | KL divergence (knowledge distillation) |
| focalLoss(logits, targets, γ, α) | Imbalanced classification |
| contrastiveLoss(anchor, pos, neg, m) | Embedding / metric learning |
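As a reference for what these losses usually compute, here is a simplified Python sketch on plain lists. It rests on several assumptions: the KL term takes student log-probabilities and teacher probabilities, the focal loss is written for a single example given the probability of its true class (rather than logits and targets), and the contrastive loss is the triplet-margin form suggested by the (anchor, pos, neg, m) signature. The function names are hypothetical.

```python
import math

def kl_div_loss(log_p, q):
    """KL(q || p), with log_p as student log-probabilities and q as teacher probabilities."""
    return sum(qi * (math.log(qi) - lpi) for lpi, qi in zip(log_p, q) if qi > 0)

def focal_loss(p_true, gamma, alpha):
    """Focal loss for one example, given the probability assigned to the true class."""
    # The (1 - p_true)^gamma factor shrinks the loss of well-classified examples.
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

def triplet_contrastive_loss(anchor, pos, neg, margin):
    """Penalize the anchor unless it is closer to pos than to neg by at least margin."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return max(0.0, dist(anchor, pos) - dist(anchor, neg) + margin)
```

With `gamma = 0` and `alpha = 1` the focal loss reduces to ordinary cross-entropy for that example.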
Tool calling
Register tools with a ToolRegistry. The agent parses [TOOL:name|key=val] tokens from the model output and dispatches the call automatically.
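The parse-and-dispatch step could look roughly like the Python sketch below. The regular expression, the function names, and the "unknown tool" message are all hypothetical; the sketch passes the text after the `|` to the handler as one raw string, which is an assumption about how arguments are delivered.

```python
import re

# Matches tokens such as [TOOL:weather|city=Paris].
TOOL_PATTERN = re.compile(r"\[TOOL:(\w+)\|([^\]]*)\]")

def find_tool_call(text):
    """Return (name, raw_args) for the first tool token in text, or None."""
    match = TOOL_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)

def dispatch(text, tools):
    """Run the handler named in the first tool token; tools maps names to callables."""
    call = find_tool_call(text)
    if call is None:
        return None  # no tool requested
    name, raw_args = call
    handler = tools.get(name)
    if handler is None:
        return "unknown tool: " + name
    return handler(raw_args)
```

For example, `find_tool_call("Checking. [TOOL:weather|city=Paris]")` yields the pair `("weather", "city=Paris")`.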
let registry = new ToolRegistry()
registry.register(new Tool("weather", "Get current weather for a city",
fn(args) {
return "Sunny, 22°C in " + args
}
))
registry.register(new Tool("calculator", "Evaluate a math expression",
fn(args) {
return "42"
}
))
out registry.describe()
// Available tools:
// - weather: Get current weather for a city
// - calculator: Evaluate a math expression
Episodic memory
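A bounded episode store with keyword search can be sketched in Python as follows. The class name is hypothetical, and two behaviors are assumptions rather than documented facts: the oldest episode is evicted once capacity is exceeded, and search ranks episodes by how many query words appear in the stored context.

```python
import re

class EpisodicMemorySketch:
    def __init__(self, capacity):
        self.capacity = capacity
        self.episodes = []  # oldest first

    def store(self, context, response):
        self.episodes.append((context, response))
        if len(self.episodes) > self.capacity:
            self.episodes.pop(0)  # assumption: evict the oldest episode

    def search(self, query, top_n):
        words = set(re.findall(r"\w+", query.lower()))
        scored = []
        for context, response in self.episodes:
            overlap = len(words & set(re.findall(r"\w+", context.lower())))
            if overlap > 0:
                scored.append((overlap, [context, response]))
        # Highest word overlap first; the sort is stable, so ties keep insertion order.
        scored.sort(key=lambda item: item[0], reverse=True)
        return [pair for _, pair in scored[:top_n]]

    def recent(self, n):
        return [list(ep) for ep in self.episodes[-n:]] if n > 0 else []

    def clear(self):
        self.episodes = []
```

Word overlap is a deliberately simple relevance score; it needs no embeddings, but it cannot match synonyms or paraphrases.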
let mem = new EpisodicMemory(1000) // capacity = 1000 episodes
mem.store("what is the capital of France?", "Paris")
mem.store("who wrote Hamlet?", "Shakespeare")
// Keyword search — returns array of [context, response] pairs
let results = mem.search("France capital", 3)
out results[0][0] // context
out results[0][1] // response
let recent = mem.recent(5) // last 5 episodes
mem.clear()
Agent loop
The Agent class combines model, tokenizer, tools, and memory into a perception → reasoning → action → observation loop.
let cfg = {}
cfg["max_turns"] = 5
cfg["max_tokens"] = 200
cfg["strategy"] = "greedy"
cfg["temperature"] = 1.0
let agent = new Agent(model, tok, registry, mem, cfg)
let response = agent.run("What is the weather in Paris?")
out response
agent.reset() // clear context for a new conversation
If the model outputs a [TOOL:name|args] token, the agent calls the tool, appends the observation to the context, and continues generating. Otherwise it returns the response directly.
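The control flow described above can be sketched in Python with a stub in place of the model. Everything here is illustrative: `run_agent`, `stub_model`, and the `[OBSERVATION]` marker are hypothetical names, and the way the real Agent formats observations in its context is not specified in this README.

```python
import re

TOOL_TOKEN = re.compile(r"\[TOOL:(\w+)\|([^\]]*)\]")

def run_agent(generate_fn, tools, user_input, max_turns=5):
    """Generate, run any requested tool, feed the result back, and repeat."""
    context = user_input
    output = ""
    for _ in range(max_turns):
        output = generate_fn(context)
        match = TOOL_TOKEN.search(output)
        if match is None:
            return output  # plain answer: stop here
        name, args = match.group(1), match.group(2)
        observation = tools[name](args)  # a real agent would also handle unknown names
        # Append the model output and the tool result so the next turn sees both.
        context = context + "\n" + output + "\n[OBSERVATION] " + observation
    return output  # turn budget exhausted

def stub_model(context):
    # Stand-in for a trained model: ask for the tool once, then answer from its result.
    if "[OBSERVATION] " in context:
        return "It is " + context.split("[OBSERVATION] ")[-1]
    return "[TOOL:weather|Paris]"
```

The `max_turns` limit plays the same role as `cfg["max_turns"]`: it bounds the loop if the model keeps requesting tools.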
DataLoader
let seqs = [ids_1, ids_2, ids_3, ...] // array of token-ID arrays
let dl = new AgentDataLoader(seqs, 32) // batch_size = 32
dl.shuffle()
let batches = dl.batches()
for (let b in batches) {
// train on batch b
}
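Shuffling and batching can be sketched in a few lines of Python. The class `BatchLoaderSketch` and its `seed` parameter are hypothetical, and the sketch assumes the final batch may be smaller than `batch_size` rather than being dropped or padded.

```python
import random

class BatchLoaderSketch:
    def __init__(self, sequences, batch_size, seed=None):
        self.sequences = list(sequences)  # copy so the caller's list is untouched
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def shuffle(self):
        # In-place shuffle of the sequence order.
        self.rng.shuffle(self.sequences)

    def batches(self):
        step = self.batch_size
        # Consecutive slices; the last one holds whatever remains.
        return [self.sequences[i:i + step] for i in range(0, len(self.sequences), step)]
```

Calling `shuffle()` before each epoch changes which sequences share a batch, which typically helps SGD avoid order-dependent artifacts.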