Package

serez-agentai

GPT-style transformer stack and agentic framework for Serez Code — tokenizer, causal attention, KV-cache, sampling strategies, tool calling, episodic memory, and an agent loop.

Install

sz install serez-agentai

serez-agentai depends on serez-ai for base layers (Dense, LayerNorm, Dropout, etc.). Both are installed automatically.

Tokenizer

CharTokenizer is a character-level tokenizer. Special tokens [PAD]=0, [BOS]=1, [EOS]=2, [UNK]=3 are always present.

import "serez-agentai"

let tok = new CharTokenizer()
tok.buildVocab("hello world")    // adds unique chars to vocab

let ids = tok.encode("hello")    // [1, 4, 5, 6, 6, 7, 2]  (BOS … EOS)
let text = tok.decode(ids)       // "hello"

out tok.vocab_size                // vocabulary size, special tokens included
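To make the ID layout concrete, here is a minimal Python sketch of a character-level tokenizer with the same four special tokens. It is not the serez-agentai implementation: the class name CharTokenizerSketch is hypothetical, and it assumes that new characters receive IDs in first-seen order and that decoding drops special tokens.

```python
class CharTokenizerSketch:
    """Illustrative character-level tokenizer (hypothetical, not the library's code)."""

    SPECIALS = ["[PAD]", "[BOS]", "[EOS]", "[UNK]"]  # IDs 0..3

    def __init__(self):
        self.token_to_id = {tok: i for i, tok in enumerate(self.SPECIALS)}
        self.id_to_token = list(self.SPECIALS)

    def build_vocab(self, text):
        # Assumption: characters get IDs in order of first appearance.
        for ch in text:
            if ch not in self.token_to_id:
                self.token_to_id[ch] = len(self.id_to_token)
                self.id_to_token.append(ch)

    @property
    def vocab_size(self):
        # Counts special tokens as well as characters.
        return len(self.id_to_token)

    def encode(self, text):
        unk = self.token_to_id["[UNK]"]
        body = [self.token_to_id.get(ch, unk) for ch in text]
        return [1] + body + [2]  # wrap in BOS ... EOS

    def decode(self, ids):
        # Assumption: special tokens are skipped when decoding.
        return "".join(self.id_to_token[i] for i in ids if i >= len(self.SPECIALS))


tok = CharTokenizerSketch()
tok.build_vocab("hello world")
ids = tok.encode("hello")
text = tok.decode(ids)
```

Under these assumptions the sketch reproduces the ID sequence shown in the example above, and characters missing from the vocabulary map to [UNK].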

Positional encoding

sinusoidalPE(max_seq, d_model) returns a [max_seq, d_model] tensor of sinusoidal positional encodings, which is added to the token embeddings before the first transformer block.

let pe = sinusoidalPE(512, 256)  // max_seq=512, d_model=256
// pe.shape() → [512, 256]
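For reference, the values in such a table can be sketched in Python as follows. This assumes the standard Transformer formulation (sine on even columns, cosine on odd columns, base 10000); whether sinusoidalPE uses exactly these constants is not stated here. The function name sinusoidal_pe is hypothetical.

```python
import math


def sinusoidal_pe(max_seq, d_model):
    """Return a max_seq x d_model table of positional encodings as nested lists."""
    pe = [[0.0] * d_model for _ in range(max_seq)]
    for pos in range(max_seq):
        for i in range(0, d_model, 2):
            # Assumption: base 10000, as in the original Transformer formulation.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)          # even column
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd column
    return pe


pe = sinusoidal_pe(512, 256)
```

Each row depends only on its position, so the table can be computed once and sliced to the current sequence length.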

GPT model

GPTModel is a decoder-only transformer (GPT-style) with causal masking, pre-norm blocks, and a linear output head.

let model = new GPTModel(
    tok.vocab_size,  // vocab_size
    128,             // d_model
    4,               // n_heads
    512,             // d_ff  (feed-forward inner dim)
    6,               // n_layers
    256              // max_seq
)

// Forward pass: array of token IDs → logits [seq, vocab_size]
let ids = tok.encode("hello")
let logits = model.forward(ids)

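// Training step: record the forward pass, backpropagate, then apply an SGD update.
// target_ids (target token IDs) and seq (sequence length) must be defined by the caller.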
Autodiff.tape()
let logits = model.forward(ids)
let loss = Autodiff.crossEntropyLoss(logits, target_ids, seq, tok.vocab_size)
Autodiff.backward(loss)
model.update(0.001)
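The training snippet above uses target_ids and seq without defining them. A common convention for next-token training, assumed here rather than confirmed by the package, is to shift the sequence by one position. The Python sketch below illustrates that shift, a causal mask that keeps each position from attending to later ones, and cross-entropy averaged over positions. All function names are hypothetical.

```python
import math


def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def next_token_pairs(ids):
    """Split one token sequence into (inputs, targets) shifted by one position."""
    return ids[:-1], ids[1:]


def cross_entropy(logits, targets):
    """Mean negative log-likelihood; logits is a seq x vocab nested list."""
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum - row[target]
    return total / len(targets)


ids = [1, 4, 5, 6, 6, 7, 2]          # BOS h e l l o EOS
inputs, targets = next_token_pairs(ids)
mask = causal_mask(3)
# An untrained model with uniform logits over a 12-token vocabulary.
uniform_loss = cross_entropy([[0.0] * 12 for _ in inputs], targets)
```

With uniform logits the loss equals the natural log of the vocabulary size, which is a useful sanity check for the first training step.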

Sampling & generation

Function	Description
`greedy(logits)`	Picks the token with the highest logit
`sampleTemp(logits, t)`	Temperature sampling: t < 1 sharpens the distribution, t > 1 flattens it
`topK(logits, k)`	Samples from the k most probable tokens
`topP(logits, p)`	Nucleus sampling: samples from the smallest set of tokens whose cumulative probability reaches p
`generate(model, tok, prompt, max_tokens, strategy, temp, k, p)`	Full generation loop; returns the decoded string

// Greedy generation
let response = generate(model, tok, "hello", 50, "greedy", 1.0, 40, 0.9)

// Top-p (nucleus) generation
let response = generate(model, tok, "hello", 100, "topp", 0.8, 40, 0.9)
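The four strategies in the table can be sketched in a few lines of Python. This illustrates the techniques only, not the package's code: the helper names (softmax, top_k_candidates, top_p_candidates, sample_from) are hypothetical, and the filtering step is separated from the random draw so that the candidate sets are easy to inspect.

```python
import math
import random


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_temp(logits, temperature, rng=random):
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def top_k_candidates(logits, k):
    """Indices of the k highest logits, best first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def top_p_candidates(logits, p):
    """Smallest prefix of tokens (by descending probability) whose mass reaches p."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept


def sample_from(logits, candidates, rng=random):
    """Renormalise over the candidate set and draw one token index."""
    probs = softmax([logits[i] for i in candidates])
    return rng.choices(candidates, weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1, -1.0]
token = sample_from(logits, top_p_candidates(logits, 0.9), random.Random(0))
```

Passing a seeded random.Random instance as rng makes sampled output reproducible, which helps when comparing strategies.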

KV-cache

KVCache stores the keys and values computed at each generation step, so past context is not recomputed for every new token.

let cache = new KVCache(6, 4)    // n_layers=6, n_heads=4

cache.store(0, 0, k_tensor, v_tensor)   // layer index, head index, keys, values
let entry = cache.fetch(0, 0)   // {k: [...], v: [...]}
cache.reset()                    // clear for new context
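The idea behind the cache can be sketched in Python as below. This is an illustration under assumptions, not the KVCache implementation: the class name KVCacheSketch is hypothetical, and it uses an append method that adds one token's key and value per step, whereas the package's store may replace the whole tensor instead.

```python
class KVCacheSketch:
    """Illustrative per-(layer, head) cache that grows by one entry per token."""

    def __init__(self, n_layers, n_heads):
        self.n_layers = n_layers
        self.n_heads = n_heads
        self.reset()

    def reset(self):
        # One empty slot per (layer, head) pair.
        self.slots = {
            (layer, head): {"k": [], "v": []}
            for layer in range(self.n_layers)
            for head in range(self.n_heads)
        }

    def append(self, layer, head, k_vec, v_vec):
        # Assumption: only the newest token's key/value is computed and appended;
        # earlier entries are reused as-is.
        slot = self.slots[(layer, head)]
        slot["k"].append(k_vec)
        slot["v"].append(v_vec)

    def fetch(self, layer, head):
        return self.slots[(layer, head)]


cache = KVCacheSketch(6, 4)
for step in range(3):
    cache.append(0, 0, [float(step)], [float(step) * 2])
entry = cache.fetch(0, 0)
```

With a cache like this, each generation step attends over the stored keys and values and only runs the projection for the newest token.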

LR schedulers

// Warmup + cosine decay
let sched = new WarmupCosineScheduler(0.001, 100, 1000)
let lr = sched.step(t)   // current learning rate at step t

// Warmup + linear decay
let sched = new WarmupLinearScheduler(0.001, 100, 1000)
let lr = sched.step(t)
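The shape of both schedules can be sketched in Python. This assumes the three constructor arguments are the peak learning rate, the number of warmup steps, and the total number of steps, which matches the values above but is not stated explicitly. The function names are hypothetical, and the exact warmup formula in the package may differ.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at zero after total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def warmup_linear_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return base_lr * (1.0 - progress)


cosine = [warmup_cosine_lr(t, 0.001, 100, 1000) for t in (0, 99, 550, 1000)]
linear = [warmup_linear_lr(t, 0.001, 100, 1000) for t in (0, 99, 550, 1000)]
```

Both schedules reach the peak rate at the end of warmup and zero at the final step; they differ only in the shape of the decay in between.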

Additional losses

Function	Use for
`klDivLoss(log_p, q)`	KL divergence — knowledge distillation
`focalLoss(logits, targets, γ, α)`	Imbalanced classification
`contrastiveLoss(anchor, pos, neg, m)`	Embedding / metric learning
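For orientation, the usual textbook forms of these three losses are sketched below in Python for a single example. These are assumptions about what the functions compute, not the package's code: focal_loss here takes probabilities rather than logits, and the (anchor, pos, neg, m) signature is read as a triplet margin loss with Euclidean distance. All function names are hypothetical.

```python
import math


def kl_div(log_p, q):
    """KL(q || p), given log-probabilities log_p and target probabilities q."""
    return sum(qi * (math.log(qi) - lpi) for lpi, qi in zip(log_p, q) if qi > 0)


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)^gamma * log(p_t).

    Assumption: probs are already softmax probabilities, not raw logits.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def triplet_margin_loss(anchor, pos, neg, margin):
    """max(0, d(anchor, pos) - d(anchor, neg) + margin), Euclidean distance."""
    d_pos = math.dist(anchor, pos)
    d_neg = math.dist(anchor, neg)
    return max(0.0, d_pos - d_neg + margin)


student_log_p = [math.log(0.5), math.log(0.5)]
teacher_q = [0.5, 0.5]
kl = kl_div(student_log_p, teacher_q)
easy = focal_loss([0.9, 0.1], 0, gamma=2.0)
hard = focal_loss([0.1, 0.9], 0, gamma=2.0)
margin_loss = triplet_margin_loss([0.0, 0.0], [0.0, 2.0], [0.0, 1.0], 0.5)
```

With gamma set to 0 the focal loss reduces to plain cross-entropy; larger gamma shrinks the contribution of examples the model already classifies confidently.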

Tool calling

Register tools with a ToolRegistry. The agent parses [TOOL:name|key=val] tokens from the model output and dispatches the call automatically.

let registry = new ToolRegistry()

registry.register(new Tool("weather", "Get current weather for a city",
    fn(args) {
        return "Sunny, 22°C in " + args
    }
))

registry.register(new Tool("calculator", "Evaluate a math expression",
    fn(args) {
        return "42"
    }
))

out registry.describe()
// Available tools:
//   - weather: Get current weather for a city
//   - calculator: Evaluate a math expression
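The parse-and-dispatch step can be sketched in Python with a regular expression. This illustrates the token format described above, not the agent's actual parser: the names TOOL_PATTERN, parse_tool_call and dispatch are hypothetical, and the text after the | is passed to the tool as one raw string, as the weather example suggests.

```python
import re

# Matches [TOOL:name|args]; args may be empty or a key=val string.
TOOL_PATTERN = re.compile(r"\[TOOL:(?P<name>[^|\]]+)\|(?P<args>[^\]]*)\]")


def parse_tool_call(text):
    """Return (name, args) for the first tool token in text, or None."""
    match = TOOL_PATTERN.search(text)
    if match is None:
        return None
    return match.group("name"), match.group("args")


def dispatch(tools, text):
    """Call the tool named in text; tools maps tool names to callables."""
    call = parse_tool_call(text)
    if call is None:
        return None
    name, args = call
    if name not in tools:
        return "Unknown tool: " + name  # assumption: unknown tools yield a message
    return tools[name](args)


tools = {"weather": lambda args: "Sunny, 22°C in " + args}
result = dispatch(tools, "Let me check. [TOOL:weather|Paris]")
```

Returning a message for unknown tool names, rather than raising, lets the model see the failure as an observation and recover on the next turn.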

Episodic memory

let mem = new EpisodicMemory(1000)  // capacity = 1000 episodes

mem.store("what is the capital of France?", "Paris")
mem.store("who wrote Hamlet?", "Shakespeare")

// Keyword search — returns array of [context, response] pairs
let results = mem.search("France capital", 3)
out results[0][0]   // context
out results[0][1]   // response

let recent = mem.recent(5)  // last 5 episodes
mem.clear()
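A bounded memory with keyword search can be sketched in Python as follows. The class name EpisodicMemorySketch, the oldest-first eviction, and the scoring rule (number of query words that appear in the stored context) are all assumptions for illustration; the package's ranking may differ.

```python
from collections import deque


class EpisodicMemorySketch:
    """Illustrative bounded store of (context, response) episodes."""

    def __init__(self, capacity):
        # Assumption: when full, the oldest episode is dropped first.
        self.episodes = deque(maxlen=capacity)

    def store(self, context, response):
        self.episodes.append((context, response))

    def search(self, query, top_n):
        # Assumption: score = count of query words found in the context.
        words = set(query.lower().split())
        scored = []
        for context, response in self.episodes:
            context_words = set(context.lower().replace("?", "").split())
            overlap = len(words & context_words)
            if overlap > 0:
                scored.append((overlap, context, response))
        scored.sort(key=lambda item: item[0], reverse=True)
        return [[c, r] for _, c, r in scored[:top_n]]

    def recent(self, n):
        return list(self.episodes)[-n:] if n > 0 else []

    def clear(self):
        self.episodes.clear()


mem = EpisodicMemorySketch(1000)
mem.store("what is the capital of France?", "Paris")
mem.store("who wrote Hamlet?", "Shakespeare")
results = mem.search("France capital", 3)
```

Word overlap is crude but cheap; an embedding-based similarity could replace the scoring line without changing the rest of the interface.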

Agent loop

The Agent class combines model, tokenizer, tools, and memory into a perception → reasoning → action → observation loop.

// config is a typed dict — build it with ({"key", value}) entries
let cfg <string, any> = ({"max_turns", 5}, {"max_tokens", 200}, {"strategy", "greedy"}, {"temperature", 1.0})

let agent = new Agent(model, tok, registry, mem, cfg)

let response = agent.run("What is the weather in Paris?")
out response

agent.reset()  // clear context for a new conversation

If the model outputs a [TOOL:name|args] token, the agent calls the tool, appends the observation to the context, and continues generating. Otherwise it returns the response directly.
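The control flow described above can be sketched in Python with a stub in place of the model. This is an illustration only: run_agent, fake_model and the [OBSERVATION] marker are hypothetical, and how the real Agent formats observations in its context is not specified here.

```python
import re

TOOL_PATTERN = re.compile(r"\[TOOL:([^|\]]+)\|([^\]]*)\]")


def run_agent(generate_fn, tools, user_input, max_turns=5):
    """Generate, run any requested tool, feed the result back, repeat."""
    context = user_input
    output = ""
    for _ in range(max_turns):
        output = generate_fn(context)
        match = TOOL_PATTERN.search(output)
        if match is None:
            return output  # plain answer: stop here
        name, args = match.group(1), match.group(2)
        tool = tools.get(name)
        observation = tool(args) if tool else "Unknown tool: " + name
        # Assumption: observations are appended as text behind a marker.
        context = context + "\n" + output + "\n[OBSERVATION] " + observation
    return output  # turn budget exhausted


def fake_model(context):
    # Stub: ask for the weather tool once, then answer from the observation.
    if "[OBSERVATION]" in context:
        return "It is sunny in Paris."
    return "[TOOL:weather|Paris]"


tools = {"weather": lambda args: "Sunny, 22°C in " + args}
answer = run_agent(fake_model, tools, "What is the weather in Paris?")
```

The max_turns bound matters: a model that keeps emitting tool tokens would otherwise loop forever.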

DataLoader

let seqs = [ids_1, ids_2, ids_3, ...]     // array of token-ID arrays
let dl = new AgentDataLoader(seqs, 32)   // batch_size = 32

dl.shuffle()
let batches = dl.batches()

for (let b in batches) {
    // train on batch b
}
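Shuffling and batching can be sketched in Python as below. The function names make_batches and pad_batch are hypothetical. Whether AgentDataLoader pads sequences within a batch is not stated, so the padding helper, which uses the [PAD]=0 ID from the tokenizer section, is an assumption.

```python
import random


def make_batches(sequences, batch_size, shuffle=True, seed=None):
    """Split sequences into lists of at most batch_size items."""
    order = list(range(len(sequences)))
    if shuffle:
        random.Random(seed).shuffle(order)  # seeded for reproducibility
    return [
        [sequences[i] for i in order[start:start + batch_size]]
        for start in range(0, len(order), batch_size)
    ]


def pad_batch(batch, pad_id=0):
    """Right-pad every sequence to the longest one in the batch."""
    width = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (width - len(seq)) for seq in batch]


seqs = [[1, 4, 2], [1, 5, 6, 2], [1, 2]]
batches = make_batches(seqs, 2, shuffle=False)
padded = pad_batch(batches[0])
```

The last batch may be smaller than batch_size when the number of sequences is not a multiple of it.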