Guide

Serve a model over HTTP

The payoff tutorial: train a model, export its weights, load them into a serez-http server, and expose predictions as a JSON API — then ship it as a Docker image anyone can call. Two small libraries, one real product.

What you'll learn: exporting a trained model with save(), loading it at server startup, wiring a /predict endpoint, and deploying the whole thing.

Step 1 — Train and export

First, a training script. This reuses the XOR network from the neural network tutorial. The key step is the save() call near the end, which writes the learned weights to a file you can ship:

// train.sz
import "serez-ai"

Random.seed(42)

let X = Tensor.from([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
let y = Tensor.from([[0.0],      [1.0],      [1.0],      [0.0]])

let model = new Sequential()
model.add(new Dense(2, 8, "relu"))
model.add(new Dense(8, 1, "sigmoid"))

model.fit_opt(X, y, new BCE(), 2000, new Adam(0.05, 0.9, 0.999), false)

model.save("xor.weights")   // ← the exported model
out "saved xor.weights"

To run it, create a project, install serez-ai, and execute the script:

mkdir xor-service
cd xor-service
sz init --y
sz install serez-ai
# put train.sz here, then:
sz train.sz

Step 2 — Load the model in a server

Now the server. Install serez-http, then in index.sz rebuild the same architecture and load() the weights once at startup — not per request:

sz install serez-http

// index.sz
import "serez-ai"
import "serez-http"

// Rebuild the architecture, then load the trained weights
let model = new Sequential()
model.add(new Dense(2, 8, "relu"))
model.add(new Dense(8, 1, "sigmoid"))
model.load("xor.weights")

const app = new App()
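
Why the architecture has to be rebuilt before load(): the tutorial describes the export as a file of learned weights, so the layer shapes must come from the code. The Python sketch below illustrates that idea with a made-up JSON format. The file layout, the function names, and the shape check are all assumptions for illustration; whether serez-ai's load() validates shapes in the same way is not stated here.

```python
import json
import os
import tempfile

# Hypothetical layer sizes mirroring Dense(2, 8) and Dense(8, 1) from the tutorial.
LAYER_SHAPES = [(2, 8), (8, 1)]


def init_weights(shapes):
    """Build zero-filled weight matrices, one per (inputs, outputs) layer."""
    return [[[0.0] * n_out for _ in range(n_in)] for n_in, n_out in shapes]


def save_weights(weights, path):
    """Write only the numbers; the architecture itself is not stored."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(weights, f)


def load_weights(path, shapes):
    """Read weights and refuse them if they do not fit the rebuilt layers."""
    with open(path, encoding="utf-8") as f:
        weights = json.load(f)
    found = [(len(w), len(w[0])) for w in weights]
    if found != list(shapes):
        raise ValueError(f"expected layer shapes {list(shapes)}, file has {found}")
    return weights


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "xor.weights.json")
    save_weights(init_weights(LAYER_SHAPES), path)
    load_weights(path, LAYER_SHAPES)  # same architecture: loads fine
    try:
        load_weights(path, [(2, 4), (4, 1)])  # different hidden size
    except ValueError as err:
        print("rejected:", err)
```

The practical consequence for this tutorial is that train.sz and index.sz need to stay in sync: if you change a Dense layer in one, change it in the other and retrain.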

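Why load once: loading weights means reading and parsing a file, which is wasteful to repeat on every request and gets slower as models grow. Doing it at startup means every request just runs a fast forward pass. Because model is defined at the top level of index.sz, the route handlers can capture it in their closures.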
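
The load-once pattern can be sketched in a few lines of Python. This is a language-neutral illustration, not serez-http code: load_model is a hypothetical stand-in for an expensive load, and the counter exists only to show how many times it runs.

```python
STATS = {"loads": 0}  # counts how often the expensive load runs


def load_model(path):
    """Stand-in for an expensive weight load (hypothetical, not serez-ai)."""
    STATS["loads"] += 1
    # The "model" here is just a function computing XOR on 0/1 inputs.
    return lambda a, b: float(int(a) ^ int(b))


def make_handler(path):
    """Load once, then return a handler that closes over the loaded model."""
    model = load_model(path)  # runs at startup, not per request

    def handle(a, b):
        return model(a, b)  # per request: only the forward pass

    return handle


if __name__ == "__main__":
    handle = make_handler("xor.weights")
    results = [handle(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
    print(results, "loads:", STATS["loads"])
```

Four requests are served, but the load counter only moves once. Calling load_model inside handle would instead repeat the file read on every request.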

Step 3 — Expose a /predict endpoint

Parse the JSON body, run forward, return the prediction. This is the bridge between serez-ai and serez-http:

app.POST("/predict", fn(req, res) {
    let body = JSON.parse(req["body"])
    let a = body["a"]
    let b = body["b"]
    if (a == null || b == null) {
        // dicts are typed: <string, any> with ({"key", value}) entries
        let err <string, any> = ({"error", "send a and b as numbers"})
        res.status(400).json(err)
    } else {
        // forward takes a tensor; read the scalar with .get(row, col)
        let pred = model.forward(Tensor.from([[a, b]])).get(0, 0)
        let resp <string, any> = ({"input", [a, b]}, {"prediction", pred}, {"rounded", Math.round(pred)})
        res.json(resp)
    }
})

// A health check is good practice for any deployed service
app.GET("/health", fn(req, res) {
    let health <string, any> = ({"status", "ok"}, {"model", "xor"})
    res.json(health)
})

app.listen(3000, fn() { out "model server on http://127.0.0.1:3000" })
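
The handler above rejects missing values, while its error message asks for numbers. If you want the stricter behavior, the validation logic could look roughly like the Python sketch below. It is written as a pure function so it can be tested without a server; handle_predict, is_number, and the predict callable are hypothetical names, and the extra checks (invalid JSON, non-numeric values) are an assumption about what you might want, not something the tutorial's handler does.

```python
import json


def is_number(value):
    """True for ints and floats, but not booleans (bool is a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def handle_predict(raw_body, predict):
    """Return (status, payload) for a /predict request body.

    `predict` is any callable taking (a, b) and returning a float.
    """
    try:
        body = json.loads(raw_body)
    except (TypeError, ValueError):
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(body, dict):
        return 400, {"error": "send a and b as numbers"}
    a, b = body.get("a"), body.get("b")
    if not (is_number(a) and is_number(b)):
        return 400, {"error": "send a and b as numbers"}
    pred = predict(a, b)
    rounded = 1 if pred >= 0.5 else 0  # threshold the probability at 0.5
    return 200, {"input": [a, b], "prediction": pred, "rounded": rounded}


if __name__ == "__main__":
    fake_model = lambda a, b: 0.97  # placeholder for a real forward pass
    print(handle_predict('{"a": 1, "b": 0}', fake_model))
    print(handle_predict('{"a": "one", "b": 0}', fake_model))
```

Keeping validation separate from the route makes it easy to reuse the same checks if you later add more endpoints.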

Step 4 — Call it

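Start the server and send it a request. sz run dev runs the dev script from serez.json (defined in Step 5 as sz index.sz); if you have not added that script yet, run sz index.sz directly: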

sz run dev

# elsewhere:
curl -X POST http://127.0.0.1:3000/predict \
  -H "Content-Type: application/json" \
  -d '{"a": 1, "b": 0}'

# → {"input":[1,0],"prediction":0.97...,"rounded":1}

That's a trained neural network answering live HTTP requests — the same shape as a real inference service, just smaller.
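
Any HTTP client can call the endpoint, not just curl. As one example, here is a small Python client that sends the same request using only the standard library. The function names are made up for this sketch, and the URL assumes the server is running locally on port 3000 as in this step.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:3000"  # address printed by the tutorial's server


def build_predict_request(a, b, base_url=BASE_URL):
    """Build the same POST request as the curl command in this step."""
    data = json.dumps({"a": a, "b": b}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(a, b, base_url=BASE_URL, timeout=5.0):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(a, b, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, predict(1, 0) should return a dict with the input, prediction, and rounded keys shown in the curl output.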

Step 5 — Ship it with Docker

Bundle the server, the weights file, and the runtime into one image with serez-apipack. Add the dependency and scripts to serez.json — and make sure xor.weights sits next to index.sz so it gets copied in:

{
  "name": "xor-service",
  "version": "1.0.0",
  "main": "index.sz",
  "scripts": {
    "dev": "sz index.sz",
    "build": "sz run apipack tag=xor-service:1.0.0"
  },
  "dependencies": {
    "serez-ai": "1.0.5",
    "serez-http": "1.0.4",
    "serez-apipack": "1.1.4"
  }
}

sz install                         # installs all deps from serez.json
sz run build                       # builds xor-service:1.0.0
docker run -p 3000:3000 xor-service:1.0.0

Anyone with Docker can now run your model as a service — no Serez Code, no Python, no framework install. A 30-line script became a deployable product.
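
After docker run, the /health route from Step 3 gives you a simple way to confirm the container is actually serving. The Python sketch below polls it until it answers. The helper names, retry counts, and delays are arbitrary choices for illustration, and the URL assumes the port mapping from the docker run command above.

```python
import json
import time
import urllib.request


def http_health_probe(url="http://127.0.0.1:3000/health", timeout=2.0):
    """Return True if the health endpoint answers with status "ok"."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("status") == "ok"


def wait_until_healthy(probe, attempts=10, delay=0.5, sleep=time.sleep):
    """Call `probe` until it returns True or the attempts run out."""
    for _ in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            pass
        sleep(delay)
    return False
```

Calling wait_until_healthy(http_health_probe) right after starting the container returns True once the server responds, or False if it never comes up within the allotted attempts.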

Beyond XOR

The exact same pattern serves real models. Swap the architecture and the weights file:

An image classifier (Conv2D layers) behind POST /classify.
A GPT model behind POST /generate for a text API.

To protect your endpoint, add the auth middleware and app.rateLimitRoute from the REST API tutorial.