Serve a model over HTTP
The payoff tutorial: train a model, export its weights, load them into a serez-http server, and expose predictions as a JSON API — then ship it as a Docker image anyone can call. Two small libraries, one real product.
It covers exporting a model with save(), loading it at server startup, wiring a /predict endpoint, and deploying the whole thing.
Step 1 — Train and export
First, a training script. This reuses the XOR network from the neural network tutorial. The key step is the save() call near the end, which writes the learned weights to a file you can ship:
// train.sz
import "serez-ai"
Random.seed(42)
let X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
let y = [[0.0], [1.0], [1.0], [0.0]]
let model = new Sequential()
model.add(new Dense(2, 8, "relu"))
model.add(new Dense(8, 1, "sigmoid"))
model.fit_opt(X, y, new BCE(), 2000, new Adam(0.05, 0.9, 0.999))
model.save("xor.weights") // ← the exported model
out "saved xor.weights"
Create a project, install serez-ai, and run the script:
mkdir xor-service
cd xor-service
sz init --y
sz install serez-ai
# put train.sz here, then:
sz train.sz
Step 2 — Load the model in a server
Now the server. Install serez-http, then in index.sz rebuild the same architecture and load() the weights once at startup — not per request:
sz install serez-http
// index.sz
import "serez-ai"
import "serez-http"
// Rebuild the architecture, then load the trained weights
let model = new Sequential()
model.add(new Dense(2, 8, "relu"))
model.add(new Dense(8, 1, "sigmoid"))
model.load("xor.weights")
const app = new App()
Step 3 — Expose a /predict endpoint
Parse the JSON body, run forward, return the prediction. This is the bridge between serez-ai and serez-http:
app.POST("/predict", fn(req, res) {
  let body = JSON.parse(req["body"])
  let a = body["a"]
  let b = body["b"]
  if (a == null || b == null) {
    res.status(400).json({"error": "send {a, b} as numbers"})
  } else {
    let pred = model.forward([[a, b]])[0][0]
    res.json({"input": [a, b], "prediction": pred, "rounded": Math.round(pred)})
  }
})
// A health check is good practice for any deployed service
app.GET("/health", fn(req, res) {
  res.json({"status": "ok", "model": "xor"})
})
app.listen(3000, fn() { out "model server on http://127.0.0.1:3000" })
Step 4 — Call it
Start the server and send it a request:
sz run dev
# elsewhere:
curl -X POST http://127.0.0.1:3000/predict \
  -H "Content-Type: application/json" \
  -d '{"a": 1, "b": 0}'
# → {"input":[1,0],"prediction":0.97...,"rounded":1}
That's a trained neural network answering live HTTP requests: the same shape as a real inference service, just smaller.
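The curl call above can also be made from a program. The following is a minimal client sketch, assuming the server from Step 3 is listening on 127.0.0.1:3000 and returns the JSON shape shown above. Python and the helper names build_request and predict are illustrative choices, not part of serez-http; any language with an HTTP client would work the same way.

```python
import json
import urllib.error
import urllib.request


def build_request(a, b, base_url="http://127.0.0.1:3000"):
    """Build the POST /predict request with {a, b} as a JSON body."""
    payload = json.dumps({"a": a, "b": b}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(a, b, base_url="http://127.0.0.1:3000", timeout=5.0):
    """Send {a, b} to /predict and return the decoded JSON response.

    A 400 reply (a missing field) is returned as its JSON error body so the
    caller can inspect the "error" key; any other HTTP error is re-raised.
    """
    request = build_request(a, b, base_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code != 400:
            raise
        return json.loads(err.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumes the tutorial's server is running locally on port 3000.
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        result = predict(a, b)
        print(a, b, "->", result.get("rounded"))
```

Returning the 400 body instead of raising keeps the caller's code path the same for good and bad input, which is convenient for a quick script. A production client would more likely raise on any non-2xx status.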
Step 5 — Ship it with Docker
Bundle the server, the weights file, and the runtime into one image with serez-apipack. Add the dependency and scripts to serez.json — and make sure xor.weights sits next to index.sz so it gets copied in:
{
  "name": "xor-service",
  "version": "1.0.0",
  "main": "index.sz",
  "scripts": {
    "dev": "sz index.sz",
    "build": "sz pack.sz"
  },
  "dependencies": {
    "serez-ai": "1.0.2",
    "serez-http": "1.0.0",
    "serez-apipack": "1.0.0"
  }
}
sz install # installs all deps from serez.json
sz run build # builds xor-service:1.0.0
docker run -p 3000:3000 xor-service:1.0.0
Anyone with Docker can now run your model as a service: no Serez Code, no Python, no framework install. A 30-line script became a deployable product.
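After docker run, the container may need a moment before it accepts connections. One way to confirm it is up is to poll the /health route from Step 3. This is a sketch under stated assumptions: the container publishes port 3000 as in the command above, /health returns {"status": "ok", ...}, and wait_for_health is a hypothetical helper name. Python is used only for the check; the service itself does not need it.

```python
import json
import time
import urllib.request


def wait_for_health(base_url="http://127.0.0.1:3000", timeout=30.0, interval=0.5):
    """Poll GET /health until it reports status "ok" or the timeout expires.

    Returns True once the service answers as healthy, False if the deadline
    passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/health", timeout=2.0) as response:
                body = json.loads(response.read().decode("utf-8"))
            if isinstance(body, dict) and body.get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections, timeouts, and urllib's
            # URLError/HTTPError; ValueError covers a non-JSON reply. Retry.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # Assumes the container from the docker run command is starting up.
    print("healthy" if wait_for_health() else "service did not come up in time")
```

The same loop can gate a smoke test in CI: start the container, wait for health, then send one /predict request before tagging the image as good.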
Beyond XOR
The exact same pattern serves real models — swap the architecture and weights:
- An image classifier (Conv2D layers) behind POST /classify.
- A GPT model behind POST /generate for a text API.
- Add the auth + rate-limiting middleware from the REST API tutorial to protect your endpoint.