All Articles in This Series

Self-Hosted AI vs API Providers: Decision Framework
Compare costs, compliance, and hybrid serverless GPU options to pick the right AI infrastructure for your business

AI App Planning: Proven Workflow to Keep Plans Coherent
Reproducible process using chat AI for ideation, local AI for surgical edits, and git to prevent context drift.

Deterministic Upstash Vector Sync: Atomic CMS Indexing
Step-by-step guide to building a deterministic vector database sync with Upstash Vector: atomic updates, chunking, and OpenAI embeddings to keep your CMS and vector index perfectly aligned.

Build a Claude SEO Agent with Google Search Console MCP Integration
Connect Claude to Google Search Console API via MCP for live SEO diagnostics, URL inspection, and AI-powered ranking analysis without leaving your IDE.

Ultimate Guide: Run GLM-OCR Locally on MacBook Fast
Step-by-step Ollama setup for GLM-OCR on macOS: install Ollama, pull the 0.9B model, set num_ctx=16384 to avoid crashes, and run a local OpenAI-compatible OCR API.

Run GLM-OCR on RunPod Serverless: 17-line Dockerfile
Custom Dockerfile with Transformers v5 and pre-baked GLM-OCR weights for fast RunPod serverless cold starts

LLM Inference Engine Showdown: vLLM vs Ollama vs TGI
Benchmark-backed guide comparing vLLM, Ollama, and TGI on throughput, latency, concurrency, scaling, and observability, with clear recommendations for choosing the right engine for your deployment.

Zod v4 & Gemini: Fix Structured Output with z.toJSONSchema
Stop using zod-to-json-schema—use Zod v4's native z.toJSONSchema to enforce Gemini structured output reliably.