ML APIs & Real-Time Serving

Overview
Pain addressed: models exist only in notebooks; there is no stable contract for apps or partners; and failures are impossible to debug in production.
What you receive: API code (commonly FastAPI-style), container and deployment notes you approve, an inference contract document, an artefact manifest, and rollback guidance you can rehearse.
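As a rough illustration of what the API code and inference contract can look like together, below is a minimal sketch of a FastAPI-style synchronous scoring endpoint. The route name, request shape, payload cap, and the stand-in model are all assumptions for this example, not deliverables; a real engagement wires these to your approved artefact and the agreed contract document.

```python
# Minimal sketch of a synchronous scoring endpoint, assuming a FastAPI-style
# stack. Path, field names, and limits are illustrative placeholders.
from typing import List

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

app = FastAPI(title="scoring-api", version="1.0.0")

MAX_ROWS = 100  # explicit payload limit, as agreed in the inference contract


class ScoreRequest(BaseModel):
    rows: List[List[float]] = Field(..., description="Feature vectors to score")


class ScoreResponse(BaseModel):
    scores: List[float]
    model_version: str  # echoed so callers can audit which artefact answered


@app.post("/score", response_model=ScoreResponse)
def score(req: ScoreRequest) -> ScoreResponse:
    if len(req.rows) > MAX_ROWS:
        # Structured error path: a stable code and limit, never a stack trace.
        raise HTTPException(
            status_code=413,
            detail={"code": "PAYLOAD_TOO_LARGE", "limit": MAX_ROWS},
        )
    scores = [sum(row) for row in req.rows]  # stand-in for real model inference
    return ScoreResponse(scores=scores, model_version="placeholder-0.1.0")
```

The pydantic request and response models double as the machine-readable half of the inference contract: callers see exactly which fields, types, and limits the endpoint honours.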
In scope: synchronous scoring endpoints (or similar) that you define, explicit payload limits, structured error paths, and acceptance tests written before final sign-off.
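Acceptance tests written before sign-off can be small and direct. The sketch below assumes the /score endpoint and payload limit from the example above, exposed via a hypothetical scoring_api module; it exercises one happy path and one structured-error path.

```python
# Sketch of pre-sign-off acceptance tests, using FastAPI's TestClient.
# The module name and endpoint details are assumptions from the sketch above.
from fastapi.testclient import TestClient

from scoring_api import app  # hypothetical module holding the endpoint above

client = TestClient(app)


def test_scores_within_payload_limit():
    resp = client.post("/score", json={"rows": [[1.0, 2.0], [3.0, 4.0]]})
    assert resp.status_code == 200
    assert len(resp.json()["scores"]) == 2


def test_rejects_oversized_payload_with_structured_error():
    too_many = {"rows": [[0.0]] * 101}  # one row past the agreed limit of 100
    resp = client.post("/score", json=too_many)
    assert resp.status_code == 413
    assert resp.json()["detail"]["code"] == "PAYLOAD_TOO_LARGE"
```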
Out of scope: open-ended “chat with all our documents” builds without written corpus limits, anonymous row-level large-language-model calls on raw PII, or unlimited on-call operations without a retainer.
Optional (separate written module): bounded document Q&A over files you control, offered only when citations, corpus boundaries, evaluation steps, an infrastructure owner, and phased acceptance are agreed up front. Otherwise the engagement stays on scoring APIs and batch handoffs.
Outcome: a serving layer your product or internal tools can call against explicit, documented expectations. Reference repositories in Portfolio illustrate the patterns; your project follows your own infrastructure and compliance rules.