Portfolio
Projects
LLM Integration Engine
A modular orchestration layer that routes prompts between cloud APIs and local models based on latency, cost, and privacy constraints. It normalizes provider payloads, adds retry-aware fallbacks, and streams structured responses to the UI. Local inference is handled through llama.cpp for offline and sensitive workflows, while external API integrations are used for high-context tasks and rapid model updates.
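The routing described above could be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the `Request` fields, the context limit, and the per-token cost figure are all assumed placeholder values.

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    sensitive: bool        # privacy constraint: must stay on local inference
    context_tokens: int    # size of the context the task needs
    max_cost_usd: float    # per-call budget

# Hypothetical limits; a real deployment would load these from config.
LOCAL_CONTEXT_LIMIT = 8192
CLOUD_COST_PER_1K_TOKENS = 0.01

def route(req: Request) -> str:
    """Pick 'local' or 'cloud', checking privacy first, then context size, then cost."""
    if req.sensitive:
        return "local"                      # privacy overrides everything else
    if req.context_tokens > LOCAL_CONTEXT_LIMIT:
        return "cloud"                      # high-context tasks go to external APIs
    est_cost = req.context_tokens / 1000 * CLOUD_COST_PER_1K_TOKENS
    if est_cost > req.max_cost_usd:
        return "local"                      # cloud call would exceed the budget
    return "cloud"
```

Ordering the checks this way makes privacy a hard constraint while cost and context size remain soft trade-offs.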
Realtime Support Copilot
An agentic support assistant that classifies incoming tickets, retrieves product knowledge, and drafts responses with confidence scoring. The system streams partial answers to operators, escalates low-confidence outputs, and logs every model decision for auditability.
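The escalate-or-send decision paired with audit logging might look like the sketch below. The threshold value and record fields are assumptions for illustration, not the project's real schema.

```python
CONFIDENCE_THRESHOLD = 0.75  # assumed cutoff; tuned per deployment in practice

def triage(draft: str, confidence: float, audit_log: list) -> dict:
    """Route a drafted answer to the operator or escalate it, logging every decision."""
    decision = "send_to_operator" if confidence >= CONFIDENCE_THRESHOLD else "escalate"
    # Append an audit record so every model output is traceable later.
    audit_log.append({"draft": draft, "confidence": confidence, "decision": decision})
    return {"decision": decision, "draft": draft}
```

Keeping the log append inside the decision function guarantees no output reaches an operator without a matching audit entry.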
Edge Inference Studio
A local-first playground for benchmarking quantized models with llama.cpp across CPU and GPU profiles. It provides prompt templates, token throughput dashboards, and profile presets so teams can choose the right local model strategy before shipping.
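A token-throughput measurement of the kind the dashboards report can be approximated with a simple timing loop. The `generate_token` callable is a stand-in for one decode step of a quantized model; this is a sketch of the metric, not the studio's instrumentation.

```python
import time

def measure_throughput(generate_token, n_tokens: int) -> float:
    """Return tokens per second for a callable that produces one token per call."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()                    # one decode step (placeholder)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed if elapsed > 0 else float("inf")
```

Running this across CPU and GPU profiles with the same prompt gives the comparable tokens/sec numbers a team needs to pick a local model strategy.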