Projects

LLM Integration Engine

A modular orchestration layer that routes prompts between cloud APIs and local models based on latency, cost, and privacy constraints. It normalizes provider payloads, adds retry-aware fallbacks, and streams structured responses to the UI. Local inference is handled through llama.cpp for offline and sensitive workflows, while external API integrations are used for high-context tasks and rapid model updates.

TypeScript · Next.js · Node.js · OpenAI API · Anthropic API · llama.cpp · SSE Streaming · Prompt Routing
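The routing logic described above can be sketched as a small constraint filter. The provider names, latency and cost figures, and the `routePrompt` helper below are illustrative assumptions for this sketch, not the project's actual code.

```typescript
type Provider = "local-llamacpp" | "openai" | "anthropic";

interface RouteRequest {
  promptTokens: number;  // estimated prompt size
  sensitive: boolean;    // must the data stay on-device?
  maxLatencyMs?: number; // optional latency budget
}

interface ProviderProfile {
  name: Provider;
  contextLimit: number;     // max tokens the provider handles well
  typicalLatencyMs: number; // rough expected latency (illustrative)
  costPer1kTokens: number;  // USD, illustrative numbers only
}

const PROFILES: ProviderProfile[] = [
  { name: "local-llamacpp", contextLimit: 4096, typicalLatencyMs: 1200, costPer1kTokens: 0 },
  { name: "openai", contextLimit: 128000, typicalLatencyMs: 800, costPer1kTokens: 0.01 },
  { name: "anthropic", contextLimit: 200000, typicalLatencyMs: 900, costPer1kTokens: 0.008 },
];

export function routePrompt(req: RouteRequest): Provider {
  // Privacy dominates: sensitive prompts never leave the machine.
  if (req.sensitive) return "local-llamacpp";

  // Otherwise pick the cheapest provider that fits the context
  // window and the latency budget.
  const candidates = PROFILES.filter(
    (p) =>
      req.promptTokens <= p.contextLimit &&
      (req.maxLatencyMs === undefined || p.typicalLatencyMs <= req.maxLatencyMs)
  );
  candidates.sort((a, b) => a.costPer1kTokens - b.costPer1kTokens);
  return candidates.length > 0 ? candidates[0].name : "openai"; // last-resort fallback
}
```

Under this sketch, a small non-sensitive prompt lands on the free local model, while a high-context prompt falls through to the cheapest cloud provider that can hold it.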

Realtime Support Copilot

An agentic support assistant that classifies incoming tickets, retrieves product knowledge, and drafts responses with confidence scoring. The system streams partial answers to operators, escalates low-confidence outputs, and logs every model decision for auditability.

Next.js · TypeScript · PostgreSQL · Redis · RAG · Function Calling · Webhooks · Rate Limiting
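The confidence-gated escalation step can be illustrated with a minimal triage function. The `Draft` shape, the 0.8 threshold, and the `triage` name are assumptions made for this sketch, not the system's real interface.

```typescript
interface Draft {
  ticketId: string;
  answer: string;
  confidence: number; // 0..1, model-reported or calibrated score
}

type Decision =
  | { action: "send"; draft: Draft }
  | { action: "escalate"; draft: Draft; reason: string };

// Send high-confidence drafts to the operator stream; escalate the rest.
// Every decision carries enough context to be written to an audit log.
export function triage(draft: Draft, threshold = 0.8): Decision {
  if (draft.confidence >= threshold) {
    return { action: "send", draft };
  }
  return {
    action: "escalate",
    draft,
    reason: `confidence ${draft.confidence.toFixed(2)} below threshold ${threshold}`,
  };
}
```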

Edge Inference Studio

A local-first playground for benchmarking quantized models with llama.cpp across CPU and GPU profiles. It provides prompt templates, token throughput dashboards, and profile presets so teams can choose the right local model strategy before shipping.

Electron · TypeScript · llama.cpp · SQLite · WebAssembly · Benchmarking · IPC · Data Visualization
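The core metric behind a throughput dashboard like this is tokens per second aggregated over benchmark runs. The `RunSample` shape and `throughput` helper below are a hypothetical sketch of that calculation, not the studio's actual API.

```typescript
interface RunSample {
  tokensGenerated: number; // tokens emitted in one benchmark run
  wallClockMs: number;     // wall-clock time for that run
}

// Aggregate tokens/second across samples. Summing tokens and time
// before dividing weights longer runs proportionally, rather than
// averaging per-run rates.
export function throughput(samples: RunSample[]): number {
  const tokens = samples.reduce((sum, s) => sum + s.tokensGenerated, 0);
  const ms = samples.reduce((sum, s) => sum + s.wallClockMs, 0);
  return ms === 0 ? 0 : (tokens / ms) * 1000;
}
```

For example, two runs of 100 tokens in 500 ms each aggregate to 200 tokens/s, which is the figure a CPU-vs-GPU profile comparison would plot.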