I was the Product Designer/Manager and Lead AI/Backend Engineer on this project. This is my fork as I continue to develop the project.
Here I will detail the initial phase we completed as part of the Dallas AI Summer Program 2025. I plan to produce more articles describing my experiences as I develop this project further.
Introduction
During the Dallas AI Summer Program 2025, our team embarked on an ambitious eight-week journey to solve one of the most frustrating aspects of modern digital interactions: the breakdown between chatbots and humans. What started as a simple observation about user frustration evolved into VIA, a comprehensive platform that reimagines how AI and human intelligence can work together.
This page provides an overview of building VIA, exploring our technical decisions, architectural challenges, and the lessons learned while developing a Human-in-the-Loop (HITL) system that actually works.
The Real World Problem
The Frustration Loop
Humans get frustrated when interacting with AIs, especially when they are forced to use chatbots for support. Current systems lack intelligent escalation pathways and fail to learn from human expertise, creating a cycle of poor experiences.
Our research revealed several critical pain points in existing chatbot implementations.
These pain points pointed to a fundamental flaw in how we think about AI-human collaboration. Instead of viewing humans as fallbacks for AI failures, we realized they should be integral partners in the intelligence pipeline.
Our Solution Approach
The VIA Framework
VIA monitors chatbot-user interactions for warning signs, intelligently escalates to the "best" human to resolve issues, and incorporates resolutions into a feedback loop to improve both AI and human performance.
Our solution centers around three core principles:
1. Proactive Monitoring
Rather than waiting for explicit user complaints, VIA continuously analyzes conversation sentiment, response quality, and user behavior patterns to identify potential issues before they escalate.
2. Intelligent Human Routing
When escalation is needed, VIA doesn't just route to the next available agent. It considers expertise, workload, employee well-being, past performance with similar issues, and even time of day to find the optimal human match.
3. Continuous Learning Loop
Every interaction becomes a learning opportunity. Successful resolutions are analyzed and integrated back into the system, improving both AI responses and human agent preparation for future similar scenarios.
Product Architecture
We built VIA using modern agentic frameworks, specifically LangChain and LangGraph, to create a system where intelligent agents can collaborate seamlessly while maintaining clear separation of concerns.
The system architecture follows a microservices approach where each agent operates independently but can communicate through a central orchestration layer. This design allows us to upgrade individual agents without affecting the entire system.
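To make the orchestration idea concrete, here is a minimal sketch of how such a graph might be wired up in LangGraph. The state fields and node functions are simplified stand-ins for illustration, not the project's actual implementations.

```python
# Minimal LangGraph sketch of the orchestration layer (illustrative only;
# the node functions are hypothetical stand-ins for the real agents).
from typing import TypedDict
from langgraph.graph import StateGraph, END


class ConversationState(TypedDict):
    user_message: str
    bot_response: str
    frustration_score: float
    escalate: bool


def frustration_node(state: ConversationState) -> ConversationState:
    # The real agent calls an LLM; here we fake a score for illustration.
    score = 8.0 if state["user_message"].isupper() else 2.0
    return {**state, "frustration_score": score, "escalate": score >= 7.0}


def chatbot_node(state: ConversationState) -> ConversationState:
    return {**state, "bot_response": "placeholder chatbot reply"}


def routing_node(state: ConversationState) -> ConversationState:
    return {**state, "bot_response": "escalated to a human agent"}


graph = StateGraph(ConversationState)
graph.add_node("frustration", frustration_node)
graph.add_node("chatbot", chatbot_node)
graph.add_node("routing", routing_node)
graph.set_entry_point("frustration")
graph.add_conditional_edges(
    "frustration",
    lambda s: "routing" if s["escalate"] else "chatbot",
    {"routing": "routing", "chatbot": "chatbot"},
)
graph.add_edge("chatbot", END)
graph.add_edge("routing", END)
app = graph.compile()

result = app.invoke(
    {"user_message": "WHERE IS MY REFUND?", "bot_response": "",
     "frustration_score": 0.0, "escalate": False}
)
```

Because each agent is its own node, an individual agent can be swapped out or upgraded without rewiring the rest of the graph, which is the property we wanted from the microservices-style design.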
The Four Core Agents
VIA's intelligence comes from the coordination of four specialized agents, each with a specific role in the human-AI collaboration pipeline:

Frustration Agent
The Frustration Agent acts as our early warning system, continuously monitoring user inputs for signs of mounting frustration or dissatisfaction.
Quality Agent
Before any chatbot response reaches the user, the Quality Agent evaluates its adequacy and potential impact.
Routing Agent
When human intervention is required, the Routing Agent performs the critical task of selecting the optimal human agent, weighing factors such as the following (a rough scoring sketch follows the list):
- Current workload and availability
- Historical performance with similar issues
- Employee wellbeing and stress levels
- User frustration level and urgency
- Time zone and language preferences
- Recent feedback and success rates
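As an illustration, a weighted routing score over these factors could look like the sketch below. The field names, weights, and roster are invented for the example and are not the production logic.

```python
# Illustrative routing-score sketch (not the production implementation):
# weights and field names are assumptions chosen to mirror the factors above.
from dataclasses import dataclass


@dataclass
class HumanAgent:
    name: str
    expertise_match: float    # 0-1, fit between expertise and the issue category
    availability: float       # 0-1, inverse of current workload
    past_success_rate: float  # 0-1, historical resolution rate on similar issues
    wellbeing: float          # 0-1, simple proxy for stress/fatigue
    language_match: float     # 0-1, language and time-zone fit


def routing_score(agent: HumanAgent, urgency: float) -> float:
    """Weighted blend of the factors; urgency boosts the weight on availability."""
    return (
        0.30 * agent.expertise_match
        + (0.20 + 0.10 * urgency) * agent.availability
        + 0.20 * agent.past_success_rate
        + 0.10 * agent.wellbeing
        + 0.10 * agent.language_match
    )


roster = [
    HumanAgent("Sarah Chen", 0.9, 0.6, 0.95, 0.8, 1.0),
    HumanAgent("Luis Ortega", 0.7, 0.9, 0.80, 0.9, 1.0),
]
best = max(roster, key=lambda a: routing_score(a, urgency=0.8))
print(best.name)
```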
Context Manager Agent
The Context Manager ensures that both AI agents and human representatives have access to all relevant information for each interaction.
Agent Demo
Below are screenshots from our demo showing the agents in action. I created a simple Gradio app and hosted it on Hugging Face Spaces: VIA Agent Demo.
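For reference, a Gradio chat demo of this kind can be stood up in a few lines. This is only a rough sketch of the shape of the app; the real demo wires the response function into the VIA agent pipeline rather than the placeholder logic shown here.

```python
# Rough shape of a Gradio chat demo like the one hosted on Hugging Face Spaces.
# The respond() body is a placeholder; the real demo calls the VIA agents.
import gradio as gr


def respond(message, history):
    # Placeholder: run frustration analysis, then either answer or escalate.
    if message.isupper():
        return "Frustration detected - escalating to a human agent."
    return "Thanks for your question! (chatbot reply placeholder)"


demo = gr.ChatInterface(respond, title="VIA Agent Demo")

if __name__ == "__main__":
    demo.launch()
```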


Demo Walkthrough
Let's walk through how each component of VIA works in practice, using screenshots from our technical demo.

Chat Window
Our concept is that this design can improve any chatbot, including pre-existing ones. For this prototype, we had Gemini play the role of an insurance company support chatbot.

Frustration Analysis
This agent reviews the user input before it reaches the chatbot, looking for signs of frustration, which in this case was the use of ALL CAPS, as you can see above in the chat window.
Gemini scores the frustration on a scale of 1 to 10, and if the threshold is exceeded (we use 7 in the config.yaml file), then escalation occurs.
Frustration score: 8.0/10 - Escalation triggered
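A minimal sketch of that threshold check is shown below. The YAML keys are assumptions for illustration and do not reflect the actual schema of the project's config.yaml.

```python
# Sketch of the escalation threshold check; the YAML keys shown here are
# assumptions, not the actual schema of the project's config.yaml.
import yaml

config = yaml.safe_load("""
frustration_agent:
  scale_max: 10
  escalation_threshold: 7
""")

frustration_score = 8.0  # value returned by the Gemini-backed frustration agent

if frustration_score >= config["frustration_agent"]["escalation_threshold"]:
    print(f"Frustration score: {frustration_score}/10 - Escalation triggered")
```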

Quality Analysis
The Quality Agent, in a similar manner, intercepts and reviews the chatbot's output before it reaches the user, with Gemini rating it on various factors.
In this case, there is no quality analysis because the Frustration Agent escalated and pre-empted any response from the chatbot.

Context Management
The Context Manager is one of the keys to our system, but it needs a lot of work. It currently relies on multiple SQL queries and is not very fast or efficient (a rough sketch of these lookups follows the list below).
When complete, though, it will deliver relevant information to the other AI agents and to the human agents in addition to integrating key human insights back into the system in a continuous loop.
- User interaction history
- Similar resolved cases
- Product knowledge integration
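As referenced above, here is a rough sketch of the kind of SQL lookups involved today. The table and column names are hypothetical and do not match the real schema.

```python
# Illustrative sketch of the SQL lookups the Context Manager performs today.
# Table and column names are hypothetical; the real schema differs.
import sqlite3


def gather_context(conn: sqlite3.Connection, user_id: str, issue_category: str) -> dict:
    cur = conn.cursor()
    # Recent interaction history for this user
    history = cur.execute(
        "SELECT created_at, summary FROM interactions "
        "WHERE user_id = ? ORDER BY created_at DESC LIMIT 10",
        (user_id,),
    ).fetchall()
    # Previously resolved cases in the same category
    similar_cases = cur.execute(
        "SELECT case_id, resolution FROM resolved_cases "
        "WHERE category = ? ORDER BY resolved_at DESC LIMIT 5",
        (issue_category,),
    ).fetchall()
    return {"history": history, "similar_cases": similar_cases}
```

Each escalation triggers several lookups like these, which is why we see this as the main latency bottleneck and plan to move knowledge searches to a RAG/vector store.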


Intelligent Human Routing
When escalation is needed, the Routing Agent analyzes available human agents and selects the optimal match. This is another key feature of our system, and again, one that needs a lot of work.
In this demo example, agent Sarah Chen was selected based on the various factors scored by the model, along with any other context provided.
She has 5/5 experience, and "escalations" is one of her areas of expertise, so she gets selected a lot in the demo for frustration escalations. This is something that needs to be modeled better to balance the workload.

Sharing the Context and Reasoning
Another key feature of our concept is that the reasoning behind decisions is integrated into the context management system and shared.
For example, when a query is escalated to a human, that person sees why it was escalated ("Frustration Score: 8.0") and why they were chosen to receive it, along with transaction history, any relevant information from the knowledge base, and other context.
Ideally, the finalized UI will have a convenient means to record feedback about these decisions and to provide other insights that will empower the employee to continuously improve the system.
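As a rough illustration, the escalation packet handed to the human might look something like the following. The field names are assumptions based on what the demo displays, not the actual data model.

```python
# Hypothetical shape of the escalation packet shown to the selected human agent;
# field names are assumptions based on what the demo displays.
from dataclasses import dataclass, field


@dataclass
class EscalationPacket:
    frustration_score: float          # e.g. 8.0 - why the query was escalated
    routing_reason: str               # why this particular human was chosen
    conversation_transcript: list[str] = field(default_factory=list)
    transaction_history: list[dict] = field(default_factory=list)
    knowledge_base_snippets: list[str] = field(default_factory=list)
    agent_feedback: str = ""          # filled in by the human, fed back into the loop
```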
Technology Stack Deep Dive
Our technology choices were driven by the need for rapid prototyping while maintaining a clear path to production scalability.
Component | Current Technology | Future Plans | Reasoning
---|---|---|---
Agent Framework | LangChain + LangGraph | Continue + Custom Orchestration | Popular, proven framework
Language Models | Gemini 2.5 Flash | Specialized Models + Ensembles | Easy to implement; fast and cost-effective for a foundation model
Data Storage | SQL Database | SQL + Vector Database (RAG) | Reliable for structured data; need semantic search
Monitoring | Custom Dashboard | Enhanced Analytics + MLOps | Basic metrics covered; need deeper insights
Key Design Challenges
Challenge 1: Time Constraints and Summer Program Design
A group of five people (one dropped out near the end) with various backgrounds was put together under a mentor and given eight weeks to conceive, design, and implement an AI product. With varying levels of technical ability, engagement, and commitment, it took a long time for everyone to agree on a concept, cutting our build time down to less than four weeks.
Not only did we need to complete a working demo in this period, but we also needed to create a promotional video and prepare a formal pitch (along with the demo) to present before a live audience and judges of various technical backgrounds (Shark Tank style).
Shortcuts had to be taken, and tradeoffs managed.
Limited Technical Expertise
Only 2 team members had programming experience, and nobody had programmed agents or LLMs before.
No Evaluation Framework
We basically had to test the models by running simple batches of inputs and eyeballing the results.
No Real Data or Simulation Environment
We used LLMs to create mock data in a piecemeal fashion but did not have time to run simulations of the entire system working together.
Incomplete Implementation
We just ran out of time. Our frontend did not get connected to the backend. Our models are way too slow and need more tuning. Everything just needs more work.
- The sections below lay out the plan.
Challenge 2: Competitive Differentiation
The core concepts of intelligent chatbot escalation and context integration could be easily replicated by competitors. Our differentiation must come from better execution, superior intelligence, and great user experience. Heavy use of foundation models provides no moat.
Our approach to this challenge focuses on two key areas:
Superior User Experience
Fast, seamless agent response times and effective interfaces for humans to both provide and receive context. Dashboards and embedded evaluation frameworks to effectively manage and tune the AI-human hybrid team.
Advanced Intelligence
Specialized models fine-tuned for specific aspects of customer service interactions, including effective ensembling of different models to better integrate context into every interaction, as well as a state-of-the-art context management system.
- UI improvements
- Focus on more intelligent design and implementation of the escalation engine and context management engine
- Faster performance times (see Challenge 3 below)
Challenge 3: Real-time Performance
Customer service interactions happen in real-time. Any delays in our analysis or routing can negatively impact the user experience we're trying to improve.
Currently, all of our agents are probably too slow for real deployment; in particular, the SQL-based context retrieval in its current form is too slow for production use. We're addressing this through a multi-tiered approach:
Performance Optimization Strategy
Use specialized, faster models and add RAG capabilities to the context management system.
- Fast pre-screener models to detect quality issues before calling slower, more robust models for consensus checks (see the sketch after this list)
- An XGBoost model for fast ranking of the human agent roster against each scenario
- Combining RAG for knowledge searches with SQL for recent-activity searches
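The pre-screener idea from the first bullet can be sketched as a simple two-tier check: only pay for the slower, more robust model when the cheap screen flags a response. The function bodies below are placeholders, not real model calls.

```python
# Sketch of a two-tier quality check: a fast pre-screener gates calls to a
# slower, more robust model. Function bodies are placeholders for illustration.
def fast_prescreen(response: str) -> float:
    """Cheap heuristic/small-model score in [0, 1]; higher means more suspect."""
    red_flags = ("i don't know", "cannot help", "error")
    return 1.0 if any(flag in response.lower() for flag in red_flags) else 0.1


def robust_quality_check(response: str) -> bool:
    """Placeholder for a slower LLM-based consensus check."""
    return len(response.strip()) > 0


def needs_review(response: str, prescreen_threshold: float = 0.5) -> bool:
    # Only invoke the expensive model when the cheap screen flags the response.
    if fast_prescreen(response) < prescreen_threshold:
        return False
    return not robust_quality_check(response)
```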
Challenge 4: Model Cost and Scalability
Using a foundation LLM for every interaction creates scaling challenges. As conversation volume grows, costs could become prohibitive, and latency already is a problem.
Our solution involves a graduated approach to model usage:
1. Fast, lightweight models for routine screening and the bulk of interactions
2. More robust models where necessary, such as final quality evaluation
3. Premium models only for complex edge cases
Future Roadmap
Our development roadmap is organized into three phases, each building on the lessons learned from the previous iteration:
Phase 1: Finish Basic Prototype
We need something more than the rudimentary Gradio demo to show off our system.
- Integrate FastAPI to serve the agents
- Connect to basic React frontend
- Deploy databases, agent workflows, and frontend to a cloud provider like AWS
Phase 2: Evaluation Framework
Especially with non-deterministic systems, the key to improvement and tuning is to rapidly iterate and experiment. We need a framework to make this happen (a rough sketch of a batch eval loop follows the list below).
- Implement a pre-deployment eval framework to select and tune each model
- Integrate the framework to evaluate post-deployment system changes
- Tune the existing models for optimum speed
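As referenced above, here is a minimal sketch of the kind of pre-deployment batch eval loop we have in mind. The dataset format and metrics are assumptions for illustration.

```python
# Minimal sketch of a pre-deployment eval loop: run a labeled batch through an
# agent and log simple accuracy/latency metrics. The dataset format is assumed.
import json
import time


def evaluate(agent_fn, cases_path: str) -> dict:
    correct, latencies = 0, []
    with open(cases_path) as f:
        # e.g. [{"input": "...", "expected_escalation": true}, ...]
        cases = json.load(f)
    for case in cases:
        start = time.perf_counter()
        escalated = agent_fn(case["input"])
        latencies.append(time.perf_counter() - start)
        correct += int(escalated == case["expected_escalation"])
    return {
        "accuracy": correct / len(cases),
        "avg_latency_s": sum(latencies) / len(latencies),
    }
```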
Phase 3: Context Management Performance
Context searches are too slow for real production use and fall short in surfacing key information.
- Integrate RAG/Vector Database for Knowledge Base searches
- Optimize SQL strategy and implementation
- Create a system to derive key information from SQL and transfer to RAG on a regular basis
- Acquire and/or simulate sufficient data for proper evaluation
Phase 4: Improved Models
The product is not viable without significant improvements in performance, particularly speed/latency. There is no "moat" with foundation models.
- Fast local models
- Ensemble models for improved intelligence
- Use eval framework to select and tune models and optimize complete system
Technical Architecture Evolution
Component | Phase 1 | Phase 2 | Phase 3
---|---|---|---
Frustration Detection | Specialized sentiment models | Multi-modal emotion detection? | Predictive frustration modeling?
Quality Assessment | Hierarchical evaluation | Domain-specific rubrics? | Custom quality standards?
Human Routing | XGBoost + Context Ensemble | Deep Learning ensembles? | Reinforcement learning?
Context Management | RAG + SQL hybrid | Advanced semantic search | Knowledge graph integration
Lessons Learned
Building VIA taught us several important lessons about developing AI systems:
1. Evaluation Framework to Quickly Iterate and Experiment
Just like with traditional ML, unless you can quickly iterate, experiment, and log results, it is difficult to improve the models and the system.
2. Speed is Relative
A couple of seconds of response time from a foundation LLM didn't seem that bad in certain situations, but in others, especially when compounded across multiple calls, it was not really practical. Frustration analysis, for example, needs to be extremely fast or it causes more frustration than it addresses.
3. Optimizing Performance is Challenging
Balancing all the trade-offs, including speed, accuracy, and cost, is a challenge that goes beyond ordinary software engineering, but it is also an opportunity to stand out from the competition.
I plan to continue development of this project both as a portfolio project to demonstrate my abilities and as a learning platform for implementing new concepts and technologies. I will write about these experiences here.
Discussion
More about certain decisions and my thought processes:
Depth
This was a really fun project in terms of Product Design and System Architecture, but because of the time constraints and because I was the only one handling much of the project, there was just not much depth in terms of AI engineering. By the time our team had finally coalesced around a product idea, it became a mad dash to try to finish a prototype in less than four weeks.
That is one reason why I plan to continue development on this project - so that I can get a better understanding of (and practice with) key AI engineering concepts.
I think this does highlight one of my key strengths though - I assimilate and process information well (better than the vast majority of people in my highly-biased, personal opinion):
- I can quickly see the "big picture" (I came up with the design and architecture with no experience in LLMs)
- AND I can quickly figure out all the finer details
Code Base
Because of these time issues, the code base tends to be a lot of scaffolding and placeholders. I wanted to create a solid foundation that could be built upon, but there is still a lot of work to do and a lot of cleanup.
I relied a lot on Claude Code - both for rapid code generation and because I was completely new to LLM and agentic programming. I did not have time to sort out all the issues that this might have introduced, so the code base is definitely still a "work in progress".
OOP (Object-Oriented Programming)
I have done procedural programming for a long time but just started really using OOP a few months ago and quickly became a big fan because I think it has some key strengths when using an AI coding assistant:
- Modularity and Encapsulation - AIs tend to be over-eager and over-reaching, and discrete, modular classes and methods can help control this behavior by keeping everything in smaller, independent chunks.
- Abstract Interfaces - I like to create a mirror directory structure of just all the abstract interfaces - the AI can use this for context without having to absorb the entire code base with each prompt - a blueprint instead of a building.
- Building Blocks and Consistent Interfaces - Most programs need a few core modules like a Config Manager, Error Handler, and a Logger, and OOP makes it easier for AIs to just duplicate key modules from existing code bases.
- Dependency Injection, DI Containers, and Factories - DI containers and factory methods make it easy to abstract out and swap components without having to touch the primary code. The AI can keep adding capabilities by just adding different concrete implementations without risking the key code.
(I'm a complete OOP novice, so let me know if I'm misstating or not understanding something here.)
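As a toy illustration of the abstract-interface and factory ideas above (with names invented for the example, not taken from the project):

```python
# Toy illustration of the interface-plus-factory pattern described above;
# class and function names are invented and do not match the project's modules.
from abc import ABC, abstractmethod


class FrustrationDetector(ABC):
    """Abstract interface: the 'blueprint' that other code (and the AI assistant) sees."""

    @abstractmethod
    def score(self, message: str) -> float: ...


class KeywordDetector(FrustrationDetector):
    def score(self, message: str) -> float:
        return 8.0 if message.isupper() else 2.0


class LLMDetector(FrustrationDetector):
    def score(self, message: str) -> float:
        return 5.0  # placeholder for an LLM call


def make_detector(kind: str) -> FrustrationDetector:
    """Factory: swap concrete implementations without touching the calling code."""
    return {"keyword": KeywordDetector, "llm": LLMDetector}[kind]()


detector = make_detector("keyword")
print(detector.score("WHERE IS MY REFUND"))
```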
Agent Framework
I chose LangChain and LangGraph for our LLM and agent framework simply based on their popularity and the extensive documentation and examples available for LangChain. Nobody on our team had LLM or agent programming experience, and it was essentially a coin toss between this and CrewAI.
For rapid development in an unfamiliar technology, I would be relying on Claude Code a lot, and AI coding assistants do better with more popular technologies.
Models for Prototype
For our prototype with its time constraints, we chose to use a foundation model API to do all the heavy lifting since it would be easy to implement without too much tuning and deployment issues. Some of the factors we looked at in choosing a particular model included:
- Speed: Fast inference times crucial for user experience
- Cost Efficiency: Allows for rapid experimentation without budget constraints
- Versatility: Able to handle the various roles and tasks given to it
- Reliability: Avoid failures during testing and live demo runs
After some limited testing of other "fast, cheap" models like Claude Haiku, we settled on Gemini 2.5 Flash as our primary language model, primarily because of speed.
Source Code
The complete source code for VIA is available on GitHub. As I mentioned above, this is still very much a "work in progress" that will likely undergo significant changes.
I did a lot of experimentation, and the code probably still has some artifacts of this that were abandoned or became obsolete.

Project Structure
The codebase follows a scalable multi-agent design with clear separation of concerns:
config/
├── agents/                     # Agent-specific configurations
├── environments/               # Environment settings (dev/test/prod)
├── shared/                     # Shared configuration files
└── config.yaml                 # Main configuration file
src/
├── core/                       # Infrastructure & Configuration
│   ├── agent_config_manager.py
│   ├── context_manager.py
│   ├── database_config.py
│   └── logging/
├── interfaces/                 # Contract-driven development
│   ├── core/
│   ├── nodes/
│   └── workflows/
├── nodes/                      # AI Agent Implementations
│   ├── chatbot_agent.py
│   ├── quality_agent.py
│   ├── frustration_agent.py
│   ├── human_routing_agent.py
│   └── context_manager_agent.py
├── simulation/                 # Testing & Validation Framework
│   ├── human_customer_simulator.py
│   ├── employee_simulator.py
│   └── demo_orchestrator.py
└── workflows/                  # Orchestration & State Management
    └── hybrid_workflow.py
The repository includes comprehensive documentation, setup instructions, and examples to help you understand and extend the VIA system (though I would wait until more of it is complete before trying to extend it).