HOW WE BUILD

GYD AI: Engineering an AI Study Companion That Stays in Sync with the Classroom

Classroom teaching works well at scale. A single teacher can reach thirty or forty students simultaneously. What doesn't scale is what happens after the class ends.

GYD AI quiz and flashcard interface

The Problem

Classroom teaching scales well. A single teacher can reach 30–40 students, but post-class learning does not scale. Students revisit notes to practice, clarify gaps, or test themselves, yet most ed-tech tools either provide generic question banks disconnected from lectures or chatbots that hallucinate beyond what was taught.

We wanted to build something different: an AI system where the teacher's voice remains the guiding force, even when the teacher isn't present. That meant designing around four constraints from day one:

  • Stay aligned with the classroom: All AI outputs (questions, summaries, explanations) must mirror the teacher's terminology, examples, and logic. No conflicting methods or alternative interpretations.
  • Provide clarity when students get stuck: Detect when a student hits a learning barrier during independent study and deliver targeted, contextual help, not generic hints.
  • Practice with precision, not volume: More questions aren't better; the right questions, rooted in what was taught, are. Prioritize precision over scale.
  • Adapt as the student evolves: Student needs change over time; the system should learn from interaction patterns and adjust support dynamically, without manual tuning.

What GYD AI Does

Under the hood, GYD AI operates as a swarm of specialized AI agents, each managing a specific facet of post-class learning. These agents generate practice cards from lecture PDFs, transcribe and summarize video recordings, solve math problems step-by-step, generate quizzes, search for educational videos, and conduct multi-turn conversations to help students grasp difficult concepts.

All agents share a common foundation: a FastAPI backend, MongoDB for persistence, Kafka for event streaming, and Google Gemini as the LLM layer, with different model tiers for different task complexities. Across this stack, the binding design constraint is the same: every agent treats the teacher's uploaded classroom material as the single source of truth.

This architecture turns the post-class phase into a continuous feedback loop rather than a dead end. Classroom materials drive AI generation, student interactions with that content produce engagement signals, and those signals shape future generation and recommendations. This post focuses on two core capabilities: practice card generation and serving, and the conversational assistant.

Practice Cards: From Lecture to Learning

Most ed-tech practice systems work from a static question bank. A content team writes thousands of questions, tags them by topic, and the system picks a random subset. This approach has a fundamental problem: the questions have no relationship to what a specific teacher covered in a specific class.

GYD AI takes the opposite approach: precision over scale. Every practice card is generated from the teacher's own uploaded material, whether a lecture PDF, a class recording transcript, or a study guide. The cards are not pre-authored. They are synthesized on-demand by an LLM that has the source material in its context window. Rather than drowning a student in an infinite feed of generic problems, the system delivers practice that is surgically tied to their recent classroom experience.

Fifteen Card Formats, Not One

One thing we learned early: a single question format doesn't hold attention. A feed of nothing but MCQs gets monotonous fast. We built support for 15 distinct card types, 12 of which are submissible (the student submits an answer that gets graded):

Interactive Cards

  • MCQ (multiple choice)
  • Fill-in-the-blanks
  • Match-the-pairs
  • Sort-in-order
  • True / False
  • Odd One Out
  • Integer answer
  • Hangman
  • Wordle
  • Anagram
  • Word Spelling (with audio)
  • Crossword
  • Info cards
  • Q&A pairs
  • Video highlights

MCQs, Reimagined

MCQ card example showing question with multiple choice options

Matching Made Easy

Matching card example showing neuron types paired with functions

Flashcards

Flashcard example with front and back content

Each card type has a JSON schema the LLM must conform to. For example, crossword cards carry grid dimensions, word placements with coordinates, and clues, while word spelling cards include generated audio, a definition, and phonetic distractors. This schema enforcement is critical: we do not ask the model to "generate a quiz" but to produce a structured JSON object matching a strict contract, and if it does not parse, it does not ship.
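The "if it does not parse, it does not ship" contract can be sketched as a small validator. The field names below are illustrative, not GYD AI's actual MCQ schema; the point is that malformed LLM output is rejected rather than repaired.

```python
import json

# Hypothetical contract for an MCQ card; field names are illustrative,
# not the production schema.
MCQ_REQUIRED_FIELDS = {"question": str, "options": list, "correct_index": int}

def validate_mcq(raw):
    """Parse LLM output against a strict contract: if it does not parse
    and validate, it does not ship (returns None)."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError:
        return None
    for field, typ in MCQ_REQUIRED_FIELDS.items():
        if not isinstance(card.get(field), typ):
            return None
    # Cross-field check: the answer index must point at a real option.
    if not 0 <= card["correct_index"] < len(card["options"]):
        return None
    return card

good = '{"question": "2+2?", "options": ["3", "4"], "correct_index": 1}'
bad = '{"question": "2+2?", "options": ["3", "4"], "correct_index": 5}'
```

A real system would carry one such contract per card type (crossword grids, word-spelling audio metadata, and so on), but the gating logic is the same.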

The Generation Pipeline

When a teacher uploads a PDF or a class recording gets transcribed, the generation pipeline kicks in:

Decision Check: Before generating anything, we run the source material through an LLM with a simple question: does this content warrant practice cards? A blurry photo of a whiteboard attendance sheet shouldn't trigger a generation job. This step also auto-detects the content language, which determines which card types are eligible.

Context Caching: We pre-create context caches for the source material. Since we make multiple generation calls (one per card type family), caching the input context avoids redundant token processing and reduces cost.

Initial Set: We generate a small initial batch of cards and make them immediately available. Students shouldn't have to wait for a full generation run to start practicing. The remaining card types are generated asynchronously in the background; this lets us keep the upfront cost low and scale generation based on actual usage.

Streaming Generation: As the LLM produces each card as a JSON object, we parse it incrementally and write it to the database. We don't wait for the full response to complete; cards become available the moment they're parsed and validated.
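The incremental step can be sketched as follows. This assumes the model emits one JSON object per line (NDJSON); the actual stream format may differ, but the principle holds: each card is parsed and persisted the moment its object is complete.

```python
import json

def cards_from_stream(chunks):
    """Yield each card the moment its JSON object is complete, without
    waiting for the full LLM response. Assumes one object per line
    (NDJSON), which is an illustrative simplification."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                try:
                    yield json.loads(line)  # validated card -> write to DB
                except json.JSONDecodeError:
                    pass  # malformed card: skip, do not ship
    if buffer.strip():
        try:
            yield json.loads(buffer)
        except json.JSONDecodeError:
            pass

# A card split across two stream chunks still parses as soon as it closes:
stream = ['{"type": "mcq"}\n{"type": "true_', 'false"}\n']
parsed = list(cards_from_stream(stream))
```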

Embedding Generation: Every card gets a vector embedding. These embeddings power the recommendation and serving algorithms described below.

Model Routing

Not all card types need the same level of reasoning. A crossword puzzle with intersecting words and spatial constraints is harder to generate correctly than a true/false statement. We route card types to different Gemini model tiers accordingly: higher-reasoning tasks go to the more capable (and more expensive) model, while simpler card types go to a lighter, faster variant. This split lets us manage cost and latency without compromising output quality where it matters.
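The routing itself can be as simple as a lookup keyed by card family. The model identifiers and the family-to-tier mapping below are placeholders, not GYD AI's actual configuration.

```python
# Illustrative routing table: card families that need heavier reasoning
# (spatial constraints, intersecting words) go to a more capable Gemini
# tier; simple formats go to a lighter, faster one. Names are placeholders.
HEAVY_FAMILIES = {"crossword", "match_the_pairs", "sort_in_order"}

def model_for(card_family):
    if card_family in HEAVY_FAMILIES:
        return "gemini-pro-tier"    # higher reasoning, higher cost
    return "gemini-flash-tier"      # faster, cheaper

chosen = model_for("crossword")
```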

Serving: How We Decide What to Show

Generating good cards is only half the problem. Deciding which cards to show which student, and when, is the other half.

We run multiple serving algorithms in parallel via A/B testing, with users deterministically assigned to a variant based on a hash of their user ID. Two examples illustrate the range:
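Deterministic hash-based assignment means no per-user state needs to be stored: the same user ID always maps to the same variant. A minimal sketch (variant names are illustrative):

```python
import hashlib

VARIANTS = ["diversity", "personalized"]  # illustrative variant names

def assign_variant(user_id):
    """Deterministic A/B assignment: hashing the user ID means the same
    user always lands in the same bucket, with no stored mapping."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]
```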

One algorithm family optimizes for content diversity. It groups candidate cards by source (so cards from different lectures are interleaved) and samples within each group. It's simple, fast, and ensures students see a mix of material rather than a run of cards from a single lecture.

Another family optimizes for personalized relevance. It uses the student's learned preference vector to rank candidates, then applies K-Means clustering on the relevancy scores. Instead of picking the top-scoring cluster deterministically, it uses softmax with a temperature parameter to balance exploration and exploitation, starting broad and gradually narrowing toward the student's demonstrated interests. If a candidate card's cosine similarity to recently viewed cards exceeds a threshold, it gets rejected. This prevents the recommendation loop from getting stuck serving variations of the same concept.
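Two pieces of that family can be sketched in a few lines: temperature-controlled softmax sampling over cluster scores, and the cosine-similarity rejection gate. The threshold value is illustrative, not the production setting.

```python
import math
import random

def softmax_pick(cluster_scores, temperature):
    """Sample a cluster index: high temperature spreads probability mass
    (exploration), low temperature concentrates on the top score
    (exploitation)."""
    exps = [math.exp(s / temperature) for s in cluster_scores]
    total = sum(exps)
    r, acc = random.random() * total, 0.0
    for i, e in enumerate(exps):
        acc += e
        if r <= acc:
            return i
    return len(exps) - 1

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

SIMILARITY_THRESHOLD = 0.95  # illustrative value

def is_near_duplicate(candidate_vec, recent_vecs):
    """Reject candidates too close to recently viewed cards, so the loop
    doesn't serve variations of the same concept."""
    return any(cosine(candidate_vec, v) > SIMILARITY_THRESHOLD
               for v in recent_vecs)
```

Lowering the temperature over a session is what produces the "starting broad, narrowing toward demonstrated interests" behavior described above.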

Both algorithm families share a common constraint: no student ever sees the same card twice.

Never Showing a Card Twice

We guarantee per-user deduplication using a persistent Bloom filter for every user. When the serving algorithm fetches candidate cards, every candidate is checked against the user's Bloom filter before it enters the ranking pipeline. After a card is shown, it's added to the filter.

This is significantly more space-efficient than maintaining an explicit "viewed cards" set per user, which would grow unboundedly. The trade-off is a small false-positive rate (the occasional card might be incorrectly flagged as "already seen"), but that's vastly preferable to showing a duplicate.

A System That Learns What Works

The preference-based serving algorithm depends on a per-user preference vector. But where does that vector come from?

We maintain it through online learning. Every interaction a student has with a card produces a signal: likes, answer submissions, time spent viewing, quick dismissals. Each signal carries a different weight: a like is a strong positive indicator, a submitted answer is moderate, a quick dismiss is a weak negative.

These signals update the user's preference vector using a decaying learning rate. The update rule has a deliberate asymmetry: positive signals on unfamiliar content shift the vector more strongly, while negative signals on already-familiar content are weighted more heavily. This prevents the profile from collapsing into a narrow niche and keeps recommendations diverse even as preferences sharpen.
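The core update is a signal-weighted step toward (or away from) the card's embedding, with a learning rate that decays as interactions accumulate. The signal weights and decay schedule below are illustrative stand-ins, not the production values.

```python
import math

# Illustrative signal weights: a like is strongly positive, a submission
# moderate, a quick dismiss weakly negative (not the production values).
SIGNAL_WEIGHTS = {"like": 1.0, "submit": 0.5, "quick_dismiss": -0.2}

def update_preference(pref, card_vec, signal_weight, interaction_count,
                      base_lr=0.1):
    """Online update with a decaying learning rate: early interactions
    move the vector a lot, later ones fine-tune it."""
    lr = base_lr / math.sqrt(1 + interaction_count)
    # Positive weights pull the vector toward the card's embedding;
    # negative weights push it away.
    return [p + lr * signal_weight * (c - p) for p, c in zip(pref, card_vec)]

pref = [0.0, 0.0]
pref = update_preference(pref, [1.0, 0.0], SIGNAL_WEIGHTS["like"],
                         interaction_count=0)
```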

The result is a system that adapts over time. A student's practice feed in week one, when the system knows little about them, looks very different from week eight, when it has accumulated hundreds of interaction signals. The AI's support matures alongside the student, shifting in focus and complexity without anyone having to manually adjust it.

The Conversational Assistant

Beyond structured practice, GYD AI includes a conversational agent: a streaming chatbot that helps students with doubts, quiz generation, homework tracking, and more. This is where the "provide clarity" goal comes to life.

The assistant is context-aware. It knows the student's name, their classes, their subjects, and, critically, it has access to the same classroom materials that the practice engine uses for card generation.

Two-Pass LLM Architecture

A naive chatbot makes one LLM call per user message. GYD AI's conversational agent uses two.

First pass: routing. The user's message is sent to the model along with function declarations for all available tools. The model decides: should it answer directly, or does it need to call a tool first? If the student asks "show me my pending homework," the model invokes a platform resource tool. If they ask "explain photosynthesis," it responds directly.

Second pass: response generation. If a tool was invoked in the first pass, its result is fed back into the model as context, and a second call generates the final natural-language response. This keeps tool output structured and the student-facing response conversational.

Some tools bypass the second pass entirely. Quiz generation and video search, for example, produce structured output that is the response; wrapping them in additional narration would add latency without value.
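The control flow can be sketched independently of any particular SDK. Here `call_llm` and `run_tool` are hypothetical stand-ins (the real system uses Gemini's function calling), and the tool names are illustrative; the stubs at the bottom exist only to exercise the flow.

```python
# Tools whose structured output IS the response: skip the second pass.
DIRECT_RESPONSE_TOOLS = {"quiz_generation", "video_search"}

def handle_message(message, tools, call_llm, run_tool):
    # Pass 1: routing. The model sees the message plus tool declarations
    # and either answers directly or names a tool to call.
    decision = call_llm(message, tool_declarations=tools)
    if decision["tool"] is None:
        return decision["text"]          # direct answer, one call total

    result = run_tool(decision["tool"], decision["args"])
    if decision["tool"] in DIRECT_RESPONSE_TOOLS:
        return result                    # structured output ships as-is

    # Pass 2: response generation, grounded in the tool result.
    return call_llm(message, context=result)["text"]

# --- Minimal stubs to exercise the flow (not a real model API) ---
def fake_llm(message, tool_declarations=None, context=None):
    if context is not None:
        return {"tool": None, "text": f"grounded answer using: {context}"}
    if "homework" in message:
        return {"tool": "platform_resources", "args": {}, "text": None}
    return {"tool": None, "text": "direct answer"}

def fake_run_tool(name, args):
    return "3 pending assignments"

direct = handle_message("explain photosynthesis", [], fake_llm, fake_run_tool)
routed = handle_message("show my homework", [], fake_llm, fake_run_tool)
```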

Tool-Augmented Generation

The conversational agent can invoke tools through Gemini's native function calling: no keyword matching or intent classifiers. The LLM itself routes based on the full context.

The tools include quiz generation, educational video search, platform resource fetching (homework, study materials, recordings, tests, attendance), image search, diagram generation, PDF export, math expression formatting, and web search as a fallback. Each tool produces structured output that the frontend renders in its own format: quiz cards, video carousels, resource lists, diagrams, and so on.

The Quiz Lifecycle

Quizzes have a full lifecycle, not just a "generate and forget" pattern:

  1. Generation: The student asks for a quiz (e.g., "quiz me on chapter 3"). The model invokes the quiz tool, which generates structured questions with options and correct answers.
  2. Dedicated session: A separate chat session is created for the quiz, so the student can discuss questions without polluting their main conversation history.
  3. Submission: The student submits answers. A separate LLM call reviews the attempt and produces structured feedback: specific skills to focus on and specific strengths demonstrated.
  4. Review: The feedback is stored and retrievable, creating a record the student (or teacher) can revisit.

Quiz lifecycle showing generation, session, submission, and review stages

Real-Time Streaming

Responses stream in real-time rather than waiting for the full response to assemble. The student sees the answer forming word by word, with status notifications as processing progresses: acknowledging the query, indicating tool use, and then streaming the actual content. This makes the experience feel responsive even when the backend is doing multi-step processing.

Personalization

The system instruction is built dynamically at runtime by injecting the student's class list, subjects, timetable, and personalized topic suggestions drawn from their own classroom content. This means the assistant isn't starting from zero; it already knows the student's academic context before the first message.
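A minimal sketch of that runtime injection, with hypothetical field names and prompt wording (not GYD AI's actual template):

```python
def build_system_instruction(student):
    """Assemble the system instruction from the student's profile at
    request time. Field names and wording are illustrative."""
    classes = ", ".join(student["classes"])
    subjects = ", ".join(student["subjects"])
    return (
        f"You are a study assistant for {student['name']}.\n"
        f"Enrolled classes: {classes}.\n"
        f"Subjects: {subjects}.\n"
        "Prefer the teacher's uploaded classroom material over general "
        "knowledge; use web search only as a fallback."
    )

instruction = build_system_instruction({
    "name": "Asha",                      # hypothetical student
    "classes": ["Grade 9 A"],
    "subjects": ["Biology", "Maths"],
})
```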

The Teacher's Voice as the Guiding Force

All of GYD AI's agents share the same grounding principle: prefer the teacher's material over general knowledge.

In the practice engine, this is structural. The generation pipeline has the uploaded PDF in its context window. There's no retrieval step that might pull in irrelevant web content; the model sees only the source material.

In the conversational agent, grounding works through the prompt system and tool design. The system instruction explicitly directs the model to prioritize classroom context. Web search is a deliberate fallback, not a default.

All agents also enforce a decision check before AI generation. If the system cannot find sufficient source material to generate a high-confidence output, it acknowledges the gap rather than speculating. This is a deliberate design choice: a wrong answer from an AI study tool is worse than no answer, because it creates a broken mental model that the student then has to unlearn.

Safety

GYD AI is built for students, many of them minors. Safety isn't a bolt-on; it's embedded into the prompt layer and the generation pipeline.

The system instructions enforce strict behavioral boundaries. The conversational agent is explicitly constrained to educational topics and will not engage with requests that fall outside the academic context. Content generation is filtered through safety settings that block harmful, inappropriate, or age-unsuitable outputs at the model level.

Beyond content safety, the grounding architecture itself acts as a guardrail. Because the system prioritizes the teacher's uploaded material over open-ended web knowledge, the surface area for unexpected or inappropriate content is significantly reduced. The model isn't free-associating from the entire internet; it's working from a bounded, teacher-curated context.

Multi-Language Support

GYD AI supports over 30 languages. Language detection happens automatically during the decision check step: the model identifies the primary language of the source material, and this propagates through the entire generation and serving flow. Card types that require specific language input are automatically adjusted based on the detected script, and audio pronunciation is generated via text-to-speech with language-specific routing for accuracy across different locales.

Gamification: Keeping Students Coming Back

Practice only works if students actually do it. We built a gamification layer to drive consistent engagement:

  • Streaks - Consecutive days with at least one submission. Both current and all-time longest streaks are tracked.
  • Tiers - Students progress through tiers based on cumulative correct solves.
  • Leaderboards - Points are awarded for engagement (logins, correct and incorrect submissions - we reward attempts, not just accuracy). Points are multiplied during boost sessions.
  • Boosts - Timed practice sessions (daily boosts, daily practice, or user-initiated) that create focused, time-boxed practice experiences.
Weekly leaderboard showing Bronze League with student ranks and XP
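The streak mechanic above reduces to a small update rule on each submission. This sketch assumes the streak is stored alongside the last submission date (an assumption about the data model, not the actual schema):

```python
from datetime import date, timedelta

def update_streak(current_streak, longest_streak, last_submission, today):
    """Apply a new submission: a consecutive day extends the streak,
    a gap resets it to 1, a repeat submission today changes nothing."""
    if last_submission == today:
        pass                                    # already counted today
    elif last_submission == today - timedelta(days=1):
        current_streak += 1                     # consecutive day
    else:
        current_streak = 1                      # gap: restart
    return current_streak, max(longest_streak, current_streak)

streak, longest = update_streak(3, 5, date(2024, 1, 10), date(2024, 1, 11))
```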

What This Means

The engineering behind GYD AI translates into a few concrete outcomes:

Students get structured support around the clock: The gap between the classroom and independent study, where learning has traditionally been fragmented and unguided, is bridged by AI that extends the teacher's instruction into a 24/7 study companion.

Practice becomes high-signal, not high-volume: Vector similarity and preference learning ensure that practice is relevant and non-repetitive. Students spend time on their actual conceptual gaps, not on generic exercises that may not relate to their coursework.

Teachers scale without extra work: The system serves as a force-multiplier for the teacher's instructional intent. Every student gets individualized practice and support derived from the teacher's own material, without the teacher creating a single additional worksheet or study guide.

Learning becomes a continuous loop: Isolated study sessions transform into a data-driven feedback cycle: classroom content feeds AI generation, student interaction shapes future recommendations, and the system's support evolves as the student progresses from foundational understanding to advanced application.

What We Learned

Building GYD AI reinforced a few engineering convictions:

Schema enforcement over free-form generation: Asking an LLM to "generate flashcards" produces inconsistent, unstructured output. Asking it to produce a JSON object conforming to a strict schema, and rejecting anything that doesn't parse, produces reliable, machine-readable results at the cost of slightly more complex prompting.

A/B test your recommendation logic early: We run multiple serving algorithms in parallel, some optimizing for diversity, others for personalized relevance. Running them under A/B testing lets us measure what actually drives engagement rather than guessing.

Bloom filters are underappreciated: The "never show the same card twice" guarantee sounds trivial, but implementing it naively doesn't scale. Bloom filters give us a probabilistic guarantee with fixed memory overhead, and the false-positive rate is a non-issue in this context.

Separate routing from response: The two-pass architecture in our conversational agent is more expensive per message than a single LLM call. But it's dramatically more reliable for tool use. The model can focus on deciding in the first pass without being distracted by composing, and vice versa.

© Teachmint Technologies Pvt. Ltd.