Case Study · EdTech

Unlocking a Scalable Product at an EdTech Company

How Aegion helped an educational technology company turn an AI experiment that wasn't working into a production-grade content engine — unlocking a new product line and a viable path to scale.

Background

The client is an educational technology company building personalized test preparation for competitive examinations across Asia. Their core market is India, where exams like the UPSC civil services examination attract over 1.5 million candidates annually for fewer than 1,000 positions. Students preparing for these exams need large volumes of practice questions that test complex reasoning, internal causality, and the ability to evaluate nuanced claims across policy, economics, law, and governance.

The company's vision was to make that level of preparation accessible and affordable. Written by hand, a single high-quality question could take a subject matter expert up to an hour, at a cost of approximately $5 per question. At that cost, personalized preparation was not a viable product.

The company had invested in AI-powered question generation. But their approach, a single complex prompt covering over a dozen question formats, produced a flawed question roughly one time in four. Worse, the failure rate rose with difficulty: the harder the question, the more likely the AI was to get it wrong. The company's subject matter experts lost trust in the pipeline and reverted to writing questions themselves. The AI investment was effectively abandoned.

Assessment

Aegion assessed the client's content operations, generation infrastructure, and quality workflows. We mapped the full lifecycle of a question from topic selection through generation, review, and publication. We studied the examinations themselves — what distinguishes a well-constructed question from a poor one, where domain expertise matters most, and which question structures expose AI weaknesses.

Two findings shaped our approach. First, the single-prompt architecture was a core contributor to poor performance. One prompt was handling multiple question formats, difficulty levels, and domain requirements simultaneously — complex enough to confuse the model, generic enough to produce mediocre output across every format. Second, the generation problem and the trust problem were linked. Even a meaningful improvement in AI quality would not matter if the experts did not believe it — and after months of poor output, their skepticism was earned.

The opportunity was to make the output trustworthy enough that experts would shift from writing questions to reviewing and refining them. If we could get there, the client would unlock a product that their current cost structure could not support.

Deliverables

Aegion's work reflected a core conviction: effective AI implementation requires engineering expertise beyond using LLM APIs out of the box. LLMs are probabilistic systems. Getting reliable output means designing processes and architectures that work with that probabilistic nature rather than against it.

We started with generation. We replaced the single complex prompt with a dedicated prompt for each question template and outcome pair — purpose-built, simplified, and easier for the model to execute well. This produced a meaningful improvement in output quality across every format, but the error rate remained too high for the client's experts to trust at scale.
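The prompt-per-pair idea can be sketched as a simple lookup table. This is a minimal illustration only, not the client's implementation: the template names, outcome names, prompt wording, and the `build_prompt` helper are all hypothetical.

```python
# Hypothetical registry: one short, purpose-built prompt per
# (question template, outcome) pair instead of one catch-all prompt.
# All keys and prompt texts below are illustrative assumptions.
PROMPTS: dict[tuple[str, str], str] = {
    ("statement_evaluation", "exactly_one_correct"): (
        "Write one question on {topic} with two statements, "
        "exactly one of which is correct. Give the answer key."
    ),
    ("statement_evaluation", "both_correct"): (
        "Write one question on {topic} with two statements, "
        "both of which are correct. Give the answer key."
    ),
    ("assertion_reason", "reason_explains_assertion"): (
        "Write one assertion-reason question on {topic} in which "
        "the reason correctly explains the assertion. Give the answer key."
    ),
}


def build_prompt(template: str, outcome: str, topic: str) -> str:
    """Return the dedicated prompt for one template/outcome pair."""
    key = (template, outcome)
    if key not in PROMPTS:
        # Fail loudly rather than fall back to a generic prompt.
        raise KeyError(f"no dedicated prompt for {key}")
    return PROMPTS[key].format(topic=topic)


if __name__ == "__main__":
    print(build_prompt("statement_evaluation", "exactly_one_correct",
                       "fiscal federalism"))
```

Keeping each entry narrow means a weak format can be revised without touching the prompts that already perform well.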

So we went further. We developed a verification method we call ensemble verification, designed around the fact that LLMs are probabilistic: no single model can guarantee a correct answer, but the probability that several independent models all make the same mistake on the same question is significantly lower than the probability that any one of them makes it. Every AI-generated question is independently answered by multiple large language models. Only questions where every model agrees on the correct answer pass through to the expert review queue. Any disagreement triggers rejection before a human ever sees the question. Requiring full consensus filters out the questions most likely to contain errors, and it disproportionately catches the higher-order reasoning failures that matter most.
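The consensus gate can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Aegion's production code: each model is represented by a hypothetical `Solver` callable that returns an option label, and "agreement" is assumed to mean that every model's answer matches the generated answer key.

```python
from typing import Callable, Sequence

# Hypothetical interface: a solver takes the question text and
# returns the option label it believes is correct (e.g. "B").
Solver = Callable[[str], str]


def passes_consensus(question: str, answer_key: str,
                     solvers: Sequence[Solver]) -> bool:
    """Return True only if every solver independently picks the answer key."""
    if not solvers:
        raise ValueError("at least one solver is required")
    expected = answer_key.strip().upper()
    # Each solver answers without seeing the key or the other answers.
    answers = [solve(question).strip().upper() for solve in solvers]
    # A single disagreement rejects the question.
    return all(answer == expected for answer in answers)


if __name__ == "__main__":
    # Stub solvers standing in for real model calls.
    unanimous = [lambda q: "B", lambda q: "b", lambda q: "B"]
    divided = [lambda q: "B", lambda q: "C", lambda q: "B"]
    print(passes_consensus("sample question", "B", unanimous))
    print(passes_consensus("sample question", "B", divided))
```

Requiring unanimity rather than a majority trades throughput for precision: some sound questions are discarded, but the ones that reach the expert queue are less likely to carry an error.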

Business Impact

Aegion reduced the question generation error rate from roughly 27% to under 5% and brought the effective cost per student-ready question down to less than $0.50 — a greater than 90% reduction from the $5 per question the client was spending on expert-written content. Subject matter experts now review a curated queue of pre-verified content, editing and refining rather than generating from scratch.

The client launched a product that previously could not exist. Affordable, personalized preparation at examination-grade quality is now in production, and the company is expanding into additional examination types and geographies with infrastructure that scales with demand.

The gap between a promising AI experiment and a production-grade product is usually a process and engineering problem. Aegion brought the expertise to close it.

Error rate 27% → <5%, cost down >90%
