Building Africa Next Voices: Our Journey

How we helped deliver the South African component of the Africa Next Voices initiative alongside DSFSI, building TalkTag and producing large-scale ethical speech datasets.

For more than two decades, we have worked at the intersection of language and technology. Since 2002, our roots have been in human transcription, helping organisations capture spoken language with accuracy and care. Over time, however, we saw a shift coming. Speech technology was evolving quickly, and the need for high-quality training data — especially for underrepresented languages — was becoming urgent.

Our journey into large-scale speech data began with smaller, custom collection projects across different English dialects. These early projects taught us valuable lessons about recruitment, recording workflows, and data management. In 2021, we took a major step forward by producing our own conversational datasets for South African languages, recording 50 hours each of English, Afrikaans, isiZulu, and Sesotho. Those projects strengthened our relationships within the local speech-AI ecosystem and laid the groundwork for something far bigger.

That opportunity arrived in the form of Africa Next Voices.

What Africa Next Voices Set Out to Achieve

Africa Next Voices (ANV) is a large-scale initiative supported by the Gates Foundation and a network of research and technology partners working to expand high-quality speech datasets for African languages. The project spans multiple countries and organisations, each contributing expertise in linguistics, technology, and community-driven data collection.

In South Africa, the project is coordinated by the Data Science for Social Impact (DSFSI) group at the University of Pretoria. Our role at Way With Words is to act as the data production and workflow partner — designing and running the large-scale recording, transcription, proofing, and quality control processes needed to deliver the South African dataset.

The South African component alone aimed to produce approximately 3,000 hours of speech across seven languages. The scale of the project reflected a growing recognition that AI systems cannot become truly inclusive without strong representation of African voices, accents, and linguistic diversity.

During the project, Dr Vukosi Marivate described the work as a watershed moment for African language technology — a shift toward building datasets that reflect how people actually speak, rather than relying on scraped or artificial sources. That vision aligned closely with our own approach to ethical, community-driven data.

From Transcription Company to Data Builder

When we joined Africa Next Voices alongside DSFSI and the broader ANV consortium, we knew the scale would be unlike anything we had done before. The South African component required the coordination of thousands of contributors, complex recording workflows, and strict quality standards — all delivered within an 18-month timeline.

Our traditional, manual workflows were not designed for this level of complexity. Managing recordings, transcripts, proofing, and quality control across such a large distributed team required a complete rethink of how we operated.

So we built TalkTag.

Over eight months, our technical team developed TalkTag, a bespoke workflow platform designed specifically for large-scale speech data production. TalkTag allowed us to manage recording, proofing, quality control, contractor coordination, and dataset packaging in a single unified environment. The platform also introduced payment tracking, identity management for accounting, and real-time progress monitoring, features that proved essential when working with more than 3,000 contributors.
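To give a sense of how those stages fit together, the lifecycle of a single recording can be pictured as a small state machine. The sketch below is purely illustrative: the stage names, transitions, and code are hypothetical and do not reflect TalkTag's internal implementation.

```python
from enum import Enum, auto

class Stage(Enum):
    """Illustrative lifecycle stages for one recording in a
    TalkTag-style pipeline (names are hypothetical)."""
    RECORDED = auto()
    FIRST_DRAFT = auto()   # machine-assisted first-pass transcript
    PROOFED = auto()       # reviewed by a first-language proofer
    QC_PASSED = auto()     # passed acoustic and textual quality checks
    PACKAGED = auto()      # included in a dataset release
    PAID = auto()          # contributor compensated for verified work

# Allowed transitions; a failed proof sends the item back for re-recording.
TRANSITIONS = {
    Stage.RECORDED: {Stage.FIRST_DRAFT},
    Stage.FIRST_DRAFT: {Stage.PROOFED},
    Stage.PROOFED: {Stage.QC_PASSED, Stage.RECORDED},
    Stage.QC_PASSED: {Stage.PACKAGED},
    Stage.PACKAGED: {Stage.PAID},
}

def advance(current: Stage, nxt: Stage) -> Stage:
    """Move a recording to its next stage, rejecting illegal jumps."""
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt
```

Modelling the pipeline this way makes it easy to enforce that, for example, no contributor is paid for a recording that has not yet passed quality control.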

Building at Scale — With People at the Centre

The success of Africa Next Voices depended on people. We received more than 63,000 applications from individuals who wanted to participate, but the project required a carefully selected and trained group of approximately 2,500 speakers and language specialists.

One of our biggest challenges was moving beyond professional transcription workflows. Many contributors had never transcribed before, and in some cases we were working in languages our core team did not speak. To bridge this gap, we recruited first-language experts for each language and invested heavily in training new proofing teams from scratch. Together, we developed best practices as the project evolved, balancing speed with accuracy.

Working closely with DSFSI and language teams across the country also deepened our understanding of South Africa’s linguistic and cultural diversity. Language data is never just technical — it reflects lived experience, identity, and community.

Scripted and Unscripted Speech — Designing Real Conversations

A key part of the project involved capturing both scripted and unscripted speech. The unscripted recordings allowed contributors to speak naturally, helping us collect authentic conversational language that reflects how people communicate in everyday life.

For scripted recordings, we took a hybrid approach. In some cases, suitable written material simply did not exist, which meant working closely with language experts to design custom prompts from scratch. Where possible, we also sourced material from agricultural publications to create prompts grounded in real-world contexts. This ensured that the dataset included vocabulary and scenarios relevant to practical AI use cases.

Balancing scripted and unscripted recording styles allowed us to build a dataset that felt natural while still maintaining consistency for model training.

Overcoming Technical and Logistical Challenges

Large-scale speech collection in South Africa presented unique hurdles. Internet connectivity varied widely, and contributors worked with a range of devices and recording environments. Rather than distributing equipment to all recorders, we focused on supporting Language Assistants and Language Leads with the tools they needed to manage coordination, quality, and training at scale.

Within TalkTag, we integrated advanced quality checks directly into the workflow. Signal-to-noise testing ensured recordings met acoustic standards, while first-draft transcripts provided a starting point for human proofers. These features helped us scale without compromising on quality.

Managing payments for thousands of contractors was another major undertaking. TalkTag enabled us to track work, verify contributions, and ensure that everyone involved was compensated fairly — a core part of our commitment to ethical data practices.

Redefining What Transcription Means in the Age of AI

Our background in human transcription shaped how we approached Africa Next Voices. As AI tools become more accurate, transcription itself is evolving from a specialised service into an everyday capability. What matters most now is the data that trains those systems.

Through this project, our focus shifted toward building datasets that enable accurate, inclusive language models for South African languages. Instead of simply transcribing speech, we became architects of training data — working alongside DSFSI and the broader ANV initiative to deliver ethically collected, community-driven datasets.

Contributors were not just data sources; they were collaborators in shaping the future of language technology.

Lessons from an 18-Month Journey

Africa Next Voices challenged us in ways we had never experienced before. We learned how to build technology while a project was already in motion, how to train large teams across multiple languages simultaneously, and how to balance speed with precision under tight deadlines.

We also saw firsthand how community-driven data collection differs from traditional approaches. Instead of relying on existing digital content, we worked directly with people — capturing speech that reflects real environments and real stories. That approach aligned strongly with the broader ANV vision and reinforced the importance of ethical, community-first data practices.

Looking Ahead

Completing Africa Next Voices marked a turning point for us. It confirmed that our future lies in creating high-quality training data that supports inclusive AI — particularly for languages that have historically been overlooked in technology.

As speech technology continues to evolve, we remain focused on building datasets that empower people to use AI in their own languages. The lessons we learned during Africa Next Voices — and the foundation built with TalkTag — will continue to shape how we approach future projects.

For us, Africa Next Voices was more than a project. It was a step toward a future where African languages are not just supported by technology, but actively shape it.