We're a small team building speech data for Africa
We design, collect, and deliver multilingual speech datasets so AI can serve African languages well, with integrity and at scale.
Our story
Founded in 2002, Way With Words is a globally recognised audio-to-text services and solutions provider, specialising in transcription and captioning. Over the years, we've built a reputation for quality and reliability, working with clients across media, academia, and industry. Our experience handling complex audio and our commitment to 99%+ accuracy led us into the world of AI training data.
We started creating our own off-the-shelf datasets because we saw a gap: high-quality African speech data was hard to find, and what existed often didn't reflect the diversity and nuance of how people actually speak. We wanted to change that.
We believe inclusive AI starts with inclusive data. That means working with communities, respecting consent and privacy, and delivering datasets that are ready for real products, not just research.
We're a small team: linguists, engineers, and operations people who care about getting this right. When you work with us, you work with humans who'll answer your questions and tailor the pipeline to your needs.
What makes us different
We focus on the part most data pipelines ignore: the people behind the data.
Most quality issues don’t come from systems. They come from how people interact with them: rushing, skipping checks, or approving without review. We design our workflows with that in mind.
A human approach to data
We design around how people actually behave, not how we expect them to behave.
That means anticipating where mistakes happen and building systems that catch those issues early.
The goal is not perfection; it’s consistency.
Quality is built into the process
Instead of relying on a final review step, we build quality checks throughout the pipeline.
This includes:
- Preventing blind approvals
- Flagging when work has not been properly reviewed
- Structuring tasks so that attention is required at each stage
Rather than fixing problems at the end, we stop them before they get that far.
We care about how the data is created
We started in transcription, managing large distributed teams and working with sensitive audio at scale.
That background still shapes how we work:
- Clear processes
- Strong communication
- Respect for the people doing the work
- Planning for language nuances upfront so contributors know how to handle differences from day one
Who we work with
We work with teams who care about how their data is created, not just the final output.
That includes thinking about:
- Consent
- Fair treatment of contributors
- The broader impact of the data being collected
If the goal is speed at any cost, we’re probably not the right fit.
The result is data that holds up in real-world use: consistent, reliable, and built with a clear understanding of both the technical and human sides of the problem.
What we stand for
Three principles that guide how we work.
These are not marketing claims. They shape how projects are scoped, how contributors are supported, and how quality is managed from day one.
Ethical by design
Consent, transparency, and governance are built into every step, not bolted on.
Community-first
We work with first-language speakers and local partners so data reflects real voices.
Production-ready
Structured metadata, QA, and packaging so your team can ship the data instead of cleaning it.
The outcome is simple: dependable datasets your team can use in production, created through a process that remains accountable to both contributors and end users.
How we build datasets
We run a structured pipeline from planning and collection through QA, validation, and packaging.
Every step is designed for compliance and scale, with clear ownership and checkpoints so quality does not depend on luck.
Phase 1: Scope and design
Define language targets, demographic requirements, domain goals, and governance controls before any collection starts.
Phase 2: Collect and validate
Run contributor onboarding, consent, recording, and multi-layer QA with checks embedded throughout the workflow.
Phase 3: Package and deliver
Finalise structured outputs, metadata, and documentation so teams can integrate datasets quickly into production pipelines.
See the full workflow on our homepage, or explore our datasets and frameworks for details on what we deliver.
Want to work with us?
Get in touch