About us

We're a small team building speech data for Africa

We design, collect, and deliver multilingual speech datasets so AI can serve African languages well—with integrity and scale.

Our story

Founded in 2002, Way With Words is a globally recognised audio-to-text services and solutions provider, specialising in transcription and captioning. Over the years, we've built a reputation for quality and reliability, working with clients across media, academia, and industry. Our experience handling complex audio and our commitment to 99%+ accuracy led us into AI training data.

We started creating our own off-the-shelf datasets because we saw a gap: high-quality African speech data was hard to find, and what existed often didn't reflect the diversity and nuance of how people actually speak. We wanted to change that.

We believe inclusive AI starts with inclusive data. That means working with communities, respecting consent and privacy, and delivering datasets that are ready for real products, not just research.

We're a small team: linguists, engineers, and operations people who care about getting this right. When you work with us, you work with humans who'll answer your questions and tailor the pipeline to your needs.

What makes us different

We focus on the part most data pipelines ignore: the people behind the data.

Most quality issues don’t come from systems. They come from how people interact with them — rushing, skipping checks, or approving without review. We design our workflows with that in mind.

A human approach to data

We design around how people actually behave, not how we expect them to behave.

That means anticipating where mistakes happen and building systems that catch those issues early.

The goal is not perfection — it’s consistency.

Quality is built into the process

Instead of relying on a final review step, we build quality checks throughout the pipeline.

This includes:

  • Preventing blind approvals
  • Flagging when work has not been properly reviewed
  • Structuring tasks so that attention is required at each stage

Problems are caught where they start, so they never reach the end of the pipeline.

We care about how the data is created

We started in transcription, managing large distributed teams and working with sensitive audio at scale.

That background still shapes how we work:

  • Clear processes
  • Strong communication
  • Respect for the people doing the work
  • Planning for language nuances upfront so contributors know how to handle differences from day one

Who we work with

We work with teams who care about how their data is created, not just the final output.

That means thinking about:

  • Consent
  • Fair treatment of contributors
  • The broader impact of the data being collected

If the goal is speed at any cost, we’re probably not the right fit.

The result is data that holds up in real-world use — consistent, reliable, and built with a clear understanding of both the technical and human sides of the problem.

What we stand for

Three principles that guide how we work.

These are not marketing claims. They shape how projects are scoped, how contributors are supported, and how quality is managed from day one.

Ethical by design

Consent, transparency, and governance are built into every step—not bolted on.

Community-first

We work with first-language speakers and local partners so data reflects real voices.

Production-ready

Structured metadata, QA, and packaging so you can ship, not clean.

The outcome is simple: dependable datasets your team can use in production, created through a process that remains accountable to both contributors and end users.

How we build datasets

We run a structured pipeline from planning and collection through QA, validation, and packaging.

Every step is designed for compliance and scale, with clear ownership and checkpoints so quality does not depend on luck.

Phase 1

Scope and design

Define language targets, demographic requirements, domain goals, and governance controls before any collection starts.

Phase 2

Collect and validate

Run contributor onboarding, consent, recording, and multi-layer QA with checks embedded throughout the workflow.

Phase 3

Package and deliver

Finalise structured outputs, metadata, and documentation so teams can integrate datasets quickly into production pipelines.

You can see the full workflow on our homepage, and explore our datasets and frameworks for details on what we deliver.

Meet the team

A small group of people who care about language, data, and getting it right.

Graham Morrissey

Operations

Dale Dunbar

Linguistics

Francois Smit

Technology

Want to work with us?

Get in touch