---
title: "African Speech Datasets Catalog | Way With Words"
description: "Browse African speech datasets for ASR and voice AI—paid 50-hour collections and free open releases. Compare languages, hours, and licensing, then open each dataset page for specs and samples."
image: "https://waywithwords.ai/og-default.png"
---

Off-the-shelf datasets for sale

# African speech datasets for sale and AI training

Fifty-hour conversational speech collections, ready for benchmarking and model training. Each dataset is planned, collected, and annotated with NLP best practice in mind.

Need something different? [Talk to us about custom datasets](/contact).

Low‑risk evaluation · Enterprise licensing · Production‑ready delivery · Hard‑to‑source languages

![African Speech Data Made for AI](/images/pages/datasets/african-speech-datasets-made-for-ai.png)

## Browse by language

Each language has a dedicated page with hours, speaker demographics, audio samples, and download options. Need help choosing? [Talk to us](/contact) or read about our [community-centric data licensing model](/esethu).

[

English Paid

South Africa · Insurance · +2

50h 38GB

South African English conversational speech dataset built for ASR training, evaluation, and multilingual AI development, featuring real-world contact-centre style interactions and diverse regional accents.

Updated: August 2023 View more

](/datasets/english)[

seSotho Paid

South Africa · Insurance · Retail · +2

50h 38GB

Conversational seSotho speech data collected from first-language speakers, designed to improve representation of under-resourced African languages in speech recognition and language model training.

Updated: August 2023 View more

](/datasets/sesotho)[

isiZulu Paid

South Africa · Insurance · Retail · +2

50h 38GB

Production-ready isiZulu conversational speech dataset supporting ASR benchmarking and multilingual AI workflows, with tonal language coverage and realistic acoustic environments.

Updated: August 2023 View more

](/datasets/isizulu)[

Afrikaans Paid

South Africa · Insurance · Retail · +2

50h 38GB

Afrikaans conversational speech data designed for speech recognition, conversational AI, and evaluation use cases, reflecting natural language usage across multiple domains.

Updated: August 2023 View more

](/datasets/afrikaans)

Free & open

## African Next Voices (Swivuriso)

Large-scale multilingual speech dataset for 7 South African languages—over 3,000 hours in total. Built for ASR research and inclusive technologies. Available free on Hugging Face (CC BY 4.0). Way With Words produced the South African component with [DSFSI](https://www.dsfsi.co.za/za-african-next-voices/).

[

isiZulu Free

South Africa · Scripted & unscripted · ASR · Multilingual

503h Hugging Face

Over 500 hours of isiZulu speech from the Swivuriso dataset—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.

Updated: November 2025 View more

](/datasets/anv-isizulu)[

isiXhosa Free

South Africa · Scripted & unscripted · ASR · Multilingual

504h Hugging Face

Over 500 hours of isiXhosa speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.

Updated: November 2025 View more

](/datasets/anv-isixhosa)[

Sesotho Free

South Africa · Scripted & unscripted · ASR · Multilingual

504h Hugging Face

Over 500 hours of Sesotho speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.

Updated: November 2025 View more

](/datasets/anv-sesotho)[

Setswana Free

South Africa · Scripted & unscripted · ASR · Multilingual

502h Hugging Face

Over 500 hours of Setswana speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.

Updated: November 2025 View more

](/datasets/anv-setswana)[

Xitsonga Free

South Africa · Scripted & unscripted · ASR · Multilingual

500h Hugging Face

Over 500 hours of Xitsonga speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.

Updated: November 2025 View more

](/datasets/anv-xitsonga)[

Tshivenda Free

South Africa · Scripted & unscripted · ASR · Multilingual

251h Hugging Face

Over 250 hours of Tshivenda speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.

Updated: November 2025 View more

](/datasets/anv-tshivenda)[

isiNdebele Free

South Africa · Scripted & unscripted · ASR · Multilingual

252h Hugging Face

Over 250 hours of isiNdebele speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.

Updated: November 2025 View more

](/datasets/anv-isindebele)

[Get the full dataset on Hugging Face](https://huggingface.co/datasets/dsfsi-anv/za-african-next-voices). You must accept the use conditions to gain access. The dataset is not for TTS, voice cloning, or voice synthesis.
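As a quick check on the "over 3,000 hours" figure, the per-language hours shown on the cards above can be totalled. This is a minimal sketch: the name `anv_hours` is our own, and the values are copied by hand from this page, not read from the Hugging Face release, so they may differ from the current release.

```python
# Hours per language, copied from the cards on this page.
# Assumption: these figures are a snapshot; check the release for current values.
anv_hours = {
    "isiZulu": 503,
    "isiXhosa": 504,
    "Sesotho": 504,
    "Setswana": 502,
    "Xitsonga": 500,
    "Tshivenda": 251,
    "isiNdebele": 252,
}

total_hours = sum(anv_hours.values())
print(f"{len(anv_hours)} languages, {total_hours} hours in total")

# Show each language's share of the total, largest first.
for language, hours in sorted(anv_hours.items(), key=lambda kv: kv[1], reverse=True):
    share = 100 * hours / total_hours
    print(f"{language:<12}{hours:>5}h  {share:4.1f}%")
```

The five larger languages each contribute roughly a sixth of the total, while Tshivenda and isiNdebele contribute about half as much, which is worth keeping in mind when balancing multilingual training batches.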

Need a custom collection or different languages?

Did you know we started out collecting UK, Australian, Irish and Scottish English data for major data providers?

We can do the same for you, in any language or domain. Just ask.

[Talk to us about custom datasets →](/contact)

```json
{"@context":"https://schema.org","@type":"Organization","name":"Way With Words AI","url":"https://waywithwords.ai","email":"hello@waywithwords.ai","contactPoint":[{"@type":"ContactPoint","contactType":"customer support","telephone":"+44 208 157 9929","email":"hello@waywithwords.ai","areaServed":"GB","availableLanguage":"en"},{"@type":"ContactPoint","contactType":"customer support","telephone":"+27 21 879 3552","email":"hello@waywithwords.ai","areaServed":"ZA","availableLanguage":"en"}],"location":[{"@type":"Place","name":"Way With Words Limited (UK Office)","address":{"@type":"PostalAddress","streetAddress":"Caledonian House Business Centre, 164 High Street","addressLocality":"Elgin","postalCode":"IV30 1BD","addressCountry":"GB"}},{"@type":"Place","name":"Way With Words SA (Pty) Ltd (South Africa & SADC Office)","address":{"@type":"PostalAddress","streetAddress":"First Floor, Vineyards Square North, The Vineyards Office Estate, 99 Jip de Jager Drive, Bellville","addressLocality":"Cape Town","postalCode":"7530","addressCountry":"ZA"}}]}
{"@context":"https://schema.org","@type":"ItemList","name":"Way With Words Speech Datasets Catalog","url":"https://waywithwords.ai/datasets","itemListElement":[{"@type":"ListItem","position":1,"item":{"@type":"Dataset","name":"English Speech Dataset","description":"South African English conversational speech dataset built for ASR training, evaluation, and multilingual AI development, featuring real-world contact-centre style interactions and diverse regional accents.","url":"https://waywithwords.ai/datasets/english"}},{"@type":"ListItem","position":2,"item":{"@type":"Dataset","name":"seSotho Speech Dataset","description":"Conversational seSotho speech data collected from first-language speakers, designed to improve representation of under-resourced African languages in speech recognition and language model training.","url":"https://waywithwords.ai/datasets/sesotho"}},{"@type":"ListItem","position":3,"item":{"@type":"Dataset","name":"isiZulu Speech Dataset","description":"Production-ready isiZulu conversational speech dataset supporting ASR benchmarking and multilingual AI workflows, with tonal language coverage and realistic acoustic environments.","url":"https://waywithwords.ai/datasets/isizulu"}},{"@type":"ListItem","position":4,"item":{"@type":"Dataset","name":"Afrikaans Speech Dataset","description":"Afrikaans conversational speech data designed for speech recognition, conversational AI, and evaluation use cases, reflecting natural language usage across multiple domains.","url":"https://waywithwords.ai/datasets/afrikaans"}},{"@type":"ListItem","position":5,"item":{"@type":"Dataset","name":"isiZulu Speech Dataset","description":"Over 500 hours of isiZulu speech from the Swivuriso dataset—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.","url":"https://waywithwords.ai/datasets/anv-isizulu"}},{"@type":"ListItem","position":6,"item":{"@type":"Dataset","name":"isiXhosa Speech Dataset","description":"Over 500 hours of isiXhosa speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.","url":"https://waywithwords.ai/datasets/anv-isixhosa"}},{"@type":"ListItem","position":7,"item":{"@type":"Dataset","name":"Sesotho Speech Dataset","description":"Over 500 hours of Sesotho speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.","url":"https://waywithwords.ai/datasets/anv-sesotho"}},{"@type":"ListItem","position":8,"item":{"@type":"Dataset","name":"Setswana Speech Dataset","description":"Over 500 hours of Setswana speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.","url":"https://waywithwords.ai/datasets/anv-setswana"}},{"@type":"ListItem","position":9,"item":{"@type":"Dataset","name":"Xitsonga Speech Dataset","description":"Over 500 hours of Xitsonga speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.","url":"https://waywithwords.ai/datasets/anv-xitsonga"}},{"@type":"ListItem","position":10,"item":{"@type":"Dataset","name":"Tshivenda Speech Dataset","description":"Over 250 hours of Tshivenda speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.","url":"https://waywithwords.ai/datasets/anv-tshivenda"}},{"@type":"ListItem","position":11,"item":{"@type":"Dataset","name":"isiNdebele Speech Dataset","description":"Over 250 hours of isiNdebele speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.","url":"https://waywithwords.ai/datasets/anv-isindebele"}},{"@type":"ListItem","position":12,"item":{"@type":"Dataset","name":"Hausa Speech Dataset","description":"Hausa speech with transcriptions from the African Voices multilingual dataset—download and versioning via africanvoices.io.","url":"https://waywithwords.ai/datasets/anv-av-hausa"}},{"@type":"ListItem","position":13,"item":{"@type":"Dataset","name":"Igbo Speech Dataset","description":"Igbo speech with transcriptions from the African Voices multilingual dataset—download and versioning via africanvoices.io.","url":"https://waywithwords.ai/datasets/anv-av-igbo"}},{"@type":"ListItem","position":14,"item":{"@type":"Dataset","name":"Yoruba Speech Dataset","description":"Yoruba speech with transcriptions from the African Voices multilingual dataset—download and versioning via africanvoices.io.","url":"https://waywithwords.ai/datasets/anv-av-yoruba"}},{"@type":"ListItem","position":15,"item":{"@type":"Dataset","name":"Dholuo Speech Dataset","description":"Dholuo speech from the Kenyan African Next Voices collection—scripted and unscripted domains on Hugging Face (work in progress; check org for latest hours).","url":"https://waywithwords.ai/datasets/anv-ke-dholuo"}},{"@type":"ListItem","position":16,"item":{"@type":"Dataset","name":"Kikuyu Speech Dataset","description":"Kikuyu speech from the Kenyan African Next Voices collection—multiple dialects on Hugging Face (work in progress).","url":"https://waywithwords.ai/datasets/anv-ke-kikuyu"}},{"@type":"ListItem","position":17,"item":{"@type":"Dataset","name":"Somali Speech Dataset","description":"Somali speech from the Kenyan African Next Voices collection (Maxatire)—HF dataset Anv-ke/Somali; complements other Somali resources.","url":"https://waywithwords.ai/datasets/anv-ke-somali"}},{"@type":"ListItem","position":18,"item":{"@type":"Dataset","name":"Kalenjin Speech Dataset","description":"Kalenjin speech (Nandi & Kipsigis) from the Kenyan African Next Voices collection on Hugging Face.","url":"https://waywithwords.ai/datasets/anv-ke-kalenjin"}},{"@type":"ListItem","position":19,"item":{"@type":"Dataset","name":"Maasai Speech Dataset","description":"Maasai speech (Kimasaai & Kisamburu) from the Kenyan African Next Voices collection on Hugging Face.","url":"https://waywithwords.ai/datasets/anv-ke-maasai"}}]}
```
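The `ItemList` markup above is machine-readable, so the catalog can be enumerated without scraping the cards. Below is a minimal sketch using only Python's standard library. The sample string is a two-item excerpt copied from the markup above with descriptions omitted, and the names `item_list_json` and `list_datasets` are illustrative, not part of any published API.

```python
import json

# Two-item excerpt of the ItemList markup above (descriptions omitted for brevity).
item_list_json = """
{"@context":"https://schema.org","@type":"ItemList","name":"Way With Words Speech Datasets Catalog","itemListElement":[
 {"@type":"ListItem","position":1,"item":{"@type":"Dataset","name":"English Speech Dataset","url":"https://waywithwords.ai/datasets/english"}},
 {"@type":"ListItem","position":2,"item":{"@type":"Dataset","name":"seSotho Speech Dataset","url":"https://waywithwords.ai/datasets/sesotho"}}
]}
"""


def list_datasets(raw):
    """Return (position, name, url) tuples from a schema.org ItemList string."""
    data = json.loads(raw)
    # Sort by the declared position rather than trusting array order.
    elements = sorted(data["itemListElement"], key=lambda e: e["position"])
    return [(e["position"], e["item"]["name"], e["item"]["url"]) for e in elements]


catalog = list_datasets(item_list_json)
for position, name, url in catalog:
    print(position, name, url)
```

Because two entries in the full list share the name "isiZulu Speech Dataset", the `url` field is the safer key for telling datasets apart.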