African speech datasets for AI
Fifty-hour conversational speech collections, ready for benchmarking and model training. Each dataset is planned, collected, and annotated with NLP best practice in mind.
Need something different? Talk to us about custom datasets.
South African English conversational speech dataset built for ASR training, evaluation, and multilingual AI development, featuring real-world contact-centre-style interactions and diverse regional accents.
Conversational Sesotho speech data collected from first-language speakers, designed to improve representation of under-resourced African languages in speech recognition and language model training.
Production-ready isiZulu conversational speech dataset supporting ASR benchmarking and multilingual AI workflows, with tonal language coverage and realistic acoustic environments.
Afrikaans conversational speech data designed for speech recognition, conversational AI, and evaluation use cases, reflecting natural language usage across multiple domains.
Africa Next Voices (Swivuriso)
Large-scale multilingual speech dataset covering 7 South African languages, with over 3,000 hours in total. Built for ASR research and inclusive technologies. Available free on Hugging Face (CC BY 4.0). Way With Words produced the South African component with DSFSI.
Over 500 hours of isiZulu speech from the Swivuriso dataset—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.
Over 500 hours of isiXhosa speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.
Over 500 hours of Sesotho speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.
Over 500 hours of Setswana speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.
Over 500 hours of Xitsonga speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.
Over 250 hours of Tshivenda speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.
Over 250 hours of isiNdebele speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR and inclusive speech technology.
Get the full dataset on Hugging Face. You must accept the use conditions to gain access. The dataset may not be used for TTS, voice cloning, or voice synthesis.
Need a custom collection or different languages?
Did you know we started out collecting UK, Australian, Irish and Scottish English data for major data providers?
We can do the same for you, in any language or domain. Just ask.
Talk to us about custom datasets