African Next Voices

Sesotho speech dataset

Part of Swivuriso (ZA-African Next Voices), a large-scale multilingual speech dataset for South African languages. This configuration contains high-quality, first-language Sesotho speech: over 500 hours of scripted and unscripted audio, collected through ethical, community-centred processes. It is designed for ASR and inclusive speech technologies, and is freely available on Hugging Face under CC BY 4.0.

Dataset details

Hours available: 503.6 hours
Age range: 18 to 60+
Download size: Available on Hugging Face
Number of speakers: 480
Audio format: WAV (48 kHz mono)
Accents: South African Sesotho
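
The listed audio format can be checked on a downloaded file before it enters a training pipeline. The following is a minimal sketch using Python's standard `wave` module; the function name `matches_listed_format` is hypothetical, and the sketch assumes the file is an uncompressed PCM WAV that `wave` can read.

```python
import wave

# Values taken from the "Audio format" entry: WAV, 48 kHz, mono.
EXPECTED_RATE_HZ = 48_000
EXPECTED_CHANNELS = 1


def matches_listed_format(path: str) -> bool:
    """Return True if the WAV file at `path` is 48 kHz mono.

    Hypothetical helper for illustration; it only inspects the header
    and does not validate the audio content itself.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

A check like this is cheap because it reads only the header, so it can be run over every file before any resampling or feature extraction.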

Additional information

How are dataset recordings structured?

Our off-the-shelf dataset collections comprise unscripted, natural conversations. They are conducted by call recorders who are recruited, trained, and approved to simulate real-world conversations in common domains. Recordings and transcripts include routine security verifications such as ID, email, and phone number validation.

How do you recruit for speech collection datasets?

Our priority is to create datasets that are unbiased and cover as wide a range of demographics as possible. That is the first consideration when we begin the planning and recruitment process of any speech collection dataset project.

What kind of agreement is in place for the purchase of this dataset?

A Licence Agreement governs the sale and usage of this speech collection dataset. Our off-the-shelf options let clients test and benchmark before considering larger, custom commitments that are better suited to their requirements and conventions.

More languages & resources

Swivuriso covers 7 South African languages. On Hugging Face, you can load a single language by its code (e.g. zul, xho, sot). Use restrictions apply: the data may not be used for TTS, voice cloning, or voice synthesis.
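
Loading by language code could look roughly like the sketch below, which uses the Hugging Face `datasets` library. Several parts are assumptions rather than facts from this page: `REPO_ID` is a placeholder and not the real repository name, the `train` split may not exist under that name, and the helpers `build_load_kwargs` and `load_language` are hypothetical. Check the dataset card for the actual values.

```python
from typing import Any, Dict

# Placeholder only: replace with the repository ID shown on the dataset card.
REPO_ID = "your-org/your-swivuriso-repo"


def build_load_kwargs(lang_code: str, split: str = "train",
                      streaming: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for datasets.load_dataset.

    Assumes each language is a configuration named by a three-letter
    lowercase code, as in the examples zul, xho, sot.
    """
    if not (len(lang_code) == 3 and lang_code.isalpha() and lang_code.islower()):
        raise ValueError(f"expected a three-letter lowercase code, got {lang_code!r}")
    return {
        "path": REPO_ID,
        "name": lang_code,       # configuration name, e.g. "sot" for Sesotho
        "split": split,          # assumed split name
        "streaming": streaming,  # avoids downloading everything up front
    }


def load_language(lang_code: str):
    """Load one language configuration (requires `pip install datasets`)."""
    from datasets import load_dataset  # imported lazily so the helper above has no dependency
    return load_dataset(**build_load_kwargs(lang_code))
```

Streaming is switched on by default in this sketch because the Sesotho configuration alone is over 500 hours of 48 kHz audio; pass `streaming=False` to `build_load_kwargs` if a full local copy is preferred.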