African Next Voices

Somali speech dataset

African Next Voices (ANV) data collection in Kenya (KenCorpus Consortium, Gates Foundation): scripted and unscripted speech across multiple domains, collected through ethical, community-led processes and released under CC BY 4.0. This configuration covers Somali as collected in Kenya, which is distinct from the other ANV geography tracks. See the Hugging Face organization for the latest splits and attribution, and check the dataset card for hours, transcription coverage, and license terms.


Key details

Hours available: 502
Speakers: not listed (shown as 0)
Access: Hugging Face
Audio format: WAV (per dataset card)
Accents: Kenyan Somali (Maxatire)

Dataset details

Hours available: 502
Age range: 18 to 60+
Download size: not listed (see the Hugging Face dataset card)
Number of speakers: not listed (shown as 0)
Audio format: WAV (per dataset card)
Accents: Kenyan Somali (Maxatire)

Additional information

African Next Voices — Kenya

This listing points to African Next Voices in Kenya (KenCorpus Consortium, Gates Foundation): scripted and unscripted speech collected through community-led processes, with per-language dataset repos under the Anv-ke organization on Hugging Face. The public cards describe domains, splits, transcription coverage, and ethical use; treat releases as work in progress and follow CC BY 4.0 attribution on the dataset card.

More languages & resources

Open the Hugging Face dataset card for this language for loading instructions, columns, and the latest statistics. The Anv-ke organization lists sibling repos (Dholuo, Kikuyu, Somali, Kalenjin, Maasai). Use only as permitted on the card (research and ASR-related development; no surveillance or unethical profiling).
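
The dataset card remains the authoritative source for loading instructions. As a rough sketch only, the Python below shows how a split might be streamed with the Hugging Face `datasets` library and how transcription coverage could be spot-checked on a sample. The repo id `Anv-ke/Somali`, the split name `train`, and the column name `transcription` are assumptions made for illustration; this listing does not state them, so replace them with the values shown on the card.

```python
from typing import Any, Iterable, Mapping

# Hypothetical repo id: the listing names the "Anv-ke" organization but not the
# exact Somali repo, so replace this with the id shown on the dataset card.
ASSUMED_REPO_ID = "Anv-ke/Somali"


def load_split(repo_id: str = ASSUMED_REPO_ID, split: str = "train"):
    """Stream one split so the full corpus is not downloaded up front.

    The split name "train" is an assumption; check the card for real splits.
    """
    # Imported lazily so the helper below works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)


def transcription_coverage(
    rows: Iterable[Mapping[str, Any]],
    text_column: str = "transcription",  # assumed column name
) -> float:
    """Return the fraction of rows whose transcript is a non-empty string."""
    total = 0
    transcribed = 0
    for row in rows:
        total += 1
        text = row.get(text_column)
        if isinstance(text, str) and text.strip():
            transcribed += 1
    return transcribed / total if total else 0.0
```

With network access and the correct repo id, a small sample could be checked with `transcription_coverage(load_split().take(100))`. Streaming avoids downloading all of the audio before inspecting a handful of rows. Any use of the data still has to follow the CC BY 4.0 attribution and the permitted-use terms on the card.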