African Next Voices

Dholuo speech dataset

African Next Voices: data collection in Kenya (KenCorpus Consortium, Gates Foundation). The collection contains scripted and unscripted speech across multiple domains, gathered through ethical, community-led processes, and is released under CC BY 4.0. This configuration covers Dholuo (Nyandwat and Milambo dialects). Check the Hugging Face dataset card for the latest splits, transcription status, attribution requirements, and ethical-use terms.


Key details

Hours available: 723
Speakers: not stated in this listing (see the dataset card)
Access: Hugging Face
Audio format: WAV (per dataset card)
Accents: Kenyan Dholuo

Dataset details

Age range: 18 to 60+
Download size: not stated in this listing (see the file sizes on the Hugging Face dataset card)

Additional information

African Next Voices — Kenya

This listing points to the African Next Voices Kenya collection (KenCorpus Consortium, Gates Foundation): scripted and unscripted speech gathered through community-led processes and published as per-language dataset repositories under the Anv-ke organization on Hugging Face. The public dataset cards describe domains, splits, transcription coverage, and ethical-use terms. Treat releases as work in progress, and follow the CC BY 4.0 attribution given on the dataset card.

More languages & resources

Open the Hugging Face dataset card for this language for loading instructions, column descriptions, and the latest statistics. The Anv-ke organization lists the per-language repositories (Dholuo, Kikuyu, Somali, Kalenjin, Maasai). Use the data only as permitted on the card: research and ASR-related development, with no surveillance or unethical profiling.
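This listing gives the audio format as WAV only "per dataset card", so it can be worth confirming format details and total duration on files you have actually downloaded. The sketch below is a minimal example using Python's standard-library `wave` module, assuming you have raw WAV files locally. `wav_summary`, `total_hours`, and `make_silent_wav` are hypothetical helpers written for this illustration, not part of the dataset or any official loader. The 16 kHz mono clip is synthetic test data and says nothing about the dataset's actual sample rate or channel layout.

```python
import io
import wave


def wav_summary(source):
    """Return basic properties of a WAV file.

    `source` may be a filesystem path or a binary file-like object;
    wave.open accepts either.
    """
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


def total_hours(durations_s):
    """Sum clip durations given in seconds and convert them to hours."""
    return sum(durations_s) / 3600.0


def make_silent_wav(seconds, sample_rate=16000):
    """Build an in-memory mono 16-bit WAV of silence (hypothetical test data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        out.setframerate(sample_rate)
        out.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    # With real data, pass a path instead, e.g.
    # wav_summary("path/to/clip.wav")  (hypothetical path).
    info = wav_summary(make_silent_wav(2.0))
    print(info)
    # 1800 two-second clips add up to one hour.
    print(total_hours([info["duration_s"]] * 1800))
```

If you load the data through the Hugging Face `datasets` library instead, follow the loading instructions on the dataset card; audio may then arrive already decoded rather than as WAV files, and this check would not apply directly.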