Kalenjin speech dataset
Part of African Next Voices: Data collection in Kenya (KenCorpus Consortium, Gates Foundation). Scripted and unscripted speech across multiple domains, collected through ethical, community-led processes. Licensed under CC BY 4.0. This configuration covers Kalenjin (Nandi and Kipsigis). See the Hugging Face organization and the dataset card for the latest splits, release notes, and attribution requirements.
Key details
- Hours available: 521
- Speakers: not listed (see dataset card)
- Access: Hugging Face
- Audio format: WAV (per dataset card)
- Accents: Kenyan Kalenjin
Dataset details
Hours available: 521
Age range: 18 to 60+
Download size: not listed (see Hugging Face)
Number of speakers: not listed (see dataset card)
Audio format: WAV (per dataset card)
Accents: Kenyan Kalenjin
Additional information
African Next Voices — Kenya
This listing points to African Next Voices in Kenya (KenCorpus Consortium, Gates Foundation): scripted and unscripted speech collected through community-led processes, with one dataset repo per language under the Anv-ke organization on Hugging Face. The public cards describe domains, splits, transcription coverage, and ethical use. Treat releases as work in progress, and follow the CC BY 4.0 attribution instructions on the dataset card.
More languages & resources
Open the Hugging Face dataset card for this language for loading instructions, columns, and the latest statistics. The Anv-ke organization lists sibling repos (Dholuo, Kikuyu, Somali, Kalenjin, Maasai). Use only as permitted on the card (research and ASR-related development; no surveillance or unethical profiling).
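For readers who want to try the loading step before reading the full card, here is a minimal Python sketch using the Hugging Face `datasets` library. It is an illustration under assumptions, not the card's official instructions: the helper names `repo_id_for` and `stream_split` are hypothetical, the exact repo name is not given in this listing and must be copied from the Anv-ke organization page, and the default split name `"train"` is a guess that the card may contradict.

```python
ORG = "Anv-ke"  # organization name as given in this listing


def repo_id_for(repo_name: str) -> str:
    """Build '<org>/<repo>' for a repo under the Anv-ke organization.

    `repo_name` is the bare repo name copied from the organization page;
    this listing does not state it, so no default is provided.
    """
    name = repo_name.strip()
    if not name or "/" in name:
        raise ValueError("pass the bare repo name, without the organization prefix")
    return f"{ORG}/{name}"


def stream_split(repo_name: str, split: str = "train"):
    """Open one split in streaming mode so nothing is downloaded up front.

    Assumption: the split is called "train"; check the dataset card.
    """
    # Imported lazily so repo_id_for works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id_for(repo_name), split=split, streaming=True)
```

With the real repo name in hand, `next(iter(stream_split("<repo-name>")))` should yield the first example as a dictionary whose keys are the columns described on the card. Decoding the audio column may need an extra audio package, and the repo may require accepting terms on Hugging Face first; both depend on the card and are not confirmed here.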