---
title: "Tshivenda Speech Dataset – 250h Swivuriso ASR Data (Free) | Way With Words"
description: "Over 250 hours of free Tshivenda speech from Swivuriso—scripted and unscripted, first-language speakers—for ASR research and inclusive speech technology."
image: "https://waywithwords.ai/og-default.png"
---


# Tshivenda speech dataset

This dataset is part of Swivuriso (ZA-African Next Voices), a large-scale multilingual speech dataset for South African languages. The Tshivenda configuration contains first-language speech: 250.9 hours of scripted and unscripted audio from 104 speakers, collected through ethical, community-centred processes. It is designed for ASR and inclusive speech technologies, and is available free on Hugging Face under CC BY 4.0, subject to the use restrictions described further down this page.

Looking for more options? Browse the [full African speech datasets catalog](/datasets) or see our [community-centric data licensing framework](/esethu).

## Key details

| Detail | Value |
| --- | --- |
| Hours available | 250.9 |
| Speakers | 104 |
| Access | Available on Hugging Face |
| Audio format | WAV (48 kHz mono) |
| Accents | South African Tshivenda |

[Get dataset on Hugging Face →](https://huggingface.co/datasets/dsfsi-anv/za-african-next-voices)

## Dataset details

| Detail | Value |
| --- | --- |
| Hours available | 250.9 |
| Age range | 18 to 60+ |
| Download size | Available on Hugging Face |
| Number of speakers | 104 |
| Audio format | WAV (48 kHz mono) |
| Accents | South African Tshivenda |
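
To confirm that a downloaded clip matches the audio format listed here (WAV, 48 kHz, mono), a minimal check with Python's standard `wave` module might look like the following. It assumes the file is PCM-encoded WAV, which is what `wave` reads; `matches_listed_format` and `duration_seconds` are illustrative names, not part of any dataset tooling.

```python
import wave

# Values taken from the "Audio format" entry on this page.
EXPECTED_RATE_HZ = 48_000  # 48 kHz
EXPECTED_CHANNELS = 1      # mono


def matches_listed_format(path: str) -> bool:
    """Return True if the WAV file at `path` is 48 kHz mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def duration_seconds(path: str) -> float:
    """Return the clip length in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Summing `duration_seconds` over a set of clips is one way to sanity-check a partial download against the hours listed here.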

## Additional information

### What is Africa Next Voices?

Africa Next Voices (ANV) is a large-scale initiative supported by the Gates Foundation and a network of research and technology partners to expand high-quality speech datasets for African languages. In South Africa, the project was coordinated by the Data Science for Social Impact (DSFSI) group at the University of Pretoria. Way With Words acted as the data production and workflow partner, designing and running the recording, transcription, proofing, and quality-control workflows that delivered the South African component.

[Read more about our journey building African Next Voices](/blog/africa-next-voices-project)

### How was this data collected?

The South African ANV dataset combines scripted and unscripted speech. Contributors were recruited from across the country and trained to record in their first language. Recordings were then transcribed, proofed, and quality-checked by language specialists. The result is thousands of hours of ethically collected, community-driven speech that reflects how people actually speak, rather than scraped or synthetic audio.

### How can I use this dataset?

The full multilingual dataset (Swivuriso) is available on Hugging Face, where you can load data by language (e.g. isiZulu, isiXhosa, Sesotho). Use restrictions apply: the data is not licensed for text-to-speech, voice cloning, or voice synthesis. For research, ASR, and language model training, see the dataset card and license on Hugging Face for the full terms.
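
As a rough sketch of loading by language, the snippet below streams one language configuration with the Hugging Face `datasets` library. The repository ID comes from the link on this page, and the codes `zul`, `xho`, and `sot` from the examples given on this page. The `ven` code for Tshivenda, the use of these codes as configuration names, and the `train` split are assumptions to check against the dataset card. `config_for` and `stream_language` are hypothetical helper names.

```python
# Repository ID as linked from this page.
REPO_ID = "dsfsi-anv/za-african-next-voices"

# zul, xho, sot are the examples given on this page.
# "ven" (ISO 639-3 for Tshivenda) is an ASSUMPTION by analogy; verify on the dataset card.
LANGUAGE_CODES = {
    "isizulu": "zul",
    "isixhosa": "xho",
    "sesotho": "sot",
    "tshivenda": "ven",
}


def config_for(language: str) -> str:
    """Map a language name to the (assumed) configuration code."""
    key = language.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"No known code for language: {language!r}")
    return LANGUAGE_CODES[key]


def stream_language(language: str, split: str = "train"):
    """Stream one language configuration without downloading it in full.

    The split name "train" is an assumption; the dataset card lists the real splits.
    """
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(REPO_ID, config_for(language), split=split, streaming=True)
```

Calling `next(iter(stream_language("Tshivenda")))` would then return the first example as a dictionary. Field names are not listed on this page, so inspect the keys before relying on any particular column.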

### Who contributed to this project?

Thousands of South Africans, including recorders, proofreaders, and language assistants, gave their time and voices to build this resource. We honoured participants with personalised certificates and fair compensation. For a recorder’s perspective on what it meant to be part of ANV, read [Beyond the Data: Every Voice Carries More Than Words](/blog/beyond-the-data-lenepa-molaoa). For a Language Manager’s perspective on Xitsonga, see [Beyond the Data: The Weight of Being Seen](/blog/beyond-the-data-treasure-makhanye). For how we recognised everyone involved, see [Honouring the Individuals Who Made Africa Next Voices Possible](/blog/africa-next-voices-certificates).

## More languages & resources

Swivuriso covers seven South African languages. On Hugging Face you can load each one by its language code (e.g. zul, xho, sot). Use restrictions apply: the data is not licensed for TTS, voice cloning, or voice synthesis.

[Open on Hugging Face →](https://huggingface.co/datasets/dsfsi-anv/za-african-next-voices) [Back to all datasets](/datasets)
