Afrikaans speech dataset

An Afrikaans conversational speech dataset developed for speech recognition, conversational AI, and model evaluation. With 20 male and 26 female speakers and domain-driven prompts, the collection reflects natural language usage across four real-world domains: retail, debt collection, insurance, and travel.

Dataset details

  • Hours available: 50 hours
  • Age range: 18–69
  • Download size: 38 GB
  • Number of speakers: 46
  • Audio format: WAV
  • Accents: Afrikaans

Dataset demographics

Age range distribution

Recorders per age group

  • [18 – 29] 9 Recorders
  • [30 – 49] 28 Recorders
  • [50 – 69] 9 Recorders

Gender split across recorders

Recorders per gender

  • Men 20 Recorders
  • Women 26 Recorders

Hours collected across domains

Runtime per domain

  • Retail 12:21:57
  • Debt Collection 12:19:33
  • Insurance 12:19:23
  • Travel 13:02:28
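As a quick sanity check, the per-domain runtimes above can be summed and compared with the advertised 50 hours. The sketch below uses only the Python standard library. The names `RUNTIMES`, `parse_runtime`, and `format_runtime` are illustrative, and the values are simply the H:MM:SS figures listed above.

```python
from datetime import timedelta

# Per-domain runtimes copied from the list above (H:MM:SS).
RUNTIMES = {
    "Retail": "12:21:57",
    "Debt Collection": "12:19:33",
    "Insurance": "12:19:23",
    "Travel": "13:02:28",
}


def parse_runtime(text: str) -> timedelta:
    """Convert an H:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_runtime(delta: timedelta) -> str:
    """Format a timedelta as H:MM:SS without rolling hours into days."""
    hours, remainder = divmod(int(delta.total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


durations = {domain: parse_runtime(text) for domain, text in RUNTIMES.items()}
total = sum(durations.values(), timedelta())

for domain, duration in durations.items():
    share = duration / total  # dividing two timedeltas yields a float
    print(f"{domain:<16} {format_runtime(duration)}  {share:.1%}")
print(f"{'Total':<16} {format_runtime(total)}")
```

The four runtimes add up to 50:03:21, which matches the stated 50 hours, and each domain contributes roughly a quarter of the audio.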

Additional information

How are dataset recordings structured?

Our off-the-shelf dataset collections comprise unscripted, natural conversations between call recorders who have been recruited, trained, and approved to simulate real-world conversations in common domains. Recordings and transcripts include routine security verifications such as ID, email, and phone number validation.

How do you recruit for speech collection datasets?

Our priority is to create datasets that are unbiased and cover as wide a range of demographics as possible. This is the first consideration when we plan and recruit for any speech collection project.

What kind of agreement is in place for the purchase of this dataset?

A Licence Agreement governs the sale and usage of this speech collection dataset. Clients can use our off-the-shelf datasets to test and benchmark before considering larger, custom collections that are better suited to their requirements and conventions.

Need a different dataset?

We can design and deliver bespoke speech collections for your languages, domains, and scale. Tell us what you need and we'll get back within 1–2 business days.