---
title: "2026 Dataset Vote Results | Way With Words AI"
description: "Thank you to everyone who voted and shared our 2026 community dataset campaign. See final results and what we are building next."
image: "https://waywithwords.ai/og-default.png"
---

2026 community vote, concluded

# Thank you for shaping our roadmap

Voting for our first community dataset campaign closed on 15 May 2026. Thank you to everyone who cast a vote, and to everyone who shared the page with colleagues, students, and partners. Your input directly guides what we build next.

### What happens next

We are now reviewing feasibility and production paths for the top-voted datasets, starting with the community's first priority below. We will share updates as projects move forward.

*   Assessing technical scope, ethics, and resourcing for the highest-ranked proposals.
*   Engaging partners where there is strong institutional interest or funding alignment.
*   Publishing progress on this page and through our usual channels as work advances.

If your organisation can contribute time, expertise, or funding toward a listed dataset, [we would love to hear from you](/contact).

Final results

20 votes total

Community priorities from our 2026 vote. We are exploring how to make these datasets a reality.

*   #1 community priority: Camfranglais Conversational Speech and Annotation Dataset (CCSAD). Languages: Camfranglais. Votes: 5.
*   #2 community priority: Multilingual Multi-Speaker Conversational Corpus. Languages: ENG, AFR, ZUL, SOT, XHO. Votes: 4.
*   #3 community priority: Code-Switching Conversational Corpus. Languages: SWA, YOR, HAU. Votes: 4.

[View final results](#proposed-collections) [Suggest a dataset](#suggest-dataset)


Language coverage

The interactive [Languages across Africa](/african-languages) map shows the countries in each zone and the dataset coverage for their languages.

## Proposed collections

Final vote counts from our 2026 campaign (20 votes total), sorted by community support.


Option 1

#### Camfranglais Conversational Speech and Annotation Dataset (CCSAD)

Community votes: 5 (25%)

Structured Camfranglais conversational speech dataset capturing natural code-switching, lexical innovation, and context-dependent meaning in real usage contexts.

*   Type: Conversational Speech (Code-switching + Annotation)
*   Domain: Urban Conversations, Sociolinguistics, Digital Humanities
*   Languages: Camfranglais
*   Deliverables: Recordings + Transcripts

Objective: Build a research-oriented, ethically collected Camfranglais corpus that documents naturally occurring hybrid speech with layered annotations for linguistic research, preservation, and future low-resource language technology work.


Option 2

#### Code-Switching Conversational Corpus

Community votes: 4 (20%)

Multi-speaker conversational dataset focused on natural code-switching across Swahili, Yoruba, and Hausa for robust multilingual dialogue modeling.

*   Type: Conversational Speech (Multi-speaker)
*   Domain: Customer Support, Healthcare, Agriculture
*   Languages: SWA, YOR, HAU
*   Deliverables: Recordings + Transcripts

Objective: Build a high-quality conversational corpus that captures realistic turn-taking and intra-utterance code-switching patterns across three widely used African languages.


Option 3

#### Multilingual Multi-Speaker Conversational Corpus

Community votes: 4 (20%)

Multi-speaker conversational recordings designed for realistic dialogue behaviour, sentiment analysis, and robust multilingual speech modeling with code-switching.

*   Type: Conversational Speech (Multi-speaker)
*   Domain: Customer Support, FinTech, Telecom
*   Languages: ENG, AFR, ZUL, SOT, XHO
*   Deliverables: Recordings + Transcripts

Objective: Create a high-coverage multilingual conversational dataset with natural turn-taking and domain-relevant interactions for speech and language model training.


Option 4

#### Parallel Multilingual Speech Translation Corpus (Healthcare)

Community votes: 3 (15%)

Parallel healthcare speech corpus with aligned multilingual scripts to improve ASR, speech translation, and cross-lingual instruction fidelity.

*   Type: Parallel Speech (Scripted + Translated)
*   Domain: Healthcare
*   Languages: ENG, AFR, ZUL, SOT, XHO
*   Deliverables: Recordings + Transcripts + Translations

Objective: Create a high-quality parallel speech dataset with semantic alignment across five languages for multilingual, voice-to-voice healthcare communication tasks.


Option 5

#### Multilingual Multimodal South African Culture Corpus

Community votes: 2 (10%)

Rights-cleared cultural image corpus paired with multilingual spoken prompts to support vision-language grounding in underrepresented local contexts.

*   Type: Multimodal (Image + Speech Prompting)
*   Domain: South African culture
*   Languages: ENG, AFR, ZUL, SOT, XHO
*   Deliverables: Recordings + Transcripts + Images

Objective: Build a legally compliant multimodal benchmark where each image is paired with language-specific spoken prompts for model training and evaluation.


Option 6

#### Underrepresented SA Languages Conversational Corpus

Community votes: 2 (10%)

Multi-speaker conversational dataset focused on underrepresented South African languages with natural code-switching patterns for robust multilingual speech modeling.

*   Type: Conversational Speech (Multi-speaker)
*   Domain: Customer Support, FinTech, Education
*   Languages: TSN, NSO, SSW, VEN, NBL, TSO
*   Deliverables: Recordings + Transcripts

Objective: Build a high-quality conversational corpus that captures natural turn-taking and code-switching behaviour across six underrepresented South African languages.


Option 7

#### Everyday Language Instruction Benchmark (Q&A)

Community votes: 0 (0%)

A culturally grounded Shona instruction and Q&A dataset for practical everyday guidance, designed to support localised LLM training and evaluation.

*   Type: LLM Q&A (Instructional / How-to)
*   Domain: Daily Life, Home and Family, Education, Health Access, Micro-business
*   Languages: Shona
*   Deliverables: Validated Q&A Pairs

Objective: Build a domain-balanced Shona instruction and Q&A resource for practical everyday guidance, with approximately 1,500 validated pairs written in locally natural phrasing to support localised LLM evaluation and fine-tuning.


### Have another dataset idea?

Suggest a new dataset for internal review. You can choose multiple languages and dataset types.

Suggest a dataset

Suggestions are reviewed by the Way With Words team for feasibility, impact, and potential funding alignment before being added as a voting option.

*   Website
*   Name
*   Surname
*   Email
*   Name of company or learning institution
*   Suggested dataset title
*   Proposed domain(s) (optional)
*   Minimum number of hours to record (optional)
*   What is your use case for this dataset?
*   Would you contribute funding or help find funding for this dataset? (select one)
    *   Yes, we can contribute funding
    *   Yes, we can contribute time / expertise
    *   Yes, we can help find funding partners
    *   Yes, both
    *   Not at this stage
*   African language(s) (select one or more): Afrikaans, Amharic, Arabic (North Africa), Berber (Tamazight), Chichewa, English (South African), Fang, Fulfulde, Hausa, Igbo, isiNdebele, isiXhosa, isiZulu, Kituba, Kinyarwanda, Lingala, Luganda, Oromo, Sango, seSotho, seTswana, Shona, Somali, Swahili, Tigrinya, Tshiluba, Tshivenḓa, Xitsonga, Yoruba
*   Dataset type(s) (select one or more): Multi-modal, Text, LLM Q&A (Instructional / How-to), Voice, Parallel languages, Multi-speaker, Monolingual, Bilingual, Domain-specific, Conversational, Read speech
*   Additional notes (optional)

By submitting, you agree that Way With Words may use the information you provide to review your dataset suggestion and understand language and dataset demand. See our [Privacy Notice](/privacy).

 


[Talk to us about your dataset idea](/contact#contact-form)

```json
{"@context":"https://schema.org","@type":"Organization","name":"Way With Words AI","url":"https://waywithwords.ai","email":"hello@waywithwords.ai","contactPoint":[{"@type":"ContactPoint","contactType":"customer support","telephone":"+44 208 157 9929","email":"hello@waywithwords.ai","areaServed":"GB","availableLanguage":"en"},{"@type":"ContactPoint","contactType":"customer support","telephone":"+27 21 879 3552","email":"hello@waywithwords.ai","areaServed":"ZA","availableLanguage":"en"}],"location":[{"@type":"Place","name":"Way With Words Limited (UK Office)","address":{"@type":"PostalAddress","streetAddress":"Caledonian House Business Centre, 164 High Street","addressLocality":"Elgin","postalCode":"IV30 1BD","addressCountry":"GB"}},{"@type":"Place","name":"Way With Words SA (Pty) Ltd (South Africa & SADC Office)","address":{"@type":"PostalAddress","streetAddress":"First Floor, Vineyards Square North, The Vineyards Office Estate, 99 Jip de Jager Drive, Bellville","addressLocality":"Cape Town","postalCode":"7530","addressCountry":"ZA"}}]}
```