---
title: "Esethu Framework - Community-Centric Data Licensing | Way With Words"
description: "Learn about the Esethu Framework, a community-centric data licensing model for low-resource African language speech datasets. See how equitable governance supports sustainable AI data reuse."
image: "https://waywithwords.ai/og-default.png"
---

Sustainable data governance

# The Esethu Framework for community-centric data licensing

A sustainable data curation framework designed to empower local communities and ensure equitable benefit-sharing from their linguistic resources. It reimagines how low-resource language datasets are created and licensed, and how the value they generate is reinvested into future datasets and AI systems.

## What is the Esethu Framework?

The Esethu Framework is a sustainable data curation and licensing model that supports repeatable, cost-aware dataset development while giving language communities clear governance over how their data is used. It is supported by the **Esethu License**, a novel community-centric data license.

Developed by [Lelapa AI](https://lelapa.ai/) in collaboration with **Way With Words** and [Data Science for Social Impact (DSFSI)](https://www.cs.up.ac.za/research/dsfsi/), the framework addresses structural inefficiencies in how low-resource language data is sourced and reused. It aligns ethical governance with sustainable commercial pathways.

In practice, this is a community-centric data licensing approach for African language AI: it combines transparent governance, accountable reuse, and a reinvestment model so language communities benefit as datasets scale.

Looking for datasets built with these principles? Explore our [African speech datasets catalog](/datasets) and related resources such as the [isiZulu speech dataset](/datasets/isizulu).

## Our contribution

Way With Words contributed the speech data used to develop and validate the Esethu Framework. This data underpins the methodology and experiments described in the research presented at **ACL 2025** and published on arXiv, and it supports the first proof-of-concept dataset released under the framework.

We are proud to partner with Lelapa AI and DSFSI to advance sustainable, community-centred practices for African language AI.

## Key features

### Sustainable licensing

The Esethu License introduces a community-aware commercial pathway: responsible use of language data with reinvestment into future dataset creation, so high-quality data remains available without repeated extraction cycles.

### Community-led development

Local linguists and native speakers lead dataset creation, which helps ensure authenticity and diversity. The framework safeguards the interests of data creators while helping close resource gaps in automatic speech recognition (ASR) for African languages.

### Scalability & replicability

The framework is designed to be applied across multiple low-resource languages, enabling consistent, repeatable dataset development that can scale across regions and use cases.

## Proof of concept: ViXSD

The **Vuk'uzenzele isiXhosa Speech Dataset (ViXSD)** is the first dataset developed under the Esethu Framework and License. It is an open-source ASR corpus of read speech from native isiXhosa speakers, enriched with demographic and linguistic metadata. ViXSD demonstrates how community-driven licensing and curation can support voice-driven applications for isiXhosa while ensuring long-term, ethical data governance.

*   10 hours of high-quality isiXhosa speech data
*   Diverse speakers across dialects, age groups, and regions
*   Ethical licensing that supports future isiXhosa data growth

[View ViXSD on Hugging Face →](https://huggingface.co/datasets/lelapa/Vukuzenzele_isiXhosa_Speech_Dataset_ViXSD)

## Resources & links

Explore the framework, papers, dataset, and license.

*   [Framework & research paper (arXiv)](https://arxiv.org/abs/2502.15916): The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages
*   [ACL 2025 long paper](https://aclanthology.org/2025.acl-long.1487/): Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna.
*   [Lelapa AI announcement](https://lelapa.ai/a-global-first-how-a-new-sustainable-data-framework-license-are-transforming-language-ai/): A Global First: How a New Sustainable Data Framework & License Are Transforming Language AI
*   [ViXSD dataset (Hugging Face)](https://huggingface.co/datasets/lelapa/Vukuzenzele_isiXhosa_Speech_Dataset_ViXSD): Vuk'uzenzele isiXhosa Speech Dataset, the first dataset developed under the Esethu Framework.
*   [Esethu License](https://huggingface.co/datasets/lelapa/Vukuzenzele_isiXhosa_Speech_Dataset_ViXSD/blob/main/ESETHU_LICENSE.md): Community-centric data license supporting equitable benefit-sharing.

## Work with us on sustainable data

Interested in datasets under the Esethu Framework or in building ethical, community-centred speech data for other African languages? We'd love to hear from you.

[Get in touch →](/contact)
