When the Nigerian government announced plans in April to develop a multilingual AI tool to boost digital inclusion across the West African nation, 28-year-old computer science student Lwasinam Lenham Dilli was thrilled.
Dilli had struggled to scrape datasets from the internet to build a large language model (LLM), used to power AI chatbots, in his native Hausa language as part of his final year project at university.
“I needed texts in English and their corresponding translation in Hausa but I couldn’t get anything online. There was no clean data. Creating local language LLMs is a way to ensure our local dialects and languages will not be forgotten or left out of the AI ecosystem,” Dilli told the Thomson Reuters Foundation.
The world has been swept up in a whirlwind of AI mania, with tools such as OpenAI’s ChatGPT, Meta’s Llama 2, and Mistral AI captivating millions globally with their ability to generate human-like text.
However, for many tech-savvy Africans, the excitement has been tempered by a frustrating reality: when languages like Hausa, Amharic, or Kinyarwanda are entered into the chat, many of the advanced systems falter, often producing nonsensical responses. Africa is home to more than 2,000 languages spoken across 54 countries, according to Unesco. However, most African languages remain underrepresented on the internet. English dominates the digital space, accounting for around 50% of all websites, followed by Spanish, German, Japanese and French.
Along with the Nigerian government initiative, a small but growing number of African startups are rising to the challenge of developing AI tools in languages such as Swahili, Amharic, Zulu and Sesotho.
In Kenya, for instance, health tech firm Jacaranda Health has pioneered the first LLM operating in Swahili to improve maternal healthcare in East Africa. Built on Meta’s Llama 3 system, UlizaLlama (AskLlama) aims to refine Jacaranda Health’s SMS service for low-income Swahili-speaking expectant mothers who have queries ranging from dietary concerns and foetal movement to exercise during pregnancy.
The platform provides pre-written automated responses, but once UlizaLlama is integrated by the end of June, it will tailor responses to individual needs, offering more detailed pregnancy guidance and emergency support. In South Africa, the Masakhane initiative is using open-source machine learning to translate African languages. Lelapa AI, a South African AI research lab, has pioneered VulaVula, a for-profit language processing tool that translates, transcribes and analyses languages in English, Afrikaans, Zulu and Sesotho.
However, AI experts said building LLMs in African languages poses significant challenges, ranging from availability of data to ethical concerns over consent, compensation and copyright. Many African languages are low-resource languages, meaning there is a scarcity of data to train the models effectively, unlike high-resource languages such as English or French.
Michael Michie, co-founder of Everse Technology Africa, an AI startup building intelligence into data protection and privacy, said collecting the data needed to train LLMs also raised ethical questions. In many African communities oral tradition predominates, and some communities may not be interested in sharing their language to train LLMs, a choice that should be respected.
“There are no regulations or laws in African countries that address issues related to consent, privacy and compensation to communities when collecting data to train AI tools. This needs to be addressed,” said Michie.
Open-source initiatives such as Creative Commons, which allow creators to legally share their work with specified conditions such as ensuring attribution and non-commercial use, are also not a perfect solution, said some AI experts. However, if everything is open source, it may be harder to properly reimburse and acknowledge the original contributors to these language models, he said.
“A lot of people are working on LLMs because of the prestige. That’s where the money is, but we need to make sure our languages are being taken care of.”
(Source: Thomson Reuters Foundation/ www.sowetanlive.co.za)



