When Kareem Darwish’s daughter started reading, he wanted her to read Arabic stories and books so she could develop an interest in Arabic literature. However, Darwish struggled to find Arabic resources online or in bookstores.
“I really wanted her to start reading in Arabic, but she would always say that there’s nothing interesting to be found,” says Dr. Darwish, Senior Scientist at Qatar Computing Research Institute (QCRI). “There isn’t a lot of good literature [in Arabic] for children and teens, compared to the various titles published in English.
“I specifically sent someone abroad to dig out some of the books I had read as a kid – really old mystery books, like The 13 Devils, The Five Adventurers, and others that were published more than 40 years ago – and I got them for her. Now she’s addicted to them!”
Dr. Darwish works within a team at QCRI, part of Qatar Foundation member Hamad Bin Khalifa University (HBKU), to develop tools that revive the Arabic language and promote its use in the digital world. The team of researchers, scientists, engineers, and product developers – called the Arabic Language Technologies (ALT) group – is spearheading efforts to solve challenges facing the Arabic language as information and experiences become increasingly digitized.
With the goal of ensuring a fair online representation of the Arabic language and culture while maximizing the digital experiences of the language’s speakers, the team works to address the shortage of online Arabic content and the lack of technical tools that account for the nuances of the Arabic language.
Creating a better reading experience
One challenge the team works to address is the absence of tools that promote reading in Arabic. “We realized that there wasn’t an e-reader that fully supports Arabic text,” explains Majd Abbar, Director of Commercialization and Business Development at QCRI and a member of the ALT group. “This includes the ability to alternate between right-to-left and left-to-right interfaces, since Arabic is written from right to left while its numbers are written from left to right.”
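The mixed direction Abbar describes is visible in the Unicode character data itself. The sketch below is only an illustration using Python's standard `unicodedata` module, not a description of how Jalees is built; the helper name `bidi_runs` and the sample string are assumptions made for this example. It groups a string into runs of characters that share a bidirectional class: `AL` marks Arabic letters (laid out right to left), `EN` marks European digits (laid out left to right), and `WS` marks whitespace.

```python
import unicodedata


def bidi_runs(text):
    """Group consecutive characters that share a Unicode bidi class.

    Hypothetical helper for illustration only; a real e-reader would
    rely on a full implementation of the Unicode Bidirectional Algorithm.
    """
    runs = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if runs and runs[-1][0] == cls:
            runs[-1][1] += ch  # extend the current run
        else:
            runs.append([cls, ch])  # start a new run
    return [(cls, chunk) for cls, chunk in runs]


# "year 2024": an Arabic word followed by a number written with European digits.
sample = "عام 2024"
for cls, chunk in bidi_runs(sample):
    print(cls, repr(chunk))
```

A renderer has to place the `AL` run from right to left while keeping the digits of the `EN` run in left-to-right order, which is the alternation an Arabic-capable e-reader needs to handle.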
Jalees (an Arabic word that means ‘companion’) is an e-reader developed at QCRI that fully supports reading Arabic text. “It started as a research venture, then the Ministry of Education and Higher Education in Qatar launched their e-bag initiative that includes a collection of tech tools and applications for students and teachers, and we saw the potential of collaborating with them to give students all over Qatar a better reading experience, especially when it comes to reading in Arabic.”
Designed to provide children and teens with stimulating and interactive activities, the e-reader has integrated games, videos, and simulations within books. “We went to classes and saw how students interacted with the application,” says Abbar, “and their happiness and enthusiasm encouraged all of us.”
Bridging the content gap
To further increase the quantity and enrich the quality of digital Arabic content, the team has launched the Ethraa project, a tool designed to professionally translate digital content into Arabic.
“Most Arabic speakers experience how quality content in Arabic is extremely lacking, compared to other languages on the web,” Abbar continues. “Arabic speakers are the fourth major linguistic group of internet users, but this is not translated into actual content.”
“That’s where the idea of Ethraa came about. We’ve collaborated with Wikimedia to start translating articles into Arabic, and we’ve managed to professionally translate more than 10,000 articles.”
While working on Wikipedia articles, the team was confronted with a more pressing challenge. “We saw that credible medical information in Arabic is almost non-existent online,” explains Abbar. “We’ve worked to obtain a license to translate the digital content from the Mayo Clinic [a non-profit academic and medical center ranked among the top hospitals in the United States] into Arabic. We’ve also collaborated with Arab doctors to review and translate the materials.”
The team of aspiring researchers didn’t stop at enriching the digital content in Arabic. They wanted to build tools to allow for better integration of the Arabic script and word complexity in newly developed programs and applications as well as search engines.
Building language-specific tech tools
Sometimes, one word in Arabic can be translated into a full sentence in other languages. The single word "أَنُلْزِمُكُمُوهَا" translates to a sentence of seven words in English: “Shall we compel you to accept it?”
One word in Arabic can also have more than 100 derivations, formed by adding prefixes and suffixes that each carry part of the meaning. Many tech tools developed today can’t process these words to extract the large amount of information they contain. Failure to account for this complexity yields poorer results for web searches in Arabic than for searches in other languages.
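To make the idea of prefixes and suffixes concrete, here is a deliberately naive sketch in Python. It is not Farasa and does not reflect how any QCRI tool works; the affix lists, the `MIN_STEM` threshold, and the function names are all assumptions chosen for illustration. It peels at most one conjunction, one preposition, and the definite article off the front of a word, and one pronoun off the end.

```python
# Toy affix lists (assumed for illustration; far from complete).
CONJUNCTIONS = ("و", "ف")        # "and", "so"
PREPOSITIONS = ("ب", "ل", "ك")   # "with", "for", "like"
ARTICLE = ("ال",)                # "the"
PRONOUN_SUFFIXES = ("كما", "هما", "ها", "هم", "كم", "نا", "ه", "ك")
MIN_STEM = 3  # never strip an affix if fewer than 3 letters would remain


def _strip_prefix(word, options):
    """Remove the first matching prefix that leaves a long enough stem."""
    for prefix in options:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            return prefix, word[len(prefix):]
    return "", word


def segment(word):
    """Split a word into (prefixes, stem, suffix) with greedy rules."""
    prefixes = []
    rest = word
    for group in (CONJUNCTIONS, PREPOSITIONS, ARTICLE):
        prefix, rest = _strip_prefix(rest, group)
        if prefix:
            prefixes.append(prefix)
    suffix = ""
    for candidate in PRONOUN_SUFFIXES:
        if rest.endswith(candidate) and len(rest) - len(candidate) >= MIN_STEM:
            suffix, rest = candidate, rest[:-len(candidate)]
            break
    return prefixes, rest, suffix


print(segment("وبالكتاب"))  # "and with the book"
print(segment("منزلنا"))    # "our home"
```

With these rules, "وبالكتاب" splits into three prefixes plus the stem "كتاب" (book), and "منزلنا" splits into the stem "منزل" (home) plus the pronoun "نا" (our). The same greedy rules break down quickly, though: given "كتاب" on its own, the sketch wrongly treats the first letter as a preposition. Ambiguity of this kind is one reason a search engine that indexes only surface forms, or strips affixes blindly, tends to return poorer results for Arabic.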
“Many companies in the Middle East, like Microsoft Egypt or Sakhr Software, are building Arabic text processing tools,” says Dr. Hamdy Mubarak, Senior Software Engineer at the ALT group, “but their use is kept within the company itself. At QCRI, we are building an open tool called Farasa that people can not only use but also download, look at the source code, see how it works from the inside, and improve upon the existing algorithms.”
Farasa (an Arabic word that means “insight”) is one of the few Arabic text processing tools available online that can be embedded into other services and tools, such as text-to-speech software, search engines, machine translation, and social media analysis – paving the way for developers to take into account the differences between languages, especially those that follow different structures and word formation rules.
The road ahead
In order to build meaningful connections worldwide and ensure equal access to information in the digital world, modern technologies and solutions need to be utilized to solve challenges facing our local language and culture.
Dr. Darwish points to one significant challenge that, according to him, is the root of the problem, and also the beginning of a solution: “Arabic is slowly and continuously being detached from the business and daily needs of people; you can live in an Arab country and never feel the need to learn the language,” he says.
“Creating that need is what will drive people to adopt the language in their everyday life and push them to account for it in products deployed in the huge market we have in the Arab world.”
Many people agree that, given its rich heritage and ties to the Arab world, the Arabic language is not simply going to disappear, but the question remains: will it stay relevant in an accelerating digital world?