Powering Automated Translation in Time of Corona Crisis
Machine translation is an important technology in a crisis. When integrated into rapid-response communication plans, it increases both the speed at which information is passed on and the number of languages covered. The one precondition is that enough data is available on the topic at hand.
TAUS Corona Crisis Corpora
These corpora are the result of a collective industry charity effort in which participants contributed their own translation memories covering this domain, so that together we were able to expand both the volume of good data and the language spread. TAUS also generated corpora by applying Matching Data selection to DataCloud and ParaCrawl data. The query corpus was crawled from the web, drawing on the latest coronavirus-related articles and news. The selected data relates to virology, epidemics, medicine, and healthcare.
Each file contains two tab-separated columns: the first column is the source text and the second is the target text. Anyone training their own MT engines can download these corpora and use them to improve their translation services and systems. ModelFront helped filter the corpora further by removing misaligned or bad translations.
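As a rough illustration of the two-column layout described above, the following Python sketch reads tab-separated lines into (source, target) pairs. The function name `read_parallel_tsv` and the sample sentences are hypothetical and chosen only for this example; the sketch also assumes the files are UTF-8 text with no header row and no quoted fields, which is not confirmed here.

```python
import csv
import io
from typing import Iterable, Iterator, Tuple


def read_parallel_tsv(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs from tab-separated lines.

    Hypothetical helper: assumes exactly two columns per valid line.
    """
    # QUOTE_NONE treats quotation marks inside segments as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip empty lines and lines that do not have exactly two columns.
        if len(row) != 2:
            continue
        source, target = row[0].strip(), row[1].strip()
        # Skip pairs where either side is empty after trimming.
        if source and target:
            yield source, target


# Illustrative in-memory sample, not taken from the actual corpora.
sample = io.StringIO(
    "Wash your hands often.\tLávese las manos con frecuencia.\n"
    "malformed line without a tab\n"
    "Stay at home.\tQuédese en casa.\n"
)
pairs = list(read_parallel_tsv(sample))
for source, target in pairs:
    print(f"{source} => {target}")
```

To read a downloaded file instead of the in-memory sample, pass an open file object, for example one obtained with `open(path, encoding="utf-8", newline="")`, where `path` points to wherever the corpus was saved.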
|Language Pair||Segment Count|
Corona Crisis Translation Models by SYSTRAN
SYSTRAN has contributed to this initiative by producing Corona Crisis Translation Models in 12 languages, based on quality parallel data provided by TAUS. The models are publicly available at no cost. Together, we ensure that people and communities in need have access to accurate coronavirus-related information in their local language.
Try the models for free on SYSTRAN Translate
If you would like to contribute data, please contact the TAUS Data team at firstname.lastname@example.org