TAUS Matching Data

Get clean, high-quality, high-fidelity datasets for MT training, tuned to your specific domain and content type.


Language Data Transformation

With the high-performance search technology behind TAUS Matching Data, we transform vast quantities of parallel language data from different sources and owners into unique, domain-specific corpora tuned to your search requirements.
We can match your data requirements by selecting relevant segments from the largest industry-shared language data repository (TAUS Data), or we can clean and cluster your legacy language data in a private cloud environment to make it fit for MT training.
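
To make the idea of selecting relevant segments more concrete, here is a minimal Python sketch of similarity-based data selection. It is not the TAUS Matching Data implementation, which this page does not describe. It assumes a simple bag-of-words cosine similarity, and every name in it (select_matching_segments, pool, query_texts, threshold) is hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_matching_segments(pool, query_texts, threshold=0.3):
    """Keep the (source, target) pairs whose source side resembles the query texts.

    Assumption: similarity is plain bag-of-words cosine against one
    aggregated term profile of the query texts. A production system
    would likely use something more robust.
    """
    profile = Counter()
    for text in query_texts:
        profile.update(tokenize(text))

    scored = []
    for source, target in pool:
        score = cosine(Counter(tokenize(source)), profile)
        if score >= threshold:
            scored.append((score, source, target))
    # Most similar segments first.
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy parallel pool (English-French) and a small in-domain query sample.
pool = [
    ("The patient received a dose of the vaccine.",
     "Le patient a reçu une dose du vaccin."),
    ("Click the button to save the file.",
     "Cliquez sur le bouton pour enregistrer le fichier."),
    ("Store the vaccine at a low temperature.",
     "Conservez le vaccin à basse température."),
]
query_texts = ["vaccine dose for the patient", "vaccine storage temperature"]

selected = select_matching_segments(pool, query_texts)
for score, source, target in selected:
    print(f"{score:.2f}\t{source}\t{target}")
```

In this toy run the two medical segments pass the threshold and the software segment is filtered out. The threshold is an arbitrary value for the example; because common words such as "the" also contribute to the score, a real selection step would normally add stop-word handling or weighting.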

TAUS Matching Data as a service
Want to create your own domain-specific corpus?
TAUS Matching Data items
Want to explore ready-made corpora?

Creation of a Customized Corpus

From Matching Data to Matching Data Library
Matching Data is an effective way to optimize your corpus for training engines that are used on domain-specific language tasks.

Everyone benefits from good data


Improve the performance of your in-domain MT engines with Matching Data

Service Providers

Offer a great service and boost the productivity of your language resources with customized corpora

Language Professionals

Enlarge your in-domain TMs and increase both your efficiency and the quality of your delivered translations

Data News

This white paper provides an overview of language data resourcing and the challenges associated with it. In an effort to close the data gap within the translation industry, it also introduces a brand new service: Matching Data.

While generic engines perform well on general-purpose text, an MT engine translating text in a particular linguistic domain gives the best results when it is trained on a customized dataset, carefully selected to cover the vocabulary and semantic specificity of the content.

Translation inherently raises questions about intellectual property rights and data protection laws. Because the translation ecosystem is complex, it is not easy to draw simple conclusions about who is responsible for what and which use cases are legitimate. Expert advice is required. TAUS, in collaboration with the law firm Baker McKenzie, therefore produced this white paper as a blueprint for the translation industry.