Language Data Transformation
With TAUS Matching Data high-performance search technology, we transform vast quantities of parallel language data from different sources and owners into unique, domain-specific corpora tuned to your search requirements.
We can match your data requirements by selecting relevant segments from the largest industry-shared language data repository (TAUS Data) or clean and cluster your legacy language data in a private cloud environment to make it fit for MT training.
Creation of a Customized Corpus
From Matching Data to Matching Data Library
Matching Data is an effective way to optimize your corpus for training engines that are used on domain-specific language tasks.
Everyone benefits from good data
Improve the performance of your in-domain MT engines with Matching Data
Offer a great service and boost the productivity of your language resources with customized corpora
Enlarge your in-domain TMs and increase both your efficiency and the quality of your delivered translations
This white paper provides an overview of the downloads of language data resourcing as well as the challenges associated with it. In an effort to close the data gap within the translation industry, it also introduces a brand new service: Matching Data.
While generic engines perform well on general-purpose text, an engine translating text in a particular linguistic domain will give the best results when trained on a customized data set, carefully selected to cover the vocabulary and semantic specificity of the content.
Translation inherently raises questions about intellectual property rights and data protection laws. Because the translation ecosystem is complex, it is not easy to draw simple conclusions about who is responsible for what and which use cases are legitimate. Expert advice is required. Therefore, TAUS, in collaboration with Baker McKenzie Law Firm, produced this white paper as a blueprint for the translation industry.