TAUS Matching Data

Select language data tuned to your own domain from the largest industry-shared language data repository, using TAUS high-performance clustered search technology.

Language Data Transformation

With TAUS Matching Data, we are transforming vast quantities of parallel language data from different sources and owners into unique, domain-specific corpora tuned to your search requirements.
This tailored approach to data selection significantly reduces the volume of data you need and fine-tunes the MT training process.
At the same time, we are building the Matching Data Library, which aims to provide a more automated supply line for clustered data.

Creation of a Customized Corpus

From Matching Data to Matching Data Library
Matching Data is an effective way to optimize your corpus for training MT engines on domain-specific language tasks.

Everyone can benefit from good data

Enterprises

Improve the performance of your in-domain MT engines with Matching Data

Service Providers

Offer a great service and boost the productivity of your language resources with customized corpora

Language Professionals

Enlarge your in-domain TMs and increase both your efficiency and the quality of your delivered translations

Related Content

Since the launch of Data Cloud, TAUS has gathered many insights from the industry's data experts about their needs and expectations. Our solution to close the data gap is the TAUS Matching Data service: a high-performance clustered search methodology based on data selection techniques.

This white paper provides an overview of the history of language data resourcing and the challenges associated with it. As part of the effort to close the data gap within the translation industry, it also introduces a brand-new service: Matching Data.

From product user reviews and blog post comments to everyday business small talk, you will get a wide range of conversational content clustered from several domains: content that will give your MT engine the right tuning to handle even the most creative user voices.

Available in five language pairs and three different sizes.

500x500