Language Data Transformation
With TAUS Matching Data, we are transforming vast quantities of parallel language data from different sources and owners into unique, domain-specific corpora tuned to your search requirements.
This tailored approach to data selection significantly reduces the need for high data volumes and fine-tunes the MT training process.
At the same time, we are building the Matching Data Library, with the aim of creating a more automated supply line for clustered data.
Creation of a Customized Corpus
From Matching Data to Matching Data Library
Matching Data is an effective way to optimize your corpus for training engines that are used on domain-specific language tasks.
Everyone can benefit from good data
Improve the performance of your in-domain MT engines with Matching Data
Offer a great service and boost the productivity of your language resources with customized corpora
Enlarge your in-domain TMs and increase both your efficiency and the quality of your delivered translations
Since the launch of Data Cloud, TAUS has gathered many insights from the industry's data experts regarding their needs and expectations. Our solution to the data gap they identified is the TAUS Matching Data service: a high-performance clustered search methodology based on data selection techniques.
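To give a feel for what similarity-based data selection can look like, here is a minimal sketch. It is an illustration only, not a description of how the TAUS service is implemented: it assumes a plain bag-of-words cosine similarity, and the names `STOPWORDS`, `tokenize`, `cosine`, and `select_matching`, as well as the sample sentences, are hypothetical. Given a small in-domain query corpus and a pool of parallel segments, it keeps the segment pairs whose source side is closest to the query.

```python
import math
import re
from collections import Counter

# Tiny, hand-picked stopword list (an assumption for this sketch only).
STOPWORDS = {"a", "an", "and", "the", "of", "is", "this", "to", "in"}


def tokenize(text):
    """Lowercase, split on non-letter characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_matching(query_sentences, pool, top_k):
    """Return up to top_k (source, target) pairs most similar to the query corpus.

    Pairs with no term overlap at all (score 0) are never returned.
    """
    query_vec = Counter()
    for sentence in query_sentences:
        query_vec.update(tokenize(sentence))
    scored = [(cosine(Counter(tokenize(src)), query_vec), src, tgt) for src, tgt in pool]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, tgt) for score, src, tgt in scored[:top_k] if score > 0]


# Hypothetical sample data: a user-review style query and a mixed-domain pool.
query = [
    "The battery life of this phone is great",
    "The screen cracked after one week",
]
pool = [
    ("The phone battery lasts two days", "Der Akku reicht zwei Tage"),
    ("The court dismissed the appeal", "Das Gericht wies die Berufung ab"),
    ("The screen is bright and sharp", "Der Bildschirm ist hell und scharf"),
]
selected = select_matching(query, pool, top_k=2)
for src, tgt in selected:
    print(src, "|", tgt)
```

In this toy run the two consumer-electronics pairs are kept and the legal sentence is dropped. A production system would likely rely on stronger representations, such as sentence embeddings or language-model scores, but the selection principle is the same.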
This white paper provides an overview of the history of language data resourcing and the challenges associated with it. In an effort to close the data gap within the translation industry, it also introduces a brand-new service: Matching Data.
From product user reviews and blog post comments to everyday business small talk, you will get a wide range of conversational content clustered from several domains: content that tunes your MT engine to handle even the most creative user voices.
Available in five language pairs and three different sizes.