Matching Data as a Service

Matching Data is a clustered search technology applied to the TAUS Data Cloud repository and to web-crawled data. Given an example data set, Matching Data returns matches according to relevance at the segment level, across files and domains.
With this methodology, developers of MT engines can create high-fidelity data sets tuned to their own domains.
This new approach is based on DatAptor, a joint research project between the University of Amsterdam, TAUS, Intel and EC DGT.

Domain-specific or cross-domain matching
Tailored to your search requirements
Clustered search within TAUS Data Cloud or on TAUS crawled data
10% discount for initiators of a new corpus

Here is how it works

  • 1

    Query corpus submission

    The user provides a query corpus and a profile of the data they are looking for (domain name, languages, domain description).

  • 2

    Data matching

    Based on the query corpus, the best-matching data in the TAUS Data Cloud is identified on a segment-level basis.

  • 3

    Selection creation

    Data selections are created at different match rates (Compact, Medium, Large).

  • 4

    Selection review and choice

    The user chooses the best-fitting match rate(s) and languages.

  • 5

    Payment and download

    After payment, the data is ready for download.
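
To make steps 2 and 3 more concrete, here is a minimal Python sketch of segment-level matching followed by selection creation. It is an illustration only and not the TAUS or DatAptor implementation: the bag-of-words cosine similarity, the function names (score_segments, build_selections) and the threshold values are all assumptions chosen for the example.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_segments(query_corpus, repository):
    """Score each repository segment against the query corpus as a whole.

    Assumption: relevance is approximated by cosine similarity between a
    segment's token counts and the pooled token counts of the query corpus.
    """
    profile = Counter()
    for segment in query_corpus:
        profile.update(tokenize(segment))
    scored = [(cosine(Counter(tokenize(seg)), profile), seg) for seg in repository]
    # Highest-scoring segments first.
    return sorted(scored, reverse=True)


# Hypothetical cut-offs: a stricter threshold yields a smaller selection.
THRESHOLDS = {"Compact": 0.5, "Medium": 0.3, "Large": 0.1}


def build_selections(scored):
    """Group scored segments into one selection per match rate."""
    return {
        name: [seg for score, seg in scored if score >= cutoff]
        for name, cutoff in THRESHOLDS.items()
    }


if __name__ == "__main__":
    query = ["the patient received a dose of the drug", "drug dose for the patient"]
    repo = ["the patient dose", "the stock market fell today", "weather forecast sunny"]
    for name, segments in build_selections(score_segments(query, repo)).items():
        print(name, segments)
```

In this sketch the Compact selection keeps only the closest matches, while Large trades precision for volume, which mirrors the choice the user makes in step 4.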

How to get started?

Do you have a query corpus to submit?
Request Matching Data
Contact us to get more information
Contact us