Medical/Pharmaceutical

Initiator: Universitat Autonoma de Barcelona
Domain: Medical / Pharmaceutical
Language(s): English - Spanish

High-fidelity MT training data is always important, and even more so for medical subjects. This is a must-have corpus for anyone seeking pharma-related data.

It covers product features, dosage and usage recommendations, laboratory analysis and clinical trials for several types of medicines. Parts of the corpus describe the symptoms, diagnosis and treatment of various maladies, patient profiles, and common side effects of the medicines used in treatment. The data also contains references to various medical regulatory bodies and established regulations.

We created this corpus together with the Universitat Autonoma de Barcelona, based on a bilingual query corpus carefully aligned and validated by the university. This high-quality query corpus resulted in highly relevant and clean data: "Without a doubt, TAUS Matching Data contributed greatly to increasing both the size and quality of the university corpus", in the words of the Universitat Autonoma de Barcelona.

Their complete testimonial is reproduced further down this page.

English - Spanish
Corpus Size   Segments   Source Tokens   Target Tokens
Compact       170,000    3,030,415       3,424,891
Medium        270,000    4,809,825       5,434,909
Large         550,000    9,610,319       10,870,918
Testimonial from Universitat Autonoma de Barcelona

The Universitat Autonoma de Barcelona has worked with TAUS on a project to gather data from TAUS's Data Cloud platform. The process consisted of the university supplying TAUS with a corpus of approximately 40K English-to-Spanish strings, made up of data from the European Medicines Agency (EU) and from the FDA database, aligned by a student. The texts compiled in this corpus were summaries of product characteristics and summaries for the public of several medicines.

TAUS used this corpus to search its Data Cloud for similar content in the language pair concerned and returned data output with appropriate score ranges for similarity and proximity. The university then assessed the quality of the output based on the field and degree of specialisation of the data.

After this review, the university concluded that the data were of high quality and relevance to the pharmaceutical and technical field of the original corpus. Without a doubt, TAUS Matching Data contributed greatly to increasing both the size and quality of the university corpus.

The Universitat Autonoma de Barcelona is very grateful to TAUS for supporting the academic research being carried out by one of its students in the field of machine translation, more precisely in the training of a machine translation engine specialised in pharmaceuticals. We hope to be able to collaborate with them again soon, given the good treatment received and TAUS's commitment to the quality of its services.

Language Pair: English - Spanish

                                     Compact      Medium       Large
Member Price
  Price in Euro / Partner Credits    € 5,700      € 7,000      € 9,300
  Price in Data Cloud Credits        13 million   15 million   20 million
Non-Member Price
  Price in Euro                      € 7,125      € 8,750      € 11,625
