High-fidelity MT training data is always important, and even more so for medical subjects. This is a must-have corpus for anyone seeking pharma-related data.
It covers product features, dosage and usage recommendations, laboratory analysis and clinical trials for several types of medicines. Parts of the corpus describe the symptoms, diagnosis and treatment of various illnesses, patient profiles and common side effects of the medicines used in treatment. Furthermore, the data contains references to various medical regulatory bodies and established regulations.
We created this corpus together with the Universitat Autònoma de Barcelona, based on a bilingual query corpus carefully aligned and validated by the university. This high-quality query corpus resulted in highly relevant and clean data: "Without a doubt, TAUS Matching Data contributed greatly to increasing both the size and quality of the university corpus", in the words of the Universitat Autònoma de Barcelona.
The Universitat Autònoma de Barcelona has worked with TAUS on a project to gather data from TAUS's Data Cloud platform. The process consisted of the university supplying TAUS with a corpus of approximately 40K English-to-Spanish strings, made up of data from the European Medicines Agency (EU) and from the FDA database, aligned by a student. The texts compiled in this corpus were summaries of product characteristics and summaries for the public of several medicines.
TAUS used this corpus to search its Data Cloud for similar content in the same language pair and returned data output with appropriate score ranges for similarity and proximity. The university then assessed the quality of the output based on the field and degree of specialisation of the data.
After this review, the university concluded that the data were of high quality and relevance to the pharmaceutical and technical field of the original corpus. Without a doubt, TAUS Matching Data contributed greatly to increasing both the size and quality of the university corpus.
The Universitat Autònoma de Barcelona is very grateful to TAUS for supporting the academic research being carried out by one of its students in the field of machine translation, more precisely in the training of a machine translation engine specialised in pharmaceuticals. We hope to be able to collaborate with them again soon, given the good treatment received and TAUS's commitment to the quality of its services.