Colloquial

Initiator: Oracle
Domain: Colloquial Text
Language(s): English - Spanish, English - Portuguese (Brazil), English - Chinese (PRC), English - Korean, English - Japanese, English - French, English - Dutch, English - German, English - Russian

Is your chatbot not chatty enough? Does your MT engine look puzzled when it has to deal with informal business communication or user-generated content? This corpus will give the conversation with your local audience a friendly, casual tone.

From product user reviews and blog post comments to everyday business small talk, you will get a wide range of conversational content clustered from several domains - content which will give your MT engine the right tune to handle even the most creative user voices.

This corpus was created in cooperation with Oracle, whose linguists gave the output of TAUS Matching Data an average acceptance rate of 84%!

English - Spanish
Corpus Size  Segments  Source Tokens  Target Tokens
Compact 1,392,336 13,338,720 13,215,435
Medium 2,367,007 23,033,154 22,886,464
Large 3,152,505 30,974,622 30,855,332
English - Portuguese (Brazil)
Corpus Size  Segments  Source Tokens  Target Tokens
Compact 1,458,029 10,063,834 9,721,101
Medium 4,961,937 36,345,154 34,955,263
Large 7,609,709 57,413,608 55,138,242
English - Chinese (PRC)
Corpus Size  Segments  Source Tokens  Target Tokens
Compact 1,776,915 15,340,028 17,123,378
Medium 6,953,223 62,685,206 69,561,153
Large 11,935,793 110,397,752 122,209,117
English - Korean
Corpus Size  Segments  Source Tokens  Target Tokens
Compact 523,439 5,271,710 4,158,054
Medium 1,061,426 11,178,860 8,776,202
Large 1,635,961 17,714,974 13,873,595
English - Japanese
Corpus Size  Segments  Source Tokens  Target Tokens
Compact 498,661 4,949,219 7,606,297
Medium 1,847,027 20,834,383 32,024,856
Large 3,090,922 36,836,235 56,522,064
English - French
Corpus Size  Segments  Source Tokens  Target Tokens
Compact 670,460 7,151,963 7,849,059
Medium 1,054,486 11,578,189 12,804,660
Large 1,360,095 15,169,296 16,845,800
English - Dutch
Corpus Size  Segments  Source Tokens  Target Tokens
Compact 569,524 6,741,871 6,699,254
Medium 831,077 10,109,065 10,077,631
Large 1,143,054 14,149,189 14,127,685
English - German
Corpus Size  Segments  Source Tokens  Target Tokens
Compact 1,218,472 12,386,428 11,721,260
Medium 2,010,850 20,886,816 19,767,517
Large 2,636,019 27,766,143 26,282,966
English - Russian
Corpus Size  Segments  Source Tokens  Target Tokens
Compact 601,274 4,731,609 4,290,998
Medium 905,773 7,509,962 6,830,191
Large 1,130,809 9,667,738 8,805,302
Testimonial from Oracle

Oracle International Product Solutions (IPS) has worked with TAUS on a joint pilot project to enable data discovery within TAUS's Data Cloud corpora. The process consisted of Oracle IPS supplying TAUS with a sample of approximately 30K English strings, representing content that is aligned to Oracle projects.

TAUS used the sample to explore Data Cloud for similarity and proximity across 5 languages, and returned three categories of data output, with score ranges for similarity and proximity. Oracle IPS then performed a linguistic assessment of this output. Our in-depth linguistic review rendered positive results: the content supplied by TAUS was of good quality and appropriate to consume as corpora aligned to the Oracle sample, with an average score of 84% across the 5 languages.

Oracle IPS will continue to work with TAUS to assess the effect that consuming these discovered corpora will have on engine quality. We look forward to having data search and discovery features on Data Cloud, whereby a user can discover their own project-aligned content as a consumable self-service. We believe this will allow TAUS and its members to drive increased value from the TAUS data assets and, in turn, will likely continue to fuel growth in the pool of data and value-add services.

English - Spanish
Price Type  Compact  Medium  Large
Member Price in Euro / Partner Credits  € 11,600  € 16,600  € 18,500
Member Price in Data Cloud Credits  26 million  37 million  41 million
Non-Member Price in Euro  € 13,920  € 19,920  € 22,200

English - Portuguese (Brazil)
Price Type  Compact  Medium  Large
Member Price in Euro / Partner Credits  € 9,300  € 17,500  € 18,200
Member Price in Data Cloud Credits  20 million  39 million  40 million
Non-Member Price in Euro  € 11,160  € 21,000  € 21,840

English - Chinese (PRC)
Price Type  Compact  Medium  Large
Member Price in Euro / Partner Credits  € 10,000  € 15,000  € 22,500
Member Price in Data Cloud Credits  22 million  33 million  50 million
Non-Member Price in Euro  € 12,000  € 18,000  € 27,000

English - Korean
Price Type  Compact  Medium  Large
Member Price in Euro / Partner Credits  € 5,100  € 8,800  € 10,100
Member Price in Data Cloud Credits  11 million  20 million  22 million
Non-Member Price in Euro  € 6,120  € 10,560  € 12,120

English - Japanese
Price Type  Compact  Medium  Large
Member Price in Euro / Partner Credits  € 9,000  € 13,500  € 20,250
Member Price in Data Cloud Credits  20 million  30 million  45 million
Non-Member Price in Euro  € 10,800  € 16,200  € 24,300

English - French
Price Type  Compact  Medium  Large
Member Price in Euro / Partner Credits  € 8,100  € 10,700  € 11,600
Member Price in Data Cloud Credits  18 million  24 million  26 million
Non-Member Price in Euro  € 9,720  € 12,840  € 13,920

English - Dutch
Price Type  Compact  Medium  Large
Member Price in Euro / Partner Credits  € 7,400  € 9,100  € 10,300
Member Price in Data Cloud Credits  16 million  20 million  23 million
Non-Member Price in Euro  € 8,880  € 10,920  € 12,360

English - German
Price Type  Compact  Medium  Large
Member Price in Euro / Partner Credits  € 10,600  € 14,800  € 16,300
Member Price in Data Cloud Credits  23 million  33 million  36 million
Non-Member Price in Euro  € 12,720  € 17,760  € 19,560

English - Russian
Price Type  Compact  Medium  Large
Member Price in Euro / Partner Credits  € 5,800  € 7,100  € 7,500
Member Price in Data Cloud Credits  13 million  16 million  17 million
Non-Member Price in Euro  € 6,960  € 8,520  € 9,000
