This document presents three case studies that integrate machine translation (MT) with translation memory (TM) to increase translation productivity. It finds that combining MT and TM can increase productivity beyond what is possible with either technology alone. The key factors that influence productivity gains are the quality and quantity of TM entries, training MT on in-domain text, the language pair being translated, and the user's expertise with the various technologies. Segment-by-segment and one-time MT application workflows are tested using different CAT tool and MT engine combinations. Productivity increases are observed for translating repetitive text such as instructions.
Integrating Machine Translation with Translation Memory: A Practical Approach
1. Introduction
Methodology
Discussion
Panagiotis Kanavos and Dimitrios Kartsaklis
November 4, 2010
Panagiotis Kanavos and Dimitrios Kartsaklis Integrating MT with TM: A Practical Approach 1/ 18
Introduction
Despite ongoing research and progress in the field, Machine Translation has not been widely accepted by the professional translation industry
Common criticisms:
  MT is only suitable for draft translations of e-mails and web pages
  MT is not efficient for morphologically rich languages
  MT is useful only to large companies owning a wealth of resources
In a nutshell: MT is something for researchers to play around with
A Case Study
How MT can be incorporated into professional translation workflows, with limited resources, in ways that significantly increase productivity.
We combine both statistical and rule-based MT systems with Translation Memory software using two approaches:
  The on-demand, sentence-by-sentence application of MT
  The one-time application of MT to the whole translation project
The case study is conducted in production conditions, with final deliverables that require the highest translation quality.
Our setting
Language pair: English to Greek
Text to be translated: two Informatics books, one technical guide and one academic textbook
TM size: 140,000 TUs coming from in-domain texts
Terminology DB size: 30,000 entries
Fuzzy threshold: 70%
Software programs and combinations
MT systems:
  Statistical: Moses
  Rule-based: Systran
CAT programs:
  Swordfish II (Java application) over Linux
  Déjà Vu X over MS Windows
  Wordfast, an MS Word macro template
Three combinations, based on practical factors:
  Sentence-by-sentence workflow with Swordfish/Moses
  Sentence-by-sentence workflow with Wordfast/Systran
  One-time MT application workflow with Déjà Vu X/Moses
Swordfish/Moses combination
Swordfish: Allows connection to external programs or scripts
Connection with Moses achieved with a custom Python script
Basic workflow:
if TM match > 80% then
    accept fuzzy match for post-edit
else if 70% < TM match <= 80% then
    evaluate the fuzzy match
    if quality not acceptable then
        apply MT
    end if
else
    apply MT
    if quality not acceptable then
        type the translation from scratch
    end if
end if
post-edit
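The workflow above can be read as a small decision function. The following is only a sketch, not the authors' script: the function name `draft_source` and the two callbacks, which stand in for the translator's quality judgement, are hypothetical.

```python
from typing import Callable


def draft_source(tm_score: float,
                 fuzzy_ok: Callable[[], bool],
                 mt_ok: Callable[[], bool]) -> str:
    """Return which draft the translator post-edits: 'tm', 'mt' or 'scratch'.

    tm_score is the best TM match percentage (0 when there is no match).
    fuzzy_ok and mt_ok are hypothetical callbacks standing in for the
    translator's judgement; they are only called when the workflow needs them.
    """
    if tm_score > 80:
        # High fuzzy match: post-edit it directly.
        return "tm"
    if 70 < tm_score <= 80:
        # Borderline match: evaluate it, fall back to MT if it is poor.
        return "tm" if fuzzy_ok() else "mt"
    # No usable TM match: try MT, type from scratch if MT is poor.
    return "mt" if mt_ok() else "scratch"
```

Whatever the function returns, the segment is still post-edited afterwards, as in the last line of the workflow.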
Swordfish/Moses combination: Results
Book 1: Instructive guide, Book 2: Textbook
Wordfast/Systran combination
Wordfast: A macro template working on top of MS Word
Great deal of customization through MS Word macros
Rule-based version of Systran, supporting user dictionaries
Basic workflow:
if TM match < 70% then
    apply pre-editing macros
    send segment to MT engine
    apply post-editing macros
    while MT result not good do
        amend Systran user dictionary and re-send segment to MT
    end while
else
    accept the translation for post-edit
end if
post-edit
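A rough sketch of this loop, with every external step passed in as a function. None of the names below come from Wordfast or Systran; they are placeholders. Two details are assumptions rather than part of the slide: the `max_rounds` cap on dictionary amendments, and re-applying the post-editing macros after each re-send.

```python
from typing import Callable


def wordfast_systran_draft(segment: str,
                           tm_score: float,
                           tm_translation: str,
                           pre_edit: Callable[[str], str],
                           mt_translate: Callable[[str], str],
                           post_edit_macro: Callable[[str], str],
                           is_good: Callable[[str], bool],
                           amend_dictionary: Callable[[str, str], None],
                           max_rounds: int = 5) -> str:
    """Return the draft handed to the translator for post-editing."""
    if tm_score >= 70:
        # TM match at or above the fuzzy threshold: use it as the draft.
        return tm_translation
    prepared = pre_edit(segment)
    draft = post_edit_macro(mt_translate(prepared))
    rounds = 0
    # Assumption: stop after max_rounds amendments so the loop always ends.
    while not is_good(draft) and rounds < max_rounds:
        amend_dictionary(segment, draft)  # e.g. add a missing term
        draft = post_edit_macro(mt_translate(prepared))
        rounds += 1
    return draft
```

Passing the steps in as callables keeps the sketch independent of how the macros and the MT engine are actually invoked.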
Wordfast/Systran combination: Results
Book 1: Instructive guide, Book 2: Textbook
Déjà Vu X/Moses combination
Déjà Vu X: similar concept to Swordfish
However: No way of integration with an MT system, so the only option is pre-translation of the whole project with Moses
Send for MT only segments with no TM matches or TM matches below 80%
Pre-translation stage:
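As an illustration only, the selection rule for pre-translation (no TM match, or a match below 80%) could be written as a simple filter. The function names and the `(source_text, best_tm_score)` input shape are assumptions made for this sketch.

```python
from typing import Iterable, List, Optional, Tuple

MT_THRESHOLD = 80  # percent, as stated on the slide


def needs_mt(best_tm_score: Optional[float]) -> bool:
    """True when a segment has no TM match or only a match below 80%."""
    return best_tm_score is None or best_tm_score < MT_THRESHOLD


def select_for_mt(segments: Iterable[Tuple[str, Optional[float]]]) -> List[str]:
    """Keep the source text of every segment that should be sent to MT.

    Each item is (source_text, best_tm_score), with None meaning no match.
    """
    return [src for src, score in segments if needs_mt(score)]
```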
Déjà Vu X/Moses combination
Basic workflow:
if TM match > 80% then
    accept the translation for post-edit
else
    evaluate MT translation
    if quality not acceptable then
        if any TM match exists (between 70-80%) then
            accept the translation for post-edit
        else
            apply auto-assemble feature
            if quality not acceptable then
                type the translation from scratch
            end if
        end if
    end if
end if
post-edit
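The same workflow as a decision function, again only a sketch. The function name and the callbacks that stand in for the translator's judgement are hypothetical, and treating "between 70-80%" as inclusive at both ends is an assumption.

```python
from typing import Callable, Optional


def dvx_draft_source(tm_score: Optional[float],
                     mt_ok: Callable[[], bool],
                     assemble_ok: Callable[[], bool]) -> str:
    """Return 'tm', 'mt', 'assemble' or 'scratch' for one segment.

    tm_score is the best TM match percentage, or None when no match at or
    above the 70% fuzzy threshold exists. The MT draft already exists
    because the project was pre-translated with Moses.
    """
    if tm_score is not None and tm_score > 80:
        return "tm"
    if mt_ok():
        # The pre-translated MT draft is good enough to post-edit.
        return "mt"
    if tm_score is not None and 70 <= tm_score <= 80:
        # MT rejected, but a lower fuzzy match is available.
        return "tm"
    # Last resorts: auto-assemble, then typing from scratch.
    return "assemble" if assemble_ok() else "scratch"
```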
Déjà Vu X/Moses combination: Results
Book 1: Instructive guide, Book 2: Textbook
Productivity increase
MT & TM combination: Productivity increased to a level not possible by applying either technology in isolation:
Important factors
Quantity and quality of TM entries
The domain of the translation material used to train the statistical MT system
The above impose serious limitations for those who work with small texts in many different domains. Rule-based systems are more suitable in such cases
Language pair: Coding efficient user dictionaries with morphologically rich languages is difficult and requires some trial and error. Phrase-based systems like Moses have better performance
Style of text: Productivity is higher with repetitive text and step-by-step instructions
User expertise with all technologies involved
A proposal for a unified application
For general acceptance by the professional translation community, MT should be integrated with TM into an intuitive unified system
Basically a TM environment, with the MT engine as an extra component working on top of it
MT suggestions should be presented in a controlled and selective way
Basic components:
  A 2-column translation grid for source and target segments
  Terminology management
  MT engine
  Alignment tool
  Quality assurance control
Advanced issues
Automation of the training process with TM databases
Statistical systems require considerable computing resources. A solution: MT as Software as a Service (SaaS)
Terminology databases can be used for more than reference purposes
  Additional entry fields for coding MT dictionary entries (Systran)
  Linguistic information can be used for creating factored models (Moses)
Automatic suggestions-as-you-type (TransType, Caitra)
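One possible first step towards automating training from TM databases is to pull source/target pairs out of a TM export. The sketch below assumes the TM is exported as TMX, which the slides do not state, and the function name `tmx_to_parallel` is hypothetical. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_parallel(tmx_text: str, src_lang: str,
                    tgt_lang: str) -> List[Tuple[str, str]]:
    """Extract (source, target) segment pairs from a TMX document string.

    Languages are compared on the primary subtag only, so 'en' matches
    'en-US'. Translation units missing either language are skipped.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                texts[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if texts.get(src_lang) and texts.get(tgt_lang):
            pairs.append((texts[src_lang], texts[tgt_lang]))
    return pairs
```

The resulting pairs could then be written to two line-aligned text files, one per language, which is the kind of parallel corpus a statistical system is typically trained on.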
Summary
The combination of MT with TM results in a significant productivity increase not feasible in a TM-only environment
Currently there is no straightforward way of doing that
Work is in progress by the authors towards this purpose, in the form of a Software Specification document that will describe the design and the components of such a system in every detail
Thank you!
Any questions?