Introduction
                                Methodology
                                  Discussion




Integrating Machine Translation with Translation
         Memory: A Practical Approach

            Panagiotis Kanavos and Dimitrios Kartsaklis


                                 November 4, 2010




  Panagiotis Kanavos and Dimitrios Kartsaklis   Integrating MT with TM: A Practical Approach   1/ 18


Introduction


      Despite the ongoing research and the progress in the field,
      Machine Translation has not been widely accepted by the
      professional translation industry
      Common criticisms:
              MT is only suitable for draft translations of e-mails and web
              pages
              MT is not efficient for morphologically rich languages
              MT is useful only to large companies owning a wealth of
              resources
      In a nutshell: MT is something for researchers to play around
      with





A Case Study


      How MT can be incorporated into professional translation
      workflows, with limited resources, in ways that significantly
      increase productivity.
      We combine both statistical and rule-based MT systems with
      Translation Memory software using two approaches:
             The on-demand, sentence-by-sentence application of MT
             The one-time application of MT to the whole translation
             project
      The case study is conducted in production conditions, with
      final deliverables that require the highest translation quality.





Our setting



      Language pair: English to Greek
      Text to be translated: Two Informatics books: one
      technical guide and one academic textbook.
      TM size: 140,000 TUs coming from in-domain texts
      Terminology DB size: 30,000 entries
      Fuzzy threshold: 70%
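
To make the role of the fuzzy threshold concrete, here is a minimal sketch of a TM lookup. It assumes similarity is measured with Python's `difflib.SequenceMatcher`, which is only a stand-in: CAT tools use their own matching metrics. The names `similarity`, `best_tm_match` and `toy_tm`, the toy entries, and the choice to treat a score of exactly 70% as a match are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.70  # the 70% fuzzy threshold listed in the setting


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for a CAT tool's own metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_tm_match(source, tm, threshold=FUZZY_THRESHOLD):
    """Return (score, tm_source, tm_target) for the best match at or above
    the threshold, or None when no TM entry is similar enough."""
    best = None
    for tm_source, tm_target in tm.items():
        score = similarity(source, tm_source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tm_source, tm_target)
    return best


# Toy translation memory (hypothetical entries, not taken from the case study).
toy_tm = {
    "Click the Save button.": "Κάντε κλικ στο κουμπί Αποθήκευση.",
    "Restart the computer.": "Επανεκκινήστε τον υπολογιστή.",
}
```

A segment for which `best_tm_match` returns `None` is one with no usable TM match, which is where MT comes into play in the workflows that follow.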






Software programs and combinations


      MT systems:
             Statistical: Moses
             Rule-based: Systran
      CAT programs:
             Swordfish II (Java application) over Linux
             Déjà Vu X over MS Windows
             Wordfast, an MS Word macro template
      Three combinations, based on practical factors:
             Sentence-by-sentence workflow with Swordfish/Moses
             Sentence-by-sentence workflow with Wordfast/Systran
             One-time MT application workflow with Déjà Vu X/Moses





Swordfish/Moses combination
      Swordfish: Allows connection to external programs or scripts
      Connection with Moses achieved with a custom Python script
      Basic workflow:
        if TM match > 80% then
           accept fuzzy match for post-edit
        else if 70% < TM match <= 80% then
           evaluate the fuzzy match
           if quality not acceptable then
              apply MT
           end if
        else
           apply MT
           if quality not acceptable then
              type the translation from scratch
           end if
        end if
        post-edit
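
The decision logic above can be sketched in Python as follows. This is an illustration only, not the authors' script: `translate_mt` is a hypothetical callback standing in for the call to Moses, `is_acceptable` stands in for the translator's own quality judgement, and the function name and return labels are invented for the example.

```python
from typing import Callable, Optional, Tuple


def swordfish_moses_step(
    tm_score: float,
    tm_target: Optional[str],
    source: str,
    translate_mt: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (origin, draft) for one segment; the draft is then post-edited.

    tm_score is the best TM match as a fraction (0.0 when there is no match).
    """
    if tm_score > 0.80 and tm_target is not None:
        # High fuzzy match: accept it for post-editing.
        return "tm", tm_target
    if 0.70 < tm_score <= 0.80 and tm_target is not None:
        # Borderline match: keep it only if the translator finds it usable.
        if is_acceptable(tm_target):
            return "tm", tm_target
        return "mt", translate_mt(source)
    # No usable TM match: try MT, fall back to typing from scratch.
    mt_output = translate_mt(source)
    if is_acceptable(mt_output):
        return "mt", mt_output
    return "scratch", ""
```

Passing the two callbacks in keeps the human judgement and the MT engine outside the function, so the branching itself can be checked without a running Moses installation.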


Swordfish/Moses combination: Results




      [Figure: results chart; Book 1: instructive guide, Book 2: textbook]



Wordfast/Systran combination

      Wordfast: A macro template working on top of MS Word
      Great deal of customization through MS Word macros
      Rule-based version of Systran, supporting user dictionaries
      Basic workflow:
        if TM match < 70% then
           apply pre-editing macros
           send segment to MT engine
           apply post-editing macros
           while MT result not good do
              amend Systran user dictionary and re-send segment to MT
           end while
        else
           accept the translation for post-edit
        end if
        post-edit
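
A minimal Python sketch of the loop above, under stated assumptions: every callback (`pre_edit`, `translate_mt`, `post_edit_macros`, `is_good`, `amend_dictionary`) is a hypothetical stand-in for a Word macro, the Systran engine, or the translator, and none of them reflects an actual Wordfast or Systran API. The `max_rounds` cap and the re-application of the post-editing macros after each re-send are additions of this sketch; the slide's loop states neither.

```python
from typing import Callable, Optional, Tuple


def wordfast_systran_step(
    tm_score: float,
    tm_target: Optional[str],
    source: str,
    pre_edit: Callable[[str], str],
    translate_mt: Callable[[str], str],
    post_edit_macros: Callable[[str], str],
    is_good: Callable[[str], bool],
    amend_dictionary: Callable[[str, str], None],
    max_rounds: int = 5,
) -> Tuple[str, str]:
    """Return (origin, draft) for one segment; the draft is then post-edited."""
    if tm_score >= 0.70 and tm_target is not None:
        # A TM match at or above the fuzzy threshold is accepted for post-editing.
        return "tm", tm_target
    prepared = pre_edit(source)
    draft = post_edit_macros(translate_mt(prepared))
    rounds = 0
    while not is_good(draft) and rounds < max_rounds:
        # Improve the user dictionary, then send the same segment again.
        amend_dictionary(prepared, draft)
        draft = post_edit_macros(translate_mt(prepared))
        rounds += 1
    return "mt", draft
```

Each pass through the loop changes the dictionary rather than the segment, so the improvement carries over to later segments that use the same terms.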



Wordfast/Systran combination: Results




      [Figure: results chart; Book 1: instructive guide, Book 2: textbook]



Déjà Vu X/Moses combination
      Déjà Vu X: similar concept to Swordfish
      However: No way of integration with an MT system, so the
      only option is pre-translation of the whole project with Moses
      Send for MT only segments with no TM matches or TM
      matches below 80%
      Pre-translation stage:
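
The selection rule stated above (send to MT only the segments with no TM match or a match below 80%) can be sketched as a batch filter. This is a hypothetical illustration, not the authors' pipeline: `translate_batch` stands in for a one-time Moses run over the exported segments, and the `(source, score)` input format is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Tuple

MT_CUTOFF = 0.80  # segments with a best TM match below this go to MT


def pretranslate(
    segments: Iterable[Tuple[str, float]],
    translate_batch: Callable[[List[str]], List[str]],
) -> Dict[str, str]:
    """Map each selected source segment to its MT output.

    segments holds (source, best_tm_score) pairs, with a score of 0.0
    meaning that the TM returned no match at all.
    """
    selected = [source for source, score in segments if score < MT_CUTOFF]
    # One call for the whole project, mirroring the one-time MT application.
    outputs = translate_batch(selected)
    return dict(zip(selected, outputs))
```

The resulting mapping would then be made available alongside the TM matches when the translator works through the project.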






Déjà Vu X/Moses combination
      Basic workflow:
        if TM match > 80% then
           accept the translation for post-edit
        else
           evaluate MT translation
           if quality not acceptable then
              if any TM match exists (between 70-80%) then
                 accept the translation for post-edit
              else
                 apply auto-assemble feature
                 if quality not acceptable then
                     type the translation from scratch
                 end if
              end if
           end if
        end if
        post-edit
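
The same workflow as a short Python sketch. The assumptions are labeled: `mt_output` is the Moses translation already produced in the pre-translation stage, `auto_assemble` is a hypothetical callback standing in for Déjà Vu X's auto-assemble feature, `is_acceptable` stands in for the translator's judgement, and the function name and return labels are invented for the example.

```python
from typing import Callable, Optional, Tuple


def dejavu_moses_step(
    tm_score: float,
    tm_target: Optional[str],
    mt_output: Optional[str],
    source: str,
    auto_assemble: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (origin, draft) for one segment; the draft is then post-edited."""
    if tm_score > 0.80 and tm_target is not None:
        return "tm", tm_target
    # Below the cutoff: the pre-translated MT output is evaluated first.
    if mt_output is not None and is_acceptable(mt_output):
        return "mt", mt_output
    # MT rejected: fall back to a 70-80% fuzzy match if one exists.
    if tm_target is not None and tm_score >= 0.70:
        return "tm", tm_target
    # Otherwise try auto-assemble, and type from scratch as a last resort.
    assembled = auto_assemble(source)
    if is_acceptable(assembled):
        return "assemble", assembled
    return "scratch", ""
```

Compared with the sentence-by-sentence workflows, MT is consulted before the low fuzzy matches here, because its output is already in place when the translator reaches the segment.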


Déjà Vu X/Moses combination: Results




      [Figure: results chart; Book 1: instructive guide, Book 2: textbook]



Productivity increase
       MT & TM combination: Productivity increased to a level not
       possible by applying either technology in isolation






Important factors

      Quantity and quality of TM entries
      The domain of the translation material used to train the
      statistical MT system
              The above impose serious limitations for those who work with
              small texts in many different domains. Rule-based systems are
              more suitable in such cases
      Language pair: Coding efficient user dictionaries with
      morphologically rich languages is difficult and requires some
      trial and error. Phrase-based systems like Moses have better
      performance
      Style of text: Productivity is higher with repetitive text and
      step-by-step instructions
      User expertise with all technologies involved



A proposal for a unified application

       For general acceptance by the professional translation
       community, MT should be integrated with TM into an
       intuitive unified system
       Basically a TM environment, with the MT engine as an extra
       component working on top of it
       MT suggestions should be presented in a controlled and
       selective way
       Basic components:
              A 2-column translation grid for source and target segments
              Terminology management
              MT engine
              Alignment tool
              Quality assurance control



Advanced issues


      Automation of the training process with TM databases
      Statistical systems require considerable computing resources.
      A solution: MT as Software as a Service (SaaS)
      Terminology databases can be used for more than reference
      purposes
             Additional entry fields for coding MT dictionary entries
             (Systran)
             Linguistic information can be used for creating factored models
             (Moses)
      Automatic suggestions-as-you-type (TransType, Caitra)





Summary



     The combination of MT with TM results in a significant
     productivity increase not feasible in a TM-only environment
     Currently there is no straightforward way of doing that
     Work is in progress by the authors towards this purpose, in
     the form of a Software Specification document that will
     describe the design and the components of such a system in
     every detail








                            Thank you!

                        Any questions?




