intl me this, intl me that
         Andrei Zmievski
         Digg.com




         IPC ~ May 26, 2009 ~ Berlin
Tuesday, May 26, 2009
Who is this guy?
               Open Source Fellow @ Digg
               PHP Core Developer since 1999
               Architect of the Unicode/i18n support
               Release Manager for PHP 6
               Twitter: @a
               Beer lover (and brewer)



Why localize?




One example.

Another reason.

Another reason.

Why Localize?

               English speakers are now a minority on WWW
               Nearly 3 out of 4 participants surveyed by Common
               Sense Advisory agreed that they were more likely to
               buy from sites in their own languages than in English
               Global consumers will pay more for products with
               information in their language




Most important thing...




No assumptions!




No assumptions!


               English, like German, is just another language
               The USA, like Germany, is just another country
               Earth is just another planet (eventually)




i18n


               PHP 5.3 or PHP 6
               intl extension
               Consider all data processing and output points




Locale data

               Common Locale Data Repository (CLDR)
               374 locales: 137 languages and 140 territories
               Updated regularly
               Used by intl extension




Translation

               Identifying what to translate
               Checking all sources
               Obtaining translation
               Iteration




What to translate

               Translatable units
               "Continue" or "There were 5 search results"
               Approaches
                    Automatic rippers
                    Manual markup




Sources: PHP

               Anything destined for output layer
                    single- and double-quoted strings
                    heredocs
                    error/exception messages (if seen by users)
                    404 pages, anyone?




Sources: PHP


               Use output buffering to detect misses
               Consider templates to enforce separation
               Don't use extensions that cannot deal with UTF-8
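
One way to use output buffering to catch misses, sketched under assumptions: the `t()` helper, the private-use marker characters, and the `findUntranslated()` and `auditBuffer()` names are all hypothetical, not part of PHP or of any tool named in this talk. Every translated string is wrapped in invisible markers, and an output-buffer callback reports any visible text that sits outside them.

```php
<?php
// Hypothetical markers wrapped around every string that went through t().
// Private-use code points are assumed never to appear in real content.
const MARK_OPEN  = "\u{E000}";
const MARK_CLOSE = "\u{E001}";

// Hypothetical translation helper: looks the string up and marks the result.
function t(string $text, array $catalog): string
{
    return MARK_OPEN . ($catalog[$text] ?? $text) . MARK_CLOSE;
}

// Returns the visible text fragments that never passed through t().
function findUntranslated(string $html): array
{
    // Drop marked spans first, then the markup around whatever is left.
    $rest = preg_replace('/' . MARK_OPEN . '.*?' . MARK_CLOSE . '/su', '', $html);
    $rest = strip_tags($rest);

    $misses = [];
    foreach (preg_split('/\R+/u', $rest) as $line) {
        $line = trim($line);
        // Only report fragments that contain at least one letter.
        if ($line !== '' && preg_match('/\p{L}/u', $line)) {
            $misses[] = $line;
        }
    }
    return $misses;
}

// Output-buffer callback: log the misses, then strip the markers.
function auditBuffer(string $buffer): string
{
    foreach (findUntranslated($buffer) as $miss) {
        error_log("untranslated output: $miss");
    }
    return str_replace([MARK_OPEN, MARK_CLOSE], '', $buffer);
}
```

Registering the callback with `ob_start('auditBuffer')` at the top of a request, in a development environment only, would log each miss when the buffer is flushed. The sketch ignores text inside `<script>` and `<style>` elements, which a real audit would need to handle.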




Sources: JS and CSS

               Text
               Images
               Position or alignment of elements may change
               Modularize locale-dependent code into separate files
                        <script src="/js/common.js" type="text/javascript"></script>
                        <script src="/js/locale-<?php echo $LOCALE ?>.js"
                                type="text/javascript"></script>




Sources: DB

               Strings are fine, if they will never be displayed to users
               Consider using constants/identifiers,
               e.g. not "admin" or "user", but 1 or 2
               For things like product titles, keep separate table with
               translations and link against the main one




Sources: external

               File-based content
               RSS Feeds
               Web services
               et al




Obtaining translations
               [Diagram: triangle with corners Fast, Cheap, Accurate]




Obtaining translations

               You
               (maybe) Fast and cheap - not accurate
               "Not to perambulate the corridors during the hours of
               repose in the boots of ascension."
                                   sign in an Austrian ski hotel




Obtaining translations


               Professionals
               (usually) Accurate and fast - not cheap




Obtaining translations


               Community
               (fairly) Accurate and cheap - not fast




Facebook approach


               Turn translation into a competitive activity
               Build it into the interface (just another app)
               Validation via voting




Iteration

               Catching new units
                    mark up untranslated strings
                    use mnemonic identifiers,
                    e.g. MENU.NAV.HELP
               Merge/update tools
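
Catching new units with mnemonic identifiers could look roughly like this. It is a minimal sketch: the `lookup()` helper, the `[[...]]` marker style, and the sample catalog are assumptions made for illustration, not the behavior of any tool covered here.

```php
<?php
// Hypothetical lookup helper: returns the translation for a mnemonic
// identifier, or a loudly marked placeholder when none exists yet.
function lookup(string $id, array $catalog, array &$missing): string
{
    if (isset($catalog[$id])) {
        return $catalog[$id];
    }
    $missing[$id] = true;   // remember the unit for the next translation round
    return "[[{$id}]]";     // stands out in the rendered page
}

$fr = ['MENU.NAV.HOME' => 'Accueil'];   // sample data, not a real catalog
$missing = [];

echo lookup('MENU.NAV.HOME', $fr, $missing), "\n";  // translated
echo lookup('MENU.NAV.HELP', $fr, $missing), "\n";  // marked as untranslated
```

After a page render, the keys of `$missing` are the units to hand to translators, which is the list a merge/update tool would fold back into the catalog.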




Using translations

               Self-contained pages (masochistic)
                    standalone per-locale pages with no common root
                    quick-n-dirty
                    iteration? not so much




Using translations


               Runtime
                    uses translation storage and on-the-fly lookup
                    usually combined with caching
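
A runtime lookup combined with caching might be sketched as follows. The `RuntimeTranslator` class and its loader callable are hypothetical: the loader stands in for whatever translation storage is in use, and the in-memory array is a per-request cache that a shared cache could back in production.

```php
<?php
// Sketch of a runtime translator with a per-request cache.
final class RuntimeTranslator
{
    private $loader;             // callable: locale => array of unit => translation
    private array $cache = [];   // catalogs already loaded in this request
    public int $loads = 0;       // how often the storage was actually hit

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function translate(string $locale, string $unit): string
    {
        if (!isset($this->cache[$locale])) {
            // Cache miss: fetch the whole catalog for this locale once.
            $this->cache[$locale] = ($this->loader)($locale);
            $this->loads++;
        }
        // Fall back to the source unit when no translation exists.
        return $this->cache[$locale][$unit] ?? $unit;
    }
}

$storage = ['fr' => ['Continue' => 'Continuer']];   // sample data only
$tr = new RuntimeTranslator(fn(string $locale): array => $storage[$locale] ?? []);

echo $tr->translate('fr', 'Continue'), "\n";
echo $tr->translate('fr', 'Continue'), "\n";   // served from the cache
```

Loading a whole catalog per locale trades memory for fewer storage round trips; caching individual units instead is the other common choice.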




Using translations

               Pre-generation (baking)
                    complete per-locale sites generated offline
                    no runtime lookups
                    may require runtime operations (sorting, etc)
                    could increase opcode cache memory requirements




Considerations

               Fidelity
               Ease of use
               Performance
               Flexibility
               Portability




Fidelity


               UTF-8
                    don't use tools that don't support it




Fidelity
               How big should translatable units be?
                    As large as possible, but not larger
                    Avoid concatenation problem


                        There are <?php echo $nMesg ?> unread messages
                              in <?php echo $nFolders ?> folders.




Fidelity


               Sometimes the largest possible unit is a word
               Context is important
               Chinese (person) vs. Chinese (language)
               Add context as part of the unit
               chinese-person or CHINESE.PERSON



Fidelity
               Combining translations with runtime data
               (parametrization)

                        There are %1 unread messages in %2 folders.



               sprintf() - works for simple things
               gettext() - can help with plurals
               MessageFormat + ChoiceFormat is better
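
The difference between these options can be sketched in a few lines. The German sentence and the `unreadSummary()` helper are illustrative assumptions. The second half uses the ICU `plural` argument type of `MessageFormatter` in place of ChoiceFormat, and assumes the intl extension is loaded with an ICU version that supports `plural`.

```php
<?php
// sprintf() with positional arguments lets a translation reorder parameters.
$en = 'There are %1$d unread messages in %2$d folders.';
$de = 'In %2$d Ordnern liegen %1$d ungelesene Nachrichten.';   // illustrative translation

echo sprintf($en, 5, 2), "\n";
echo sprintf($de, 5, 2), "\n";

// sprintf() cannot choose between singular and plural forms.
// ICU message patterns can, here through the "plural" argument type.
function unreadSummary(string $locale, int $messages, int $folders): string
{
    $pattern = 'You have {0, plural, one {# unread message} other {# unread messages}}'
             . ' in {1, plural, one {# folder} other {# folders}}.';
    return MessageFormatter::formatMessage($locale, $pattern, [$messages, $folders]);
}
```

With intl available, `unreadSummary('en_US', 1, 3)` should read "You have 1 unread message in 3 folders." The whole sentence stays one translatable unit, so a translator can supply as many plural categories as the target language needs.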



Ease of use

               Intuitive tools (or good documentation)
               Transparent formats
               Translation memory
                    useful for short, precise matches, not fuzzy
                    use in testing and first pass, not in production




Performance

               Caching
                    translation units
                    translated pages
               APC, memcache, etc
               Reduce runtime overhead




Flexibility


               Adding new languages/locales quickly
               Translation inheritance
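
Translation inheritance can be pictured as a fallback chain. The sketch below assumes underscore-separated locale identifiers and a catalog named `root` at the top of the chain. The function names and sample catalogs are hypothetical.

```php
<?php
// Builds a fallback chain such as fr_CA -> fr -> root.
function fallbackChain(string $locale): array
{
    $chain = [$locale];
    while (($pos = strrpos($locale, '_')) !== false) {
        $locale = substr($locale, 0, $pos);   // drop the last subtag
        $chain[] = $locale;
    }
    $chain[] = 'root';
    return $chain;
}

// Returns the first translation found along the chain, else the unit itself.
function inherit(string $unit, string $locale, array $catalogs): string
{
    foreach (fallbackChain($locale) as $candidate) {
        if (isset($catalogs[$candidate][$unit])) {
            return $catalogs[$candidate][$unit];
        }
    }
    return $unit;
}

$catalogs = [                                   // sample data only
    'root'  => ['Help' => 'Help', 'Email' => 'Email'],
    'fr'    => ['Help' => 'Aide', 'Email' => 'E-mail'],
    'fr_CA' => ['Email' => 'Courriel'],         // overrides only what differs
];

echo inherit('Email', 'fr_CA', $catalogs), "\n";   // found in fr_CA
echo inherit('Help', 'fr_CA', $catalogs), "\n";    // inherited from fr
```

With this layout, adding a locale means adding only the units that differ from its parent, which is what makes new locales quick to add.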




Portability

               Moving between tools
               Translations, most importantly
               XLIFF
                    http://en.wikipedia.org/wiki/Xliff




Tools: gettext
               Developed for C/C++ originally
               Somewhat obscure format
               Translations on disk
               Have to compile translations with every change
               Proper markup not always possible
               POedit is a decent translation editor



Tools: ezTranslation (et al)

               More of a translation look-up tool
               Can support various backends for translation storage
               and caching (Qt Linguist format by default)
               Supports parametrized strings
               Bork/l33t filters for marking untranslated strings




Tools: template engines

               Smarty (for example)
               3rd party solutions based on pre- and post-filters
               Translations in config files or gettext mainly, could be in
               DB
               Mark-up approaches vary
               Parametrized strings are possible (depends on plugin)



Tools: r3

               Developed and supported
               by Yahoo!
               Very flexible and powerful,
               but a bit of a learning curve
               Translations are a subset of
               site variations




Tools: r3

               Inheritance everywhere
               Translations in DB
               (MySQL or SQLite)
               Has basic GUI for
               some operations




Tools: intl
               Available for PHP 5.3 and PHP 6
               Access to locale data
               Formatters/parsers
                    Number, date, time, message, choice, etc
               Collation (sorting)
               More coming
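
As a rough illustration of the number and date formatters, the two helper functions below wrap `NumberFormatter` and `IntlDateFormatter` from the intl extension. The helper names and the default `UTC` timezone are assumptions made for this sketch.

```php
<?php
// Requires the intl extension. Locale identifiers follow ICU conventions.

// Formats a number with the locale's grouping and decimal separators.
function formatNumber(float $value, string $locale): string
{
    $fmt = new NumberFormatter($locale, NumberFormatter::DECIMAL);
    return $fmt->format($value);
}

// Formats a Unix timestamp as a full-length date in the given locale.
function formatFullDate(int $timestamp, string $locale, string $timezone = 'UTC'): string
{
    $fmt = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::FULL,   // date style
        IntlDateFormatter::NONE,   // no time part
        $timezone
    );
    return $fmt->format($timestamp);
}
```

For example, `formatNumber(1234.5, 'de_DE')` gives `1.234,5` while `formatNumber(1234.5, 'en_US')` gives `1,234.5`. The exact wording of `formatFullDate()` output depends on the locale data shipped with the installed ICU version.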



r3: setup
             % sudo pear install -f stickleback-[version].tgz
             % sudo pear install -f --alldeps r3-[version].tgz

             % mkdir ~/r3
             % r3 setup setuphome ~/r3
             % export R3HOME=~/r3

             % r3 setup installdb




r3: setup

              % r3 dim product create wine
              % r3 dim intl create generic_intl
              % r3 dim intl create -p generic_intl us
              % r3 dim intl create -p generic_intl fr
              % r3 dim intl create -p us ca

              % r3 dim intl parent ca set fr -d translation
              % r3 dimension intl parent fr unset -f -d translation
              ...




r3: inheritance
               templates          translations
                  generic_intl     generic_intl
                        fr         us
                        us         fr
                             ca         ca




r3: make a page

              % r3 target create wine/generic_intl/index.php
              % r3 template edit wine/generic_intl/index.php \
                   index.php.ros
              ...
              % r3 generate -av




r3: markup

              <r3:trans>The Wine Source</r3:trans>




              % r3 translation list
              % r3 translation set wine/fr "The Wine Source" \
                   "La Source de Vin"
              % r3 generate wine/fr




r3: translation
               % r3 translation save all fr.xml
               ...
               % r3 translation merge fr.xml


                <file original='wine/fr/generic'
                         source-language='en'
                         target-language='fr'
                         datatype='plaintext'>
                       <body>
                           <trans-unit id='26'
                                       approved='yes'>
                               <source>The Wine Source</source>
                               <target>La Source de Vin</target>
                           </trans-unit>
               ...


r3: compile-time PHP
           test.html.ros
              <div>
              <r3:cphp>
              foreach (range(1, 5) as $i)
              {
                echo $i, "<br/>";
              }
              </r3:cphp>
              </div>

           test.html
              <div>
              1
              2
              3
              4
              5
              </div>




r3: parameterized strings
           test.php.ros
              $message = "<r3:trans>You have {0,number} messages as
                          of {1,date,full}.</r3:trans>";
              $args = array(1234, time());
              echo MessageFormatter::formatMessage(
                       $LOCALE, $message, $args
              );


          fr translation
                        Au {1,date,full} vous avez {0,number} messages.


          fr output
                   Au mardi 22 juillet 2008 vous avez 1 234 messages.


r3: runtime processing
              // Map the site's intl values to ICU locale identifiers
              $map = array('jp' => 'ja',
                           'fr' => 'fr',
                           'us' => 'en_US',
                           'ca' => 'fr_CA',
                           'ru' => 'ru_RU',
                           'de' => 'de_DE',
                           'generic_intl' => 'en_US');
              // Translate first, so that sorting sees the translated names
              $ar = array($context->trans('Ivory Coast'),
                          $context->trans('Russia'),
                          $context->trans('USA'));
              $lang = $context->location()->get_lang_attribute();
              $LOCALE = $map[$lang];
              // Sort using the collation rules of the current locale
              $coll = new Collator($map[$lang]);
              $coll->sort($ar);
              foreach ($ar as $c) {
                  print "<li>$c</li>";
              }

Resources
               r3
               http://developer.yahoo.com/r3/

               gettext
               http://zez.org/article/articleview/42/
               http://www.poedit.net/

               Smarty
               Chapter 12 of Smarty book
               http://smarty.incutio.com/?page=SmartyMultiLanguageSupport
               http://bit.ly/2q2XM1

               ezTranslation
               http://ezcomponents.org/docs/tutorials/Translation

               intl
               http://php.net/intl


thank you                             仗舒亳弍仂
                        http://gravitonic.com/talks



