What are the problems with, and the best solutions for, translating your web site or application into other languages? This presentation covers several PHP-based approaches to the problem, focusing on the new intl extension as well as other open source tools.
intl me this, intl me that
1. intl me this, intl me that
Andrei Zmievski
Digg.com
IPC ~ May 26, 2009 ~ Berlin
Tuesday, May 26, 2009
2. Who is this guy?
Open Source Fellow @ Digg
PHP Core Developer since 1999
Architect of the Unicode/i18n support
Release Manager for PHP 6
Twitter: @a
Beer lover (and brewer)
7. Why Localize?
English speakers are now a minority on WWW
Nearly 3 out of 4 participants surveyed by Common
Sense Advisory agreed that they were more likely to
buy from sites in their own languages than in English
Global consumers will pay more for products with
information in their language
10. No assumptions!
English/German is just another language
USA/Germany is just another country
Earth is just another planet (eventually)
11. i18n
PHP 5.3 or PHP 6
intl extension
Consider all data processing and output points
12. Locale data
Common Locale Data Repository (CLDR)
374 locales: 137 languages and 140 territories
Updated regularly
Used by intl extension
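The CLDR data can be queried directly through the intl extension's Locale class. A minimal sketch, assuming ext/intl is loaded; the locale identifiers are only illustrative, and the exact display strings depend on the ICU/CLDR version installed:

```php
<?php
// Assumes ext/intl is available; display strings vary with the bundled CLDR data.
$locales = ['de_DE', 'fr_CA', 'ja_JP'];

foreach ($locales as $locale) {
    // Name of the locale's language, rendered in English and in the locale itself.
    $inEnglish = Locale::getDisplayLanguage($locale, 'en');
    $inNative  = Locale::getDisplayLanguage($locale, $locale);
    // Region (territory) part of the identifier, e.g. "DE".
    $region    = Locale::getRegion($locale);

    echo "$locale: $inEnglish / $inNative (region $region)\n";
}
```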
13. Translation
Identifying what to translate
Checking all sources
Obtaining translation
Iteration
14. What to translate
Translatable units
"Continue" or "There were 5 search results"
Approaches
Automatic rippers
Manual markup
15. Sources: PHP
Anything destined for output layer
single- and double-quoted strings
heredocs
error/exception messages (if seen by users)
404 pages, anyone?
16. Sources: PHP
Use output buffering to detect misses
Consider templates to enforce separation
Don't use extensions that cannot deal with UTF-8
17. Sources: JS and CSS
Text
Images
Position or alignment of elements may change
Modularize locale-dependent code into separate files
<script src="/js/common.js" type="text/javascript"></script>
<script src="/js/locale-<?php echo $LOCALE ?>.js" type="text/javascript"></script>
18. Sources: DB
Strings are fine, if they will never be displayed to users
Consider using constants/identifiers,
e.g. not "admin" or "user", but 1 or 2
For things like product titles, keep separate table with
translations and link against the main one
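The separate-table idea can be sketched as follows. This assumes PDO with the SQLite driver; the table names, column names, sample rows, and the fall-back-to-default-locale behavior are all hypothetical choices made for illustration, not something the talk prescribes:

```php
<?php
// Hypothetical schema: table and column names are illustrative only.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Main table holds locale-independent data.
$db->exec('CREATE TABLE product (id INTEGER PRIMARY KEY, price_cents INTEGER)');
// Translations live in their own table, linked against the main one.
$db->exec('CREATE TABLE product_title (
    product_id INTEGER REFERENCES product(id),
    locale     TEXT,
    title      TEXT,
    PRIMARY KEY (product_id, locale)
)');

$db->exec("INSERT INTO product VALUES (1, 1299)");
$db->exec("INSERT INTO product_title VALUES (1, 'en', 'Red wine'), (1, 'fr', 'Vin rouge')");

// Fetch a title for a locale; fall back to a default locale (an assumption of this sketch).
function productTitle(PDO $db, int $id, string $locale, string $default = 'en'): ?string
{
    $stmt = $db->prepare(
        'SELECT title FROM product_title
          WHERE product_id = ? AND locale IN (?, ?)
          ORDER BY locale = ? DESC
          LIMIT 1'
    );
    $stmt->execute([$id, $locale, $default, $locale]);
    $title = $stmt->fetchColumn();
    return $title === false ? null : $title;
}

echo productTitle($db, 1, 'fr'), "\n"; // Vin rouge
echo productTitle($db, 1, 'de'), "\n"; // Red wine (fallback to default locale)
```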
19. Sources: external
File-based content
RSS Feeds
Web services
et al
21. Obtaining translations
You
(maybe) Fast and cheap - not accurate
"Not to perambulate the corridors during the hours of
repose in the boots of ascension."
sign in an Austrian ski hotel
22. Obtaining translations
Professionals
(usually) Accurate and fast - not cheap
23. Obtaining translations
Community
(fairly) Accurate and cheap - not fast
24. Facebook approach
Turn translation into a competitive activity
Build it into the interface (just another app)
Validation via voting
25. Iteration
Catching new units
mark up untranslated strings
use mnemonic identifiers,
e.g. MENU.NAV.HELP
Merge/update tools
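One way to catch new units is to have the lookup function mark anything it cannot find and remember the miss. A minimal sketch; the function name `t()`, the catalog contents, and the `[[...]]` marker are hypothetical:

```php
<?php
// Hypothetical catalog keyed by mnemonic identifiers; contents are illustrative.
$catalog = [
    'MENU.NAV.HELP' => 'Hilfe',
    'MENU.NAV.HOME' => 'Startseite',
];

// Identifiers that were requested but had no translation.
$missing = [];

function t(string $id, array $catalog, array &$missing): string
{
    if (isset($catalog[$id])) {
        return $catalog[$id];
    }
    // Record the miss and return a visibly marked placeholder.
    $missing[$id] = true;
    return '[[' . $id . ']]';
}

echo t('MENU.NAV.HELP', $catalog, $missing), "\n";   // Hilfe
echo t('MENU.NAV.LOGOUT', $catalog, $missing), "\n"; // [[MENU.NAV.LOGOUT]]

// The collected identifiers can then be handed to a merge/update step.
print_r(array_keys($missing));
```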
26. Using translations
Self-contained pages (masochistic)
standalone per-locale pages with no common root
quick-n-dirty
iteration? not so much
27. Using translations
Runtime
uses translation storage and on-the-fly lookup
usually combined with caching
28. Using translations
Pre-generation (baking)
complete per-locale sites generated offline
no runtime lookups
may require runtime operations (sorting, etc)
could increase opcode cache memory requirements
29. Considerations
Fidelity
Ease of use
Performance
Flexibility
Portability
30. Fidelity
UTF-8
don't use tools that don't support it
31. Fidelity
How big should translatable units be?
As large as possible, but not larger
Avoid concatenation problem
There are <?php echo $nMesg ?> unread messages
in <?php echo $nFolders ?> folders.
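The concatenation problem is easier to see side by side. In this sketch the fragment list, the `{name}` placeholder convention, and the `fill()` helper are assumptions, and the German sentence is only an illustration of a language that wants a different word order:

```php
<?php
$nMesg = 5;
$nFolders = 2;

// Problematic: three fragments translated independently. A translator
// cannot reorder the numbers or see the sentence as a whole.
$fragments = ['There are ', ' unread messages in ', ' folders.'];
$concatenated = $fragments[0] . $nMesg . $fragments[1] . $nFolders . $fragments[2];

// Better: one translatable unit with named placeholders, so each
// language can place the values wherever its grammar needs them.
$units = [
    'en' => 'There are {messages} unread messages in {folders} folders.',
    'de' => 'In {folders} Ordnern befinden sich {messages} ungelesene Nachrichten.',
];

function fill(string $unit, array $values): string
{
    $map = [];
    foreach ($values as $name => $value) {
        $map['{' . $name . '}'] = (string) $value;
    }
    return strtr($unit, $map);
}

echo $concatenated, "\n";
echo fill($units['de'], ['messages' => $nMesg, 'folders' => $nFolders]), "\n";
```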
34. Fidelity
Sometimes the largest possible unit is a word
Context is important
"Chinese" (person) vs. "Chinese" (language)
Add context as part of the unit
chinese-person or CHINESE.PERSON
35. Fidelity
Combining translations with runtime data
(parametrization)
There are %1 unread messages in %2 folders.
sprintf() - works for simple things
gettext() - can help with plurals
MessageFormat + ChoiceFormat is better
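A rough comparison of two of the options above, assuming ext/intl is loaded. The second pattern uses ICU's plural argument rather than the choice format named on the slide, on the assumption that the installed ICU supports it; the message texts are illustrative:

```php
<?php
// sprintf(): positional arguments let a translation reorder the values,
// but the plural forms are fixed in the string.
$pattern   = 'There are %1$d unread messages in %2$d folders.';
$reordered = 'In %2$d folders there are %1$d unread messages.';
echo sprintf($pattern, 5, 2), "\n";
echo sprintf($reordered, 5, 2), "\n";

// MessageFormatter: the pattern itself selects the plural form per argument.
$icuPattern = '{0, plural, =0{No unread messages} one{# unread message} other{# unread messages}}'
            . ' in {1, plural, one{# folder} other{# folders}}.';

foreach ([[0, 1], [1, 3], [5, 2]] as $args) {
    echo MessageFormatter::formatMessage('en_US', $icuPattern, $args), "\n";
}
```

With sprintf() a count of 1 still produces "1 unread messages"; the ICU pattern avoids that without any branching in the calling code.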
36. Ease of use
Intuitive tools (or good documentation)
Transparent formats
Translation memory
useful for short, precise matches, not fuzzy
use in testing and first pass, not in production
37. Performance
Caching
translation units
translated pages
APC, memcache, etc
Reduce runtime overhead
38. Flexibility
Adding new languages/locales quickly
Translation inheritance
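Translation inheritance can be sketched as a fallback chain: a locale stores only what differs from its parent, so adding a new locale means adding a small catalog of overrides. The catalog layout, the `root` name, and both helper functions are hypothetical:

```php
<?php
// Hypothetical per-locale catalogs; a locale only stores what differs from its parent.
$catalogs = [
    'root'  => ['BUTTON.CONTINUE' => 'Continue', 'LABEL.COLOR' => 'Color'],
    'fr'    => ['BUTTON.CONTINUE' => 'Continuer', 'LABEL.COLOR' => 'Couleur'],
    'fr_CA' => ['BUTTON.CONTINUE' => 'Poursuivre'],
];

// Build the lookup chain, e.g. fr_CA -> fr -> root.
function fallbackChain(string $locale): array
{
    $chain = [];
    $parts = explode('_', $locale);
    while ($parts) {
        $chain[] = implode('_', $parts);
        array_pop($parts);
    }
    $chain[] = 'root';
    return $chain;
}

// Return the first translation found along the chain, or null if none exists.
function lookup(string $id, string $locale, array $catalogs): ?string
{
    foreach (fallbackChain($locale) as $candidate) {
        if (isset($catalogs[$candidate][$id])) {
            return $catalogs[$candidate][$id];
        }
    }
    return null;
}

echo lookup('BUTTON.CONTINUE', 'fr_CA', $catalogs), "\n"; // Poursuivre (own entry)
echo lookup('LABEL.COLOR', 'fr_CA', $catalogs), "\n";     // Couleur (inherited from fr)
echo lookup('LABEL.COLOR', 'de_DE', $catalogs), "\n";     // Color (inherited from root)
```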
39. Portability
Moving between tools
Translations, most importantly
XLIFF
http://en.wikipedia.org/wiki/Xliff
40. Tools: gettext
Developed for C/C++ originally
Somewhat obscure format
Translations on disk
Have to compile translations with every change
Proper markup not always possible
POedit is a decent translation editor
41. Tools: ezTranslation (et al)
More of a translation look-up tool
Can support various backends for translation storage
and caching (Qt Linguist format by default)
Supports parametrized strings
Bork/l33t filters for marking untranslated strings
42. Tools: template engines
Smarty (for example)
3rd party solutions based on pre- and post-filters
Translations in config files or gettext mainly, could be in
DB
Mark-up approaches vary
Parametrized strings are possible (depends on plugin)
43. Tools: r3
Developed and supported
by Yahoo!
Very flexible and powerful,
but a bit of a learning curve
Translations are a subset of
site variations
44. Tools: r3
Inheritance everywhere
Translations in DB
(MySQL or SQLite)
Has basic GUI for
some operations
45. Tools: intl
Available for PHP 5.3 and PHP 6
Access to locale data
Formatters/parsers
Number, date, time, message, choice, etc
Collation (sorting)
More coming
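A short tour of the formatters and the collator, assuming ext/intl is loaded. The sample values and locales are illustrative, and the exact output can vary with the installed ICU data:

```php
<?php
// Number formatting follows the locale's grouping and decimal separators.
$number = new NumberFormatter('de_DE', NumberFormatter::DECIMAL);
echo $number->format(1234567.891), "\n";

// Date formatting: long date, no time, explicit time zone.
$date = new IntlDateFormatter(
    'fr_FR',
    IntlDateFormatter::LONG,
    IntlDateFormatter::NONE,
    'Europe/Berlin'
);
$when = new DateTime('2009-05-26 12:00:00', new DateTimeZone('Europe/Berlin'));
echo $date->format($when), "\n";

// Collation: byte order puts "Äpfel" after "zebra"; German collation does not.
$words = ['zebra', 'Äpfel', 'apple'];

$bytewise = $words;
sort($bytewise, SORT_STRING);

$collated = $words;
$collator = new Collator('de_DE');
$collator->sort($collated);

print_r($bytewise);
print_r($collated);
```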
47. r3: setup
% r3 dim product create wine
% r3 dim intl create generic_intl
% r3 dim intl create -p generic_intl us
% r3 dim intl create -p generic_intl fr
% r3 dim intl create -p us ca
% r3 dim intl parent ca set fr -d translation
% r3 dimension intl parent fr unset -f -d translation
...
48. r3: inheritance
templates translations
generic_intl generic_intl
fr us
us fr
ca ca
49. r3: make a page
% r3 target create wine/generic_intl/index.php
% r3 template edit wine/generic_intl/index.php
index.php.ros
...
% r3 generate -av
50. r3: markup
<r3:trans>The Wine Source</r3:trans>
% r3 translation list
% r3 translation set wine/fr The Wine Source
La Source de Vin
% r3 generate wine/fr
51. r3: translation
% r3 translation save all fr.xml
...
% r3 translation merge fr.xml
<file original='wine/fr/generic'
source-language='en'
target-language='fr'
datatype='plaintext'>
<body>
<trans-unit id='26'
approved='yes'>
<source>The Wine Source</source>
<target>La Source de Vin</target>
</trans-unit>
...
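A saved file of this shape can be read back with SimpleXML. This is a sketch only: the document below is a completed, minimal version of the fragment above, and the `<xliff>` wrapper, its version attribute, and the missing namespace declaration are assumptions (real exports usually declare the XLIFF namespace):

```php
<?php
// A complete, minimal XLIFF-style document modeled on the fragment above.
// The wrapper element and version attribute are assumptions of this sketch.
$xml = <<<XML
<xliff version='1.2'>
  <file original='wine/fr/generic' source-language='en'
        target-language='fr' datatype='plaintext'>
    <body>
      <trans-unit id='26' approved='yes'>
        <source>The Wine Source</source>
        <target>La Source de Vin</target>
      </trans-unit>
    </body>
  </file>
</xliff>
XML;

$doc = simplexml_load_string($xml);

$translations = [];
foreach ($doc->file->body->{'trans-unit'} as $unit) {
    // Only accept units marked as approved.
    if ((string) $unit['approved'] === 'yes') {
        $translations[(string) $unit->source] = (string) $unit->target;
    }
}

print_r($translations);
```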