Unidecoder 
Simon Courtois - @happynoff
Transliteration
Ni Hao
ПРИВЕТ
PRIVIeT
How does 
it work?
At the beginning 
there was ASCII
A 65 
B 66 
C 67 
a 97 
b 98 
c 99
Bit weights: 64 32 16 8 4 2 1
A 65 10 00001
a 97 11 00001
B 66 10 00010
b 98 11 00010
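The bit patterns above show that an upper-case letter and its lower-case twin differ only in the bit worth 32. A minimal sketch of that observation follows; `CASE_BIT` and `toggle_case` are names made up for this illustration, and the trick only holds for the ASCII letters A-Z and a-z.

```ruby
# Print each letter with its decimal code and 7-bit pattern.
%w[A a B b].each do |char|
  code = char.ord
  puts format("%s %3d %07b", char, code, code)
end

# Hypothetical helper: flip the bit worth 32 to switch case.
# Only meaningful for ASCII letters (A-Z, a-z).
CASE_BIT = 0b0100000 # 32

def toggle_case(char)
  (char.ord ^ CASE_BIT).chr
end

puts toggle_case("A") # a
puts toggle_case("b") # B
```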
Then…
8-bit computers!
So every 
country had its own 
encoding(s)!
All was fine until…
The 
World Wide Web
UTF-8 
to the rescue!
Everything on 
32 bits?
Bad idea
c a f é
Bad idea
0
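Why fixed 32-bit characters are wasteful can be checked directly. This is a small sketch using Ruby's built-in `String#encode`; it assumes the source file is UTF-8 (Ruby's default), and the variable names are only for illustration.

```ruby
word = "café"

# Every character takes 4 bytes in UTF-32, mostly zeros.
utf32 = word.encode("UTF-32BE")
puts utf32.bytesize       # 16
puts utf32.bytes.count(0) # 12 of those bytes are zero

# The same word in UTF-8: one byte per ASCII letter, two for "é".
puts word.bytesize        # 5
```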
A better idea 
A 65 010 00001
110 XXXXX 10 XXXXXX
1110 XXXX 10 XXXXXX 10 XXXXXX
A better idea 
110 XXXXX 10 XXXXXX 
110 10000 10 011111
A better idea 
10000011111 1055 П
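The two-byte pattern above can be decoded by hand. The sketch below strips the `110` and `10` prefixes from the bytes of "П" and glues the remaining bits together; the variable names are made up here, and it only handles the two-byte case shown on the slide.

```ruby
bytes = "П".bytes
lead, continuation = bytes
puts format("%08b %08b", lead, continuation) # 11010000 10011111

# Drop the "110" prefix of the lead byte and the "10" prefix
# of the continuation byte, keeping only the payload bits.
payload_high = lead & 0b00011111         # 10000
payload_low  = continuation & 0b00111111 # 011111

# Concatenate the payloads: 10000 followed by 011111.
code_point = (payload_high << 6) | payload_low
puts code_point.to_s(2) # 10000011111
puts code_point         # 1055
```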
So, how does 
unidecoder work?
How do we go 
from П to P?
Start from a
string like "П"
Unpack it
"П".unpack("U")
[1055] 
00000100 00011111 
4 31
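The unpack step can be reproduced in a few lines. This sketch splits the 16-bit pattern as text, the way the slide draws it; it assumes a code point below 65536, and the variable names are illustrative only.

```ruby
# "U" unpacks a UTF-8 character into its Unicode code point.
code_point = "П".unpack("U").first
puts code_point # 1055

# Write the code point as 16 bits and cut it into two bytes.
bits = format("%016b", code_point)
puts bits # 0000010000011111

high_bits = bits[0, 8] # "00000100"
low_bits  = bits[8, 8] # "00011111"

high = high_bits.to_i(2)
low  = low_bits.to_i(2)
puts high # 4
puts low  # 31
```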
4 → x04 → x04.yml
Ie Io Dj … P
0 1 2 … 31
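The lookup itself might look roughly like this. It is only a sketch: `X04` is a hypothetical in-memory stand-in holding just the four entries visible on the slide (not the real contents of x04.yml), and `table_file` is a made-up helper that assumes the file name is the high byte written as two hex digits.

```ruby
# Hypothetical stand-in for x04.yml: index => transliteration.
# Only the entries shown on the slide are included.
X04 = { 0 => "Ie", 1 => "Io", 2 => "Dj", 31 => "P" }.freeze

# Assumption: the table file is named after the high byte in hex.
def table_file(high)
  format("x%02x.yml", high)
end

high = 4
low  = 31

puts table_file(high) # x04.yml
puts X04.fetch(low)   # P
```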
How to obtain 
4 and 31 ?
unpacked = 1055 
0000010000011111 
unpacked >> 8
00000100
4
How to obtain 
4 and 31?
unpacked = 1055 
0000010000011111 
unpacked & 255
0000010000011111 (1055)
0000000011111111 (255)
0000000000011111 (31)
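Both numbers come from two bit operations: a right shift by 8 for the high byte and a mask with 255 for the low byte. A minimal sketch, where `split_code_point` is a name invented for this example:

```ruby
# Hypothetical helper: returns [high byte, low byte] of a code point.
def split_code_point(code_point)
  high = code_point >> 8  # drop the lowest 8 bits
  low  = code_point & 255 # keep only the lowest 8 bits
  [high, low]
end

unpacked = "П".unpack("U").first
p split_code_point(unpacked) # [4, 31]

# A character below 256 lands in table 0.
p split_code_point("é".unpack("U").first) # [0, 233]
```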
Brain fried yet? 
advertising time!
www.tinci.fr 
Web 
Development 
Software 
Development 
Consulting  
Support 
@tincihq
Resources 
Characters, Symbols and the Unicode Miracle: 
bit.ly/why-utf8 
Unidecoder: github.com/norman/unidecoder 
Slides: bit.ly/unidecoder
Thank you! 
Simon Courtois - @happynoff

How Unidecoder Transliterates UTF-8 to ASCII