How Masterly Are People at
Playing with Their Vocabulary?
Sanita Reinsone, Matīss Rikters
Word games
• Lingo (~1995)
• Wordle (2021)
• Vārdulis (2022)
  - Estonian versions:
    https://uudis.net/wordle
    https://sonuk.subscribe.ee
    https://sonar.ajad.ee
  - Latvian versions:
    https://wordle.lielakeda.lv
    https://ralfulis.vercel.app
    https://vardulis.lv
  - Lithuanian versions:
    https://jakut.is/vordl
    https://dienos-zodis.lt
    https://wordle.dario.cat
Game construction
• https://github.com/cwackerfuss/react-wordle
  - Used to create alternatives in 49 different languages
  - Also 36 special thematic versions and 22 math, science or technology oriented versions
• Word lists for Latvian
  - Monolingual corpora from Opus
  - Tokenised; words that are not exactly 5 characters long, or that contain non-alphabetical characters, filtered out
  - The 1500 most frequent 5-character words used as the main list for daily guessing
  - 1430 remained after manual filtering
  - Remaining words cross-referenced with tezaurs.lv and kept as the valid guess list
  - Further 3 to 8 character words automatically inflected into other word forms, and all resulting 5-character word forms added to the secondary list, giving a total of 22,341 unique 5-character words
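
The filtering steps above could be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the function name `build_daily_list`, the sample tokens, and the use of the 33-letter Latvian alphabet as the "alphabetical" test are assumptions, and the manual filtering and tezaurs.lv cross-check are not modelled.

```python
import re
from collections import Counter

# Assumption: "alphabetical" means the 33 letters of the Latvian alphabet.
LATVIAN_LETTERS = "aābcčdeēfgģhiījkķlļmnņoprsštuūvzž"
FIVE_LETTER = re.compile(f"[{LATVIAN_LETTERS}]{{5}}")


def build_daily_list(tokens, top_n=1500):
    """Return the top_n most frequent 5-letter tokens, uppercased.

    Tokens of any other length, or containing characters outside the
    Latvian alphabet, are dropped before counting.
    """
    counts = Counter(
        t.lower() for t in tokens if FIVE_LETTER.fullmatch(t.lower())
    )
    return [word.upper() for word, _ in counts.most_common(top_n)]


# Hypothetical toy corpus, already tokenised on whitespace.
tokens = "saule laiks saule un 12345 kaķis laiks saule wordl3 ābols".split()
print(build_daily_list(tokens, top_n=3))
```

On a real corpus the tokens would come from the tokenised Opus data, and `top_n=1500` would match the size of the main list before manual filtering.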
N-character n-grams
[Figure, two panels: "Sum of n-character words" and "Different n-character words". Each panel plots the count in corpora against word length in characters (2 to 20).]
Plays per day
[Figure: plays per day, 28.01.2022 to 30.09.2022; vertical axis from 0 to 1200.]
Top 15 guesses at each turn
First guess
LV     EN       Tag       Count
SAULE  Sun      N nom sg  5341
SIENA  Wall     N nom sg  3179
TIESA  Court    N nom sg  1579
DIENA  Day      N nom sg  1476
LAIME  Luck     N nom sg  1449
PIENS  Milk     N nom sg  1237
MAIZE  Bread    N nom sg  1217
LIEPA  Linden   N nom sg  1159
SAITE  Link     N nom sg   958
KASTE  Box      N nom sg   952
ĀBOLS  Apple    N nom sg   950
KAĶIS  Cat      N nom sg   869
IELAS  Streets  N nom pl   676
SKOLA  School   N nom sg   673

Second guess
LV     EN       Tag       Count
LAIKS  Time     N nom sg   412
SAULE  Sun      N nom sg   355
DIENA  Day      N nom sg   337
LIEPA  Linden   N nom sg   290
KAĶIS  Cat      N nom sg   284
PIENS  Milk     N nom sg   266
ĀBOLS  Apple    N nom sg   262
SAULĒ  Sun      N loc sg   207
SIENA  Wall     N nom sg   205
LIETA  Thing    N nom sg   204
MAIZE  Bread    N nom sg   203
RIEPA  Tire     N nom sg   192
LAIME  Luck     N nom sg   188
TAUKI  Fat      N nom pl   175

Third guess
LV     EN       Tag           Count
LAIKS  Time     N nom sg       433
TIESA  Court    N nom sg       334
TAUKI  Fat      N nom pl       295
LIETU  Thing    N acc sg       273
PUSEI  Half     N dat sg       271
LAIKU  Time     N acc sg       247
PRECE  Product  N nom sg       230
DIEVS  God      N nom sg       225
LIKTS  Put      V ptcp pst m   224
PUSES  Halves   N nom pl       214
TIRGU  Market   N acc sg       214
TĒRPU  Outfit   N acc sg       206
MIERU  Peace    N acc sg       201
REIZĒ  At once  Adv            200
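
A tally like the one above can be produced by counting, for each turn, which word players entered. The sketch below assumes a hypothetical log format (one list of guesses per game); the function name `top_guesses_by_turn` and the sample games are illustrative only and do not reflect how the game actually stores its data.

```python
from collections import Counter


def top_guesses_by_turn(games, turn, n=15):
    """Return the n most common guesses made at the given 1-based turn.

    games is a list of games, each a list of guessed words in order.
    Games that ended before the requested turn are skipped.
    """
    counts = Counter(g[turn - 1] for g in games if len(g) >= turn)
    return counts.most_common(n)


# Hypothetical sample of three games.
games = [
    ["SAULE", "LAIKS"],
    ["SAULE", "TIESA", "LAIKS"],
    ["SIENA", "LAIKS"],
]
for turn in (1, 2, 3):
    print(turn, top_guesses_by_turn(games, turn, n=2))
```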
Unique word forms used
[Figure: unique word forms used, 28.01.2022 to 08.04.2022; vertical axis from 200 to 2000.]
Easiest words to guess
Latvian  English  Tag           In the corpus  Guess 1  Guess 2  Guess 3  Guess 4  Guess 5  Guess 6  Failed
LAIKS    Time     N nom sg            176,409    1.97%   17.53%   31.84%   28.00%   13.69%    4.65%   2.33%
TIESA    Court    N nom sg             88,330    3.41%   10.71%   28.63%   30.96%   15.38%    6.72%   4.19%
TAUKI    Fat      N nom pl              8,474    1.95%    7.39%   27.93%   31.62%   18.89%    7.08%   5.13%
GARĀM    Past     Adv                  23,569    0.58%    2.31%   14.45%   34.10%   30.83%   12.24%   5.49%
PUSEI    Half     N dat sg             10,574    0.85%    3.02%   25.14%   35.54%   20.98%    8.79%   5.67%
LIKTS    Put      V ptcp pst m          6,699    0.83%    3.43%   20.04%   31.26%   26.99%   11.69%   5.75%
DIEVS    God      N nom sg             31,497    2.58%    7.27%   19.31%   29.92%   22.08%   13.00%   5.83%
TĒRPU    Outfit   N acc sg              4,961    0.76%    1.70%   14.84%   35.20%   28.33%   12.89%   6.28%
PRECE    Product  N nom sg             22,387    1.06%    5.66%   21.21%   31.77%   23.03%   10.56%   6.72%
MIERU    Peace    N acc sg             15,178    1.16%    3.87%   18.47%   29.69%   27.37%   12.48%   6.96%
AVG                                    38,808    1.52%    6.29%   22.19%   31.81%   22.76%   10.01%   5.44%
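
The per-guess shares for a word can be condensed into a single number, for example the average number of guesses among games that were solved. The sketch below does this for LAIKS using the percentages reported above; the helper name `mean_guesses` is an assumption, and this summary is not one the slides themselves report.

```python
def mean_guesses(solved_pct):
    """Average number of guesses over solved games.

    solved_pct[i] is the share of games solved at guess i + 1.
    Failed games are excluded, so the shares are renormalised.
    """
    total = sum(solved_pct)
    weighted = sum(k * p for k, p in enumerate(solved_pct, start=1))
    return weighted / total


# Shares for LAIKS, guesses 1 to 6, and the failed share (in percent).
laiks = [1.97, 17.53, 31.84, 28.00, 13.69, 4.65]
failed = 2.33

# Sanity check: the shares should add up to roughly 100%.
print(round(sum(laiks) + failed, 2))
print(round(mean_guesses(laiks), 2))
```

The same function can be applied to any other row to compare words on one scale, keeping in mind that it ignores the failed share.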
What makes it easy?
• No repeating characters
• Basic inflections
  - Nominative
  - Singular
Most difficult words to guess
Latvian  English   Tag          In the corpus  Guess 1  Guess 2  Guess 3  Guess 4  Guess 5  Guess 6  Failed
CĪŅAS    Battle    N nom pl            22,832    1.33%    0.76%    2.29%    6.29%   14.11%   20.21%  55.00%
KOKUS    Trees     N acc pl             3,582    1.01%    1.69%    2.70%    9.97%   14.36%   20.44%  49.83%
KĀRĻA    Carl's    N gen sg             6,726    2.44%    1.14%    3.08%    5.36%   13.31%   26.62%  48.05%
SEŠAS    Six       Num nom pl          14,564    1.47%    1.18%    2.65%    9.56%   15.29%   23.97%  45.88%
BIEŽA    Frequent  Adj nom sg           3,732    0.65%    1.39%    2.68%    7.95%   16.73%   25.14%  45.47%
RAIŅA    Rainis'   N gen sg            12,752    1.89%    0.63%    2.43%    8.01%   17.10%   26.10%  43.83%
ŠUVES    Stitches  N nom pl             3,535    3.25%    2.44%    6.18%   13.01%   19.51%   14.80%  40.81%
DZIĻU    Deep      Adj acc sg           4,535    1.15%    0.49%    2.46%   13.46%   20.69%   25.78%  35.96%
FLĪŽU    Tiles     N acc pl             6,690    1.56%    0.73%    1.56%    8.72%   23.12%   31.47%  32.84%
JĒGAS    Sense     N gen sg             5,371    2.82%    2.82%    1.41%    8.45%   19.72%   32.39%  32.39%
AVG                                     8,432    1.76%    1.33%    2.74%    9.08%   17.39%   24.69%  43.01%
What makes it difficult?
• Diacritics
• Repeating characters
• Various cases
• Plural form
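
Two of these factors, diacritics and repeating characters, can be checked mechanically. The sketch below is one possible way to do so and is not taken from the slides; the function names are assumptions. It counts a letter as carrying a diacritic when its Unicode canonical decomposition contains a combining mark, which covers the Latvian letters with macrons, carons and cedillas.

```python
import unicodedata


def has_repeating_letters(word):
    """True if any letter occurs more than once in the word."""
    return len(set(word)) < len(word)


def diacritic_count(word):
    """Number of letters that decompose into a base letter plus a combining mark."""
    return sum(
        1
        for ch in word
        if any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))
    )


for word in ("LAIKS", "KOKUS", "CĪŅAS"):
    print(word, has_repeating_letters(word), diacritic_count(word))
```

Case and number (the other two factors) would need a morphological analyser and are not covered here.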
CĪŅAS
V?ZAS
D?VAS
??BAS
??NAS
CENAS
B?DAS
C?KAS
M??AS
R?GAS
DOMAS
S?VAS
D??AS
Public Tweets
[Figure: public tweets over time, 26.01.2022 to 26.09.2022; vertical axis from 0 to 60. Annotated answers: BIEŽA (frequent, 45.47% failed), CHINA, APPLE, RAIŅA (Rainis', 43.83% failed).]
Public reactions
• Initial criticism over including words of any inflection
  - Also person names, place names and others
• When difficult words appear, many flock to tezaurs.lv for lookup
  - For example, 'adobe', which means air-dried clay brick
"That strange inflection again. A guess by the method of deduction."
https://twitter.com/Richulis/status/1485026303116029952

"Today was complicated, but in the end still SO logical"
https://twitter.com/zane_zz/status/1484951229595852805

"I'd rather not say how long I sat over that last word. :D But I'm glad it can be played in Latvian too."
https://twitter.com/LauraMelne/status/1484892208956383235

"Oh, in Latvian too. Only the Latvian version didn't recognise several words … that are definitely Latvian! And it is harder in Latvian! With the long ones and the soft ones, and the hissing ones…"
https://twitter.com/ArtaLiene/status/1484861613698080771
Future work
• Make one day per week particularly challenging
  - Perhaps also one easy day with only simple forms?
• Explore changing strategies over time after the first year
https://wordle.lielakeda.lv
