Only the abstract here is included in the proceedings of the WikiSym + OpenSym 2013 Conference (wsos2013). The full text is a work-in-progress
draft, revised based on blind-review comments and suggestions. Please contact the author for the latest citation for this research.
How does localization influence online visibility of user-
generated encyclopedias? A study on Chinese-language
Search Engine Result Pages (SERPs)
Han-Teng Liao
Oxford Internet Institute
University of Oxford
Oxford, United Kingdom
hanteng@gmail.com
ABSTRACT
Prior empirical and theoretical work has discussed the role that
dominant search engines play in information gatekeeping on the Web,
and there are reports of the high ranking of the Wikipedia website
in search engine result pages (SERPs). However, little research has
been conducted on non-Google search engines and non-English versions
of user-generated encyclopedias. This paper proposes a method to
quantify the "display" gatekeeping differences in SERP rankings and
presents findings based on Chinese SERP data. Based on 2,500 mainly
Chinese-language search queries, the data set includes the SERP
outcomes for four Chinese-speaking regions (mainland China,
Singapore, Hong Kong and Taiwan) provided by three major search
engines (Baidu, Google and Yahoo), covering over 97% of the search
engine market in each region. The findings, analysed and visualized
using network analysis techniques, demonstrate the following: major
user-generated encyclopedias are among the most visible websites, and
localization factors matter (certain search engine variants produce
the most divergent outcomes, especially the mainland Chinese ones).
The strong effects of "network gatekeeping" by search engines
indicated here also suggest similar dynamics inside user-generated
encyclopedias.
Categories and Subject Descriptors
[Human-centered computing]: Collaborative and social
computing – Collaborative filtering, Wikis, Empirical studies in
collaborative and social computing
[Information Systems]: Web search engines – Collaborative
filtering, Page and site ranking
General Terms
Management, Performance, Design, Human Factors, Theory
Keywords
Geo-linguistic analysis, network analysis, Network gatekeeping,
Chinese Internet, Chinese characters, Localization, censorship.
1. INTRODUCTION
Using search engines is among the most popular online activities for
users in the US (Fallows, 2008) and mainland China (CIC, 2009;
CNNIC, 2009), and has been among the driving forces of the
fast-growing online advertising platform (Varian, 2007; SEMPO, 2011;
IDATE, 2011; PricewaterhouseCoopers, 2011). It has been reported
(with speculation as to why) that Google, the global leader among
search engines, has consistently favoured Wikipedia, the global
leader among user-generated encyclopedias, by showing relevant pages
frequently and prominently in the search engine result pages
(hereafter SERPs) (Charlton, 2012; Čuhalev, 2006; Gray, 2007;
Jones, 2007; Silverwood-Cope, 2012). Independent market
research by Nielsen Online and Hitwise Intelligence has
demonstrated that Wikipedia not only dominates online visits
for encyclopedia content, but also does so mainly because of the
traffic directed by major Web search engines (Hopkins, 2009;
Nielsen Online, 2008). Even the Wikimedia Foundation
acknowledged that Google drives traffic to Wikipedia, but
nonetheless argued that half of its readers did intend to look for
Wikipedia content (Khanna, 2011). Thus, as major websites that
dominate traffic and user attention, Google and Wikipedia seem to
be central in guiding users where to look.
However, most of the findings and discussions are limited to, or
predominantly focused on, the English-language context (Battelle,
2005; Bermejo, 2009; Couvering, 2004, 2008; Dahlberg, 2005;
Hargittai, 2007; Segev, 2008), and little effort has been made to
understand whether such a phenomenon is specific to
Google/Wikipedia or can be found for other major search engines
and user-generated encyclopedias. In addition, the multilingual
internet and the rise of non-English users on the Web have multiple
implications for the "localization" effects of search engines.
Localization (hereafter L10n), a process of adapting computer
software or information systems for a group of users usually
defined by national boundaries or geo-linguistic profiles (Hussain
& Mohan, 2008; Liao, 2011; McKenna & Naftulin, 2000), is
expected to influence users' information-seeking practices. Both
Google and Wikipedia provide localized content and interfaces
designed to serve different groups of users.
Because Google (or other general-purpose search engines),
Wikipedia (or other user-generated encyclopedias) and localization
are likely to present and thus frame the Web differently for different
groups of users, they effectively filter information for them. While
communication scholars can describe such filtering as gatekeeping,
the fact that Web users can directly or indirectly participate in
such information filtering processes has introduced techniques and
theories of "collaborative filtering" (Benkler, 2006; Goldberg,
Nichols, Oki, & Terry, 1992) and "network gatekeeping"
(Barzilai-Nahon, 2008). Indeed, while Google and Wikipedia may
concentrate Web traffic and command user attention as major global
websites, users' contributions of web content and links may also
influence such filtering and gatekeeping outcomes, as demonstrated
by the case of the Google query "Jew" (Bar Ilan, 2006): some users
organized to help Wikipedia's entry page for "Jew" rank higher in
Google's English-language SERPs.
WikiSym '13 August 05 - 07 2013, Hong Kong, China
Copyright 2013 ACM 978-1-4503-1852-5/13/08 ...$15.00.
Thus, although both "collaborative filtering" (Benkler, 2006;
Goldberg et al., 1992) and "network gatekeeping" (Barzilai-Nahon,
2008) are indeed about filtering and keeping information, the
possibility of participation through user input makes them different
from the filtering and gatekeeping processes in traditional media.
Nonetheless, I argue that geographic and linguistic factors may
bound or limit such collaborative and networking possibilities and
thus reintroduce national and/or linguistic boundaries on the Web.
Indeed, as early as the early 2000s, researchers such as
Zittrain and Sunstein raised the issue of localized search
results filtering political content or fragmenting the public sphere
(Morris & Ogan, 2002; Sunstein, 2002). For SERPs, the question
of information control and linguistic boundaries remains, while the
"borders" of national frameworks have been reintroduced in many
aspects of technological and legal arrangements (University &
School, 2006). In particular, Google's initial collaboration with (or
accommodation of) the Chinese government's needs and its later exit
from mainland China have demonstrated the intricate political and
cultural dimensions of the "localization" of search engine services
(Vaughan & Zhang, 2007; Einhorn, 2010). Thus, the research gap on
the effects of localization on SERPs and non-English Wikipedias needs
to be filled, including the prominent cases of Chinese-language and
Arabic-language internet users, whose recent presence and
participation in the new internet world have also attracted much
attention (Dutta, Dutton, & Law, 2011). In particular, in order to
answer how search engines and/or user-generated encyclopedias
reintroduce or shape national or social boundaries, more empirical
work on L10n effects is needed (Aragón, Kaltenbrunner, Laniado, &
Volkovich, 2012; Bao et al., 2012; Hecht & Gergle, 2010; Liao, 2008,
2011; Luyt, Goh, & Lee, 2009; Massa & Scrinzi, 2012; Mazieres &
Huron, 2013; Petzold, Liao, Hartley, & Potts, 2012; Rogers &
Sendijarevic, 2012; Warncke-Wang, Uduwage, Dong, & Riedl,
2012). L10n is also briefly discussed as a contributing factor to
the "internationalization mechanisms" of "network gatekeeping"
(Barzilai-Nahon, 2008), holding the key for researchers to
understand the nationalization or internationalization dynamics of
the Web.
For the Chinese-language internet, there are many localized versions
provided by several major search engines, including Yahoo China,
Google Hong Kong and Google Taiwan. I call them search engine-locale
variants (hereafter search engine variants). Do different search
engine variants guide users from various Chinese-speaking regions to
the same websites regardless of which search engine they choose? Or
do users see divergent SERPs?
Prior empirical research has analysed SERPs inside mainland China,
with the latest study covering 316 search query phrases on "Internet
events" collected in 2009, indicating that Baidu Baike and Chinese
Wikipedia indeed ranked high in the SERPs (Jiang & Akhtar, 2011).
However, that study focuses on (and thus is limited to) simplified
Chinese users in mainland China, and its sample of search queries was
based on internet incidents that are politically controversial in
mainland China. This paper contributes findings based on 2,500 search
queries in 2011, covering not only more topics but also more
Chinese-language search engines across more regions, such as Hong
Kong, Taiwan and Singapore. Before presenting the methods and
findings, the next section first provides a theoretical framework
that captures the localization effects of search engines.
2. L10N OF SEARCH ENGINES
Observing how search engines categorise users is one of the
practical ways to examine the impact of search engines on national
and/or regional boundaries. As part of the industry practice of
internationalization/localization (i18n/L10n), search engines
provide different interfaces and services for different users, usually
categorized by their geo-linguistic identifiers, using language codes
such as zh-TW (Chinese in Taiwan), pt-BR (Portuguese in Brazil)
and en-IN (English in India) (DePalma, 2002; Dunne, 2006). These
identifiers in turn influence how content is aggregated, filtered and
prioritised for users who share the same or similar language
preferences. Online users and audiences are often partitioned
accordingly by search engine marketing tools such as Google
AdWords and Microsoft adCenter. Unlike in the globalized TV
industry, where broadcasting and cable TV are still bound to
geography, these geo-linguistic codes are configurable. For
example, one can use the UK version of Google even when
not in the UK.
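As a minimal illustration of how such geo-linguistic identifiers partition users, the following Python sketch splits simple language-region codes into their two parts and groups them by language. The helper names (parse_locale, group_by_language) are hypothetical, and the tags zh-CN, zh-HK and zh-SG are added here for illustration only; real locale handling covers more cases (scripts, variants) than this sketch does.

```python
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str            # e.g. "zh"
    region: Optional[str]    # e.g. "TW"; None when the tag has no region part


def parse_locale(tag: str) -> Locale:
    """Split a simple language-region tag such as 'zh-TW' into its parts."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 and parts[1] else None
    return Locale(language, region)


def group_by_language(tags):
    """Group regions by language so that regional variants sit side by side."""
    groups = {}
    for tag in tags:
        locale = parse_locale(tag)
        groups.setdefault(locale.language, []).append(locale.region)
    return groups


# The first three tags come from the text; the last three are illustrative.
groups = group_by_language(["zh-TW", "pt-BR", "en-IN", "zh-CN", "zh-HK", "zh-SG"])
print(groups["zh"])
```

Grouping by the language part shows why one language can correspond to several search engine variants, each addressed by a different region part.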
To conceptualize the localization effects of search engines, this
paper applies the "network gatekeeping" theory (Barzilai-Nahon,
2008) for the following reasons. First, localization was discussed
as a contributing factor to the "internationalization mechanisms" of
"network gatekeeping" (Barzilai-Nahon, 2008). Although the theory
comes mainly from information science and aims to better understand
information control in network settings, its multidisciplinary
aspects (Jucquois-Delpierre, 2007) can help researchers understand
how seemingly technical arrangements of computer software or
information systems can have enormous effects on gatekeeping or
controlling the flows and presentation of information. Second,
distinct from traditional gatekeeping theory, which focuses on the
withholding or deletion of information, network gatekeeping
theory not only conceptualizes localization as part of the
gatekeeping processes, but also emphasizes the "display" basis for
such processes: "Presenting information in a particular visual form
designed to catch the eye" (Barzilai-Nahon, 2008). Indeed, search
engines visually present the results. Thus, to understand the
localization effects of search engines, a data collection method
must consider not only the localization parameters but also the
visual display of search results.
I argue that locales in computing, sets of parameters that describe
a user's language, region and other interface preferences, constitute
one of the most important online "situations" for online media. By
"situations" I adopt the definition used by medium theorists in the
tradition of media ecology: "situations as (social)
information-systems that set the patterns of access to information"
(Meyrowitz, 1986, 1994). Note that because medium theorists focus on
the medium rather than on messages, the definition is particularly
suitable for studying search engines, since some major companies,
including Google, have resisted the idea that they are in the content
or media industry by insisting that they are information companies.
For media and communication scholars, the underlying question is less
about Google's industrial identity than about how online media in
general can use locales to segment, fragment and integrate different
media markets and/or audiences by using different information
system settings. Thus, geographic and linguistic factors seem to
"set the patterns of access to information", as geo-linguistic
situations are expected to determine which websites will be the
most visible and most consistently appearing ones in the SERPs.
2.3 Merging and diverging effects of SERPs
If the aforementioned market survey and traffic reports are correct,
search engine users from Taiwan mostly filter web pages through
the lens of the search engine variants Google_TW and Yahoo_TW.
Those from Hong Kong mostly use Google_HK and Yahoo_HK, and so on.
By conceptualizing search engines as a medium, the merging and
diverging patterns of SERPs will also indicate whether users from
these regions see similar websites when using different search
engine providers. Hence, the SERP data may indicate which search
engines overcome offline boundaries across these regions (if the
SERPs converge on specific websites) and which reinforce them (if
the SERPs diverge), thereby contributing to the general question of
media and globalization in the case of search engines.
To do so, the proposed method of visibility tests, which quantify the
top-ranking websites, can be used as an indication of search engines
exercising their "display" gatekeeping power for certain websites.
Based on the quantified display gatekeeping power, the visibility
patterns can be systematically examined across (1) search engine
variants and (2) visible websites. Moreover, visibility scores can be
further aggregated (i.e. summed) over a selection of search queries,
so as to better answer the different research questions that guide
such selection. Ideally, by exhaustively computing visibility scores
for various localized versions of SERPs over a large sample of search
queries, researchers can better compare how visible a website is
across different search engine variants, thereby paving the way for
showing the merging and diverging patterns of the SERPs.
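The aggregation step can be sketched as follows. This is only an illustration under a stated assumption: the rank weighting used here (reciprocal rank, so that the first result counts 1, the second 1/2, and so on) is a placeholder, since the exact weights of the visibility tests are not given in this section. The function names and the toy SERP data are hypothetical.

```python
from collections import defaultdict


def rank_weight(rank: int) -> float:
    """ASSUMPTION: reciprocal-rank weight; the study's actual weights may differ."""
    return 1.0 / rank


def visibility_scores(serps, top_n=10):
    """Sum rank weights per (variant, website) over all queries.

    serps maps (variant, query) to the list of websites in ranked order.
    """
    scores = defaultdict(float)
    for (variant, _query), ranked_sites in serps.items():
        for rank, site in enumerate(ranked_sites[:top_n], start=1):
            scores[(variant, site)] += rank_weight(rank)
    return dict(scores)


# Hypothetical SERPs for two queries on one search engine variant.
toy_serps = {
    ("Google_TW", "query-1"): ["zh.wikipedia.org", "edu.tw"],
    ("Google_TW", "query-2"): ["edu.tw", "zh.wikipedia.org", "youtube.com"],
}
scores = visibility_scores(toy_serps)
print(scores[("Google_TW", "zh.wikipedia.org")])
```

Because the scores are plain sums, they can be re-aggregated over any subset of queries simply by filtering the input before calling the function.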
It should be noted that, borrowing from academic research on
webometric visibility and the industry practice of keyword
advertising, the proposed framework and method are general enough
for future studies regardless of the providers and/or geo-linguistic
preferences of search engines. For example: How different, or
similar, are the SERPs provided by Yandex versus Google in
Turkey? How different, or similar, are the SERPs provided by
Google Hindi versus Google Urdu in India? The resulting
visibility scores can be further visualized and analysed by various
network analysis techniques. Thus, this method can answer these
empirical questions, with results that can then be interpreted to
explore the cultural and political implications of such patterns.
To showcase how the integrated method works, I choose to study the
Chinese-language internet because its boundaries have several
historical, cultural and political complications. For example,
regions such as mainland China, Singapore, Hong Kong and Taiwan
have different practices regarding democracy, free speech,
human rights and Chinese scripts (Damm, 2007; Liao, 2009; Zhao
& Baldauf, 2007).
3. DATA COLLECTION
To identify how search engine variants influence the Chinese-
language SERPs, the top-10 results should provide enough
indication.
3.1 Search Queries
First, I selected about 2,500 search queries that are relevant to
Chinese cultural and political topics. As summarized in Table 1, the
selection includes all 990 entries in "The Cambridge Encyclopedia
of China" (The Cambridge encyclopedia of China, 1991), the top 10
search terms provided respectively by Baidu and Google (including
mainland China, Hong Kong and Taiwan variations) in various
categories since 2007, major popular cultural references, names of
notable people and some other culturally and politically "sensitive"
keywords. Although other selections or combinations are possible,
this selection aims to focus the research on the prominence of
user-generated encyclopedias across Chinese-speaking regions.
Table 1 Sources and numbers of search queries
Second, the sample keywords are converted into search queries
according to the respective Chinese orthographic preferences
(simplified Chinese for mainland China and Singapore; traditional
Chinese for Hong Kong and Taiwan), making this research the first of
its kind to compare SERPs across Chinese-language variants.
Third, the top-10 SERP results are collected for the nine search
engine variants that cover the four major Chinese-speaking regions of
mainland China, Singapore, Hong Kong and Taiwan. They are then parsed
and processed by the visibility tests, which give higher-ranking
websites higher visibility scores.
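The pairing of search engine variants with orthographic preferences described above can be sketched as a small query plan. The variant labels follow the naming used in this paper (e.g. Google_TW); the function names and the sample keyword (the word for "library" in both scripts) are hypothetical and serve only as an illustration, not as part of the actual query sample.

```python
# Orthographic preference per region, as stated in the text.
SCRIPT_BY_REGION = {
    "CN": "simplified",
    "SG": "simplified",
    "HK": "traditional",
    "TW": "traditional",
}

# Providers and the regions for which their variants were collected.
PROVIDER_REGIONS = {
    "Baidu": ["CN"],
    "Google": ["CN", "HK", "SG", "TW"],
    "Yahoo": ["CN", "HK", "SG", "TW"],
}


def list_variants():
    """Return the nine search engine variant labels, e.g. 'Google_TW'."""
    return [
        f"{provider}_{region}"
        for provider, regions in PROVIDER_REGIONS.items()
        for region in regions
    ]


def plan_queries(keyword_forms):
    """Pair each variant with the keyword form in its preferred script.

    keyword_forms maps a script name to the query string in that script.
    """
    plan = []
    for variant in list_variants():
        region = variant.split("_")[1]
        plan.append((variant, keyword_forms[SCRIPT_BY_REGION[region]]))
    return plan


# Hypothetical sample keyword in both scripts.
plan = plan_queries({"simplified": "图书馆", "traditional": "圖書館"})
for variant, query in plan:
    print(variant, query)
```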
3.2 Search Results
Around 22,000 web links are extracted from the SERPs based on
the outcome of 2,500 search queries submitted across the nine
search engine variants in 2011. These 22,000 web links
correspond to around 25,000 unique domain names. The outcome is
then further consolidated manually, by checking IP addresses, into
over 16,000 websites (e.g. the website sohu.com aggregates
money.sohu.com and women.sohu.com). Finally, all education and
government websites are aggregated under their respective
second-level domain names, such as edu.tw, edu.cn, gov.cn and gov.hk.
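The consolidation rules above can be illustrated with a short sketch. Note the assumptions: the study consolidated hostnames manually by checking IP addresses, whereas this sketch substitutes a small hand-written lookup table (MANUAL_SITE_MAP, hypothetical) for that manual step, and the function name website_for and the example URLs are likewise illustrative.

```python
from urllib.parse import urlsplit

# Education and government suffixes that are aggregated into one "website".
AGGREGATED_SUFFIXES = ("edu.tw", "edu.cn", "gov.cn", "gov.hk")

# ASSUMPTION: stand-in for the manual, IP-based consolidation in the study.
MANUAL_SITE_MAP = {
    "money.sohu.com": "sohu.com",
    "women.sohu.com": "sohu.com",
}


def website_for(url: str) -> str:
    """Map a result URL to the aggregated website label used for scoring."""
    host = (urlsplit(url).hostname or "").lower()
    for suffix in AGGREGATED_SUFFIXES:
        if host == suffix or host.endswith("." + suffix):
            return suffix
    return MANUAL_SITE_MAP.get(host, host)


print(website_for("http://money.sohu.com/some/page"))
print(website_for("http://www.example.edu.tw/"))
print(website_for("https://zh.wikipedia.org/wiki/Example"))
```

Hostnames that match neither rule are kept unchanged, which mirrors the fact that most websites in the data set are not aggregated further.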
4. FINDINGS
To show how localization influences online visibility, the collected
data of visibility scores are unpacked and analysed as follows.
4.1 Concentrated visibility scores
Figure 3 shows the proportion distribution and the accumulative
distribution of visibility scores for the top-100 most visible
websites. It is evident that nearly 80% of the visibility scores
are concentrated in the top-100 websites, and that three
user-generated encyclopedia websites rank highest: (1) wikipedia.org,
(2) the website that hosts Baidu Baike and (3) hudong.com. For the
website wikipedia.org, Chinese Wikipedia (zh.wikipedia.org) is the
most visible; for the second website, Baidu Baike (baike.baidu.org)
is the most visible.
Categories of Search Keywords                                Numbers
The Cambridge Encyclopedia of China                              990
Top 10 Search Terms (Google and Baidu)                           387
Best Film/Popular Music (China, Hong Kong, Taiwan)               364
Modern Concepts (shared with modern Japanese)                    171
Notable People                                                   476
    Nobel Prize Winners of Chinese origin                         11
    Major Chinese Politicians                                    187
    Rich People (China, Hong Kong, Taiwan)                        82
    100 Contemporary Intellectuals (China)                       100
    Major Fugitives From Taiwan                                   17
    Victims of White Terror in Taiwan                             79
Potentially Sensitive Terms                                      112
    Japanese AV porn stars                                        48
    Prosecuted and Sentenced Corrupted Chinese Officials          14
    Documented Filtered Words by Great Firewall                   50
Total                                                           2500
Figure 3. Concentrated visibility scores
Since the top-100 most visible websites account for about 80%
of the visibility scores, strong concentration effects are found. Thus,
the following subsection further examines these websites.
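The concentration measure behind this observation is a cumulative share: the summed visibility scores of the top-ranked websites divided by the total over all websites. A minimal sketch, with a hypothetical function name and toy numbers that are not taken from the data set:

```python
def cumulative_share(scores, top_n):
    """Share of total visibility held by the top_n highest-scoring websites."""
    ranked = sorted(scores.values(), reverse=True)
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Toy scores for four hypothetical websites.
toy_scores = {"site-a": 50.0, "site-b": 30.0, "site-c": 15.0, "site-d": 5.0}
print(cumulative_share(toy_scores, top_n=2))
```

Applied to the aggregated visibility scores with top_n=100, the same calculation would yield the accumulative proportion plotted in Figure 3.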
4.2 Tabulating visibility scores
Table 2 tabulates the top-100 ranking websites and their respective
visibility scores for each search engine variant. Each cell shows
the visibility score that a search engine variant has contributed to a
particular website. For example, the first cell, 34.30, indicates how
much Baidu_CN has contributed to Chinese Wikipedia
(zh.wikipedia.org).
Table 2 Top-ranking websites: visibility scores
Note that the top three are all user-generated encyclopedias: Chinese
Wikipedia, Baidu Baike and Hudong Baike. As another example,
the official news website of Falun Gong (epochtimes.com, ranked
18th) is completely blocked out of Baidu's results (i.e. the zero
visibility score suggests that it never shows up in Baidu's
SERPs). This is in direct contrast with, say, Yahoo_HK in the
third-to-last column, where it enjoys a visibility score higher than
those of all other mainland-based websites, including the Chinese
official media People's Daily (people.com.cn, ranked 15th),
suggesting that the Falun Gong news website performs even better than
People's Daily on Yahoo Hong Kong.
Therefore, Table 2 shows in detail which search engine variants
favour which websites by showing them more often and more
prominently in SERPs, making them easier to find (at least
for this selection of search queries). The top-ranking websites
include major China-based portals (e.g. sina.com.cn, sohu.com and
163.com), US-based websites (e.g. youtube.com, facebook.com),
mainland China-based news media websites (e.g. people.com.cn,
xinhuanet.com, ifeng.com) and the aggregated category of mainland
Chinese government websites (i.e. gov.cn).
Table 2 orders the websites from the most visible at the top row
to the least visible at the bottom row, while the search engine
variants are ordered first by provider (Baidu, Google, then Yahoo)
and second by region (CN, HK, SG, then TW). It is relatively
difficult, however, to see any pattern right away from Table 2 as
tabulated. In other words, although each cell in the table shows the
degree to which a search engine variant favours a certain website in
its SERPs, the table as a whole fails to show clearly which "group"
of search engine variants favours which "set" of websites.
To identify patterns of convergence and divergence, I use
blockmodeling analysis in the next subsection to study the visibility
scores in Table 2, each of which represents the strength of the tie
between a search engine variant and a website. To avoid arbitrary
clustering results produced by less consequential websites collected
in the SERPs, only the top-100 most visible websites are considered
for analysis.
4.3 Clustering using blockmodeling analysis
Cluster analysis is commonly used in exploratory data mining to
find how different data points can be grouped based on statistical
analysis of their similarities and differences. To find how
"birds of a feather flock together" for the websites and search
engine variants at hand, various clustering techniques can be
applied, including agglomerative hierarchical clustering
analysis, which produces a tree that details how each data point
can be grouped.
Nonetheless, this study chooses blockmodeling analysis (Doreian,
Batagelj, & Ferligoj, 2004) for the following reasons. First, a
blockmodel analysis produces a simplified outcome that better suits
the research question at hand: to identify the rough patterns,
without the need to see in detail which website is closer to which.
Second, as will be shown later, a blockmodel analysis can greatly
simplify a complex dataset to provide a succinct summary of the
overall structure. Third, as researchers can and must design a
blockmodel for the data points to fit, a blockmodel analysis is
particularly useful for identifying converging and diverging
patterns. It also provides a systematic way to see whether the data
points fit the model. Fourth, a blockmodel can be seen as a
simplified network, and thus it can help to produce a simplified
visualization of network data. It should be noted that the dataset
can be seen as a two-mode network: different "nodes" of search engine
variants give different visibility scores to different "nodes" of
websites. It is thus equivalent to a network of visibility scores, in
which high visibility scores indicate strong "relationships". It is
an example of a two-mode network because there are two types of nodes
(i.e. search engine variants and websites) and relationships exist
only between nodes of different types (i.e. the visibility score
contributed by one search engine variant to one website).
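The two-mode structure can be made concrete with a weighted edge list in which every edge runs from a search engine variant to a website. The sketch below uses four visibility scores taken from Table 2; the helper names (is_two_mode, strongest_variant) are hypothetical, and the representation is only one of several possible ways to store such a network.

```python
# Weighted edges: (search engine variant, website, visibility score).
# The four scores below are taken from Table 2.
edges = [
    ("Baidu_CN", "zh.wikipedia.org", 34.30),
    ("Yahoo_HK", "zh.wikipedia.org", 833.95),
    ("Baidu_CN", "hudong.com", 5.30),
    ("Yahoo_CN", "hudong.com", 267.17),
]

variants = {"Baidu_CN", "Yahoo_CN", "Yahoo_HK"}
websites = {"zh.wikipedia.org", "hudong.com"}


def is_two_mode(edges, variants, websites):
    """True if the node sets are disjoint and every edge links a variant to a website."""
    return variants.isdisjoint(websites) and all(
        source in variants and target in websites for source, target, _ in edges
    )


def strongest_variant(edges, website):
    """Return the variant that contributes the highest score to a website."""
    candidates = [(score, source) for source, target, score in edges if target == website]
    return max(candidates)[1]


print(is_two_mode(edges, variants, websites))
print(strongest_variant(edges, "zh.wikipedia.org"))
```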
4.3.1 A blockmodel design
Before detailing how the cluster outcome helps identify the
merging and diverging patterns systematically, it is necessary to
explain the basis on which I design the blockmodel in Table 3. To
build a blockmodel, researchers have to make design decisions on
[Figure 3 plot area. Vertical axis: proportion and accumulative proportion of visibility scores, 0% to 80%; horizontal axis: website rank, 0 to 100.]
Rank Website (aggregated)   Baidu_CN Google_CN Google_HK Google_SG Google_TW  Yahoo_CN  Yahoo_HK  Yahoo_SG  Yahoo_TW
1    zh.wikipedia.org          34.30    272.37    611.39    304.15    586.50     24.46    833.95    254.00    721.01
2    baike.                   661.93    410.28    174.04    433.81    125.52     72.44     39.10    508.05      4.88
3    hudong.com                 5.30    107.93     71.29    107.92     57.31    267.17      2.54    168.23      0.35
4                             385.80     51.36     13.29     53.21      9.93     20.52      7.17    102.80      1.65
5    sina.com.cn               59.18     76.85     21.69     69.33     16.63     41.70      2.04     35.29      0.68
6    knowledge.yahoo.com        0.10      0.03      0.29      0.00      0.36      0.00     93.46     20.33    140.07
7    edu.tw                     0.46      5.14     21.14      7.21     64.29      0.06     30.61     21.07    102.98
8                              40.27     41.23     13.00     37.26     11.64     57.85      2.07     23.35      0.95
9    youtube.com                0.29      8.39     66.03      9.04     68.63      0.00     45.20      4.96     19.00
10   gov.cn                    25.46     38.94     20.30     32.29     15.61     43.03      5.29     34.84      3.57
11   sohu.com                  20.89     32.82     10.08     27.34      8.08     38.97      3.18     22.11      1.57
12   163.com                   25.59     34.68     10.78     31.51     10.00     32.31      2.52     14.56      0.87
13   facebook.com               0.29      1.93      8.96      2.26     19.00      0.00     88.33      8.31     33.61
14   youku.com                 42.04     29.12     10.32     19.34      8.41     36.38      1.03     15.31      0.64
15   people.com.cn             14.54     23.19     16.00     23.82     18.14     20.97     17.81     11.43     13.39
16   blog.sina.com.cn          21.73     28.47     15.41     26.79     13.95      9.75      4.27     33.78      2.53
17   xinhuanet.com             26.13     27.18     21.02     27.71     20.06     11.50      1.70     19.31      0.40
18   epochtimes.com             0.00      1.05     27.34      2.23     33.05      0.00     34.57      3.93     36.62
19   ifeng.com                 25.67     25.13     11.86     24.39      9.67     16.70      4.20     10.12      2.56
20   baike.soso.com            11.08      7.60      1.31      5.93      1.05     29.16      0.29     63.30      0.04
the "connection types" (e.g. "complete" versus "null") and the
number of blocks. A block is said to be "complete" if all cells in
that block indicate a strong relationship, and a block is said to be
"null" if all cells in that block contain only a weak relationship or
none. Thus the three-by-three blockmodel in Table 3 assumes the data
points will fit into nine blocks. For this study, the nine search
engine variants will be divided into three groups, and the top-100
websites will be categorized into three sets of websites.
Table 3 Expected outcome of blockmodeling
The rationale behind this model is to identify converging and
diverging patterns. The second part of Table 3 shows how three
groups of search engine variants (Clusters A, B and C) may
converge or diverge on different sets of websites (Clusters X, Y and
Z). I assume that a middle ground of websites exists: there will be
a set of websites that are visible for all search engine variants
(i.e. Cluster Y). That is, Clusters A, B and C converge on Cluster Y,
as indicated by the dark blocks containing strong ties (i.e. high
visibility scores). To account for any deviation from the
"converging" middle ground, I expect two blocks of low-visibility
cells (i.e. weak or no relationship), represented by the two white
blocks in Table 3: one at the top-left and another at the
bottom-right. Both blocks thus indicate patterns of divergence,
or lack of convergence. For this study, if all search engine variants
converge on the same top visible websites, then there should be no
patterns of divergence. Using this scenario of complete
convergence as the null hypothesis (no difference in visibility
patterns), I expect some evidence of diverging effects to reject the
null hypothesis. If there is a significant number of websites in the
low-visibility blocks (one at the upper-left and another at the
lower-right corner), then diverging patterns are identified
accordingly.
4.3.2 Patterns of merging and diverging
Using the blockmodeling function provided by the social network
analysis tool Pajek, the 9-by-100 cells of strong versus weak
ties are simplified into the three-by-three blockmodel, as shown in
Table 4. For each cell, the color represents a strong (dark) or weak
(white) tie, and the cells are roughly partitioned into
three-by-three blocks, thereby effectively clustering the nine search
engine variants into three groups and the 100 most visible websites
into three sets. It is not a perfect match: 87 cells out of 900
(9.67%) do not match the designed blockmodel. Given the
space limitation, only the top-20 websites are shown in full.
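The notion of cells that do not match the designed blockmodel can be illustrated with a small sketch. It is not the Pajek procedure itself: the cut-off that separates strong from weak ties (threshold=10.0) is an assumption chosen for illustration, and the function name is hypothetical. The ideal block pattern follows the design described above (weak ties expected only for Cluster X with Cluster A and for Cluster Z with Cluster C), and the nine scores are taken from Table 4.

```python
# Ideal block image: 1 = strong ("complete") block, 0 = weak ("null") block.
IDEAL = {
    ("X", "A"): 0, ("X", "B"): 1, ("X", "C"): 1,
    ("Y", "A"): 1, ("Y", "B"): 1, ("Y", "C"): 1,
    ("Z", "A"): 1, ("Z", "B"): 1, ("Z", "C"): 0,
}


def count_mismatches(scores, site_cluster, variant_cluster, threshold):
    """Count cells whose observed tie strength contradicts the ideal block."""
    mismatches = 0
    for (site, variant), score in scores.items():
        observed = 1 if score >= threshold else 0
        expected = IDEAL[(site_cluster[site], variant_cluster[variant])]
        if observed != expected:
            mismatches += 1
    return mismatches


# Nine cells from Table 4: three websites by three search engine variants.
scores = {
    ("epochtimes.com", "Baidu_CN"): 0.00,
    ("epochtimes.com", "Google_HK"): 27.34,
    ("epochtimes.com", "Yahoo_TW"): 36.62,
    ("people.com.cn", "Baidu_CN"): 14.54,
    ("people.com.cn", "Google_HK"): 16.00,
    ("people.com.cn", "Yahoo_TW"): 13.39,
    ("baike.soso.com", "Baidu_CN"): 11.08,
    ("baike.soso.com", "Google_HK"): 1.31,
    ("baike.soso.com", "Yahoo_TW"): 0.04,
}
site_cluster = {"epochtimes.com": "X", "people.com.cn": "Y", "baike.soso.com": "Z"}
variant_cluster = {"Baidu_CN": "A", "Google_HK": "B", "Yahoo_TW": "C"}

# ASSUMPTION: 10.0 is an illustrative cut-off, not the one used in the study.
errors = count_mismatches(scores, site_cluster, variant_cluster, threshold=10.0)
print(errors, "of", len(scores), "cells do not fit")
print(round(100 * 87 / 900, 2))  # the mismatch rate reported in the text
```

Under this cut-off only one of the nine sample cells (baike.soso.com on Google_HK) contradicts the design, which illustrates how an imperfect fit is counted cell by cell.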
As shown in Table 4, of the top-100 websites, 39 are categorized
into the first cluster of websites (Cluster X), 13 into Cluster Y
and 48 into Cluster Z. If we look at the top-20 most visible
websites only, the converging set of websites (Cluster Y) is thin
(only one website). This website (people.com.cn) belongs to
People's Daily, the official Chinese party organ.
Table 4 Blockmodeling outcome
Blockmodel (rows: website Clusters X, Y, Z; columns: search engine Clusters A, B, C)
            Cluster A   Cluster B   Cluster C
Cluster X   weak        strong      strong
Cluster Y   strong      strong      strong
Cluster Z   strong      strong      weak

Rank Website (aggregated)   Baidu_CN  Yahoo_CN Google_CN  Yahoo_SG Google_SG Google_TW Google_HK  Yahoo_HK  Yahoo_TW
Cluster X
1    zh.wikipedia.org          34.30     24.46    272.37    254.00    304.15    586.50    611.39    833.95    721.01
6    knowledge.yahoo.com        0.10      0.00      0.03     20.33      0.00      0.36      0.29     93.46    140.07
7    edu.tw                     0.46      0.06      5.14     21.07      7.21     64.29     21.14     30.61    102.98
9    youtube.com                0.29      0.00      8.39      4.96      9.04     68.63     66.03     45.20     19.00
13   facebook.com               0.29      0.00      1.93      8.31      2.26     19.00      8.96     88.33     33.61
18   epochtimes.com             0.00      0.00      1.05      3.93      2.23     33.05     27.34     34.57     36.62
... and 33 other websites (the total number of websites is 39 for this block)
Cluster Y
15   people.com.cn             14.54     20.97     23.19     11.43     23.82     18.14     16.00     17.81     13.39
... and 12 other websites (the total number of websites is 13 for this block)
Cluster Z
2    baike.                   661.93     72.44    410.28    508.05    433.81    125.52    174.04     39.10      4.88
3    hudong.com                 5.30    267.17    107.93    168.23    107.92     57.31     71.29      2.54      0.35
4                             385.80     20.52     51.36    102.80     53.21      9.93     13.29      7.17      1.65
5    sina.com.cn               59.18     41.70     76.85     35.29     69.33     16.63     21.69      2.04      0.68
8                              40.27     57.85     41.23     23.35     37.26     11.64     13.00      2.07      0.95
10   gov.cn                    25.46     43.03     38.94     34.84     32.29     15.61     20.30      5.29      3.57
11   sohu.com                  20.89     38.97     32.82     22.11     27.34      8.08     10.08      3.18      1.57
12   163.com                   25.59     32.31     34.68     14.56     31.51     10.00     10.78      2.52      0.87
14   youku.com                 42.04     36.38     29.12     15.31     19.34      8.41     10.32      1.03      0.64
16   blog.sina.com.cn          21.73      9.75     28.47     33.78     26.79     13.95     15.41      4.27      2.53
17   xinhuanet.com             26.13     11.50     27.18     19.31     27.71     20.06     21.02      1.70      0.40
19   ifeng.com                 25.67     16.70     25.13     10.12     24.39      9.67     11.86      4.20      2.56
20   baike.soso.com            11.08     29.16      7.60     63.30      5.93      1.05      1.31      0.29      0.04
... and 35 other websites (the total number of websites is 48 for this block)
These blockmodeling findings also help identify the merging and
diverging patterns of search engine variants. Cluster A contains
Baidu_CN, Yahoo_CN and Google_CN; Cluster B contains
Google_HK, Google_SG, Google_TW and Yahoo_SG; Cluster C
contains Yahoo_HK and Yahoo_TW. The cluster outcome shown
in Table 5 indicates both merging and diverging patterns,
determined by the choice of search engine variant. Of the three
groups of search engine variants, two deviate from the rest. The
first (Cluster A) contains the search engine variants designed for
mainland China (Baidu_CN, Yahoo_CN and Google_CN), and the
second (Cluster C) contains Yahoo Search for Taiwan and Hong
Kong (Yahoo_HK and Yahoo_TW). Thus, while the search engine
variants in Cluster B produce converging results for the top-100
websites, with "complete" connection types to all clusters of
websites, those in Cluster A and those in Cluster C lead to
diverging SERPs.
Table 5 Clusters identified by blockmodeling
Search engine variants by cluster:
Cluster A | Cluster B | Cluster C
Baidu_CN | Google_HK | Yahoo_HK
Google_CN | Google_SG | Yahoo_TW
Yahoo_CN | Google_TW |
 | Yahoo_SG |
Connection types between website clusters and search engine variant clusters:
Websites | # | Cluster A | Cluster B | Cluster C
Cluster X | 39 | | complete | complete
Cluster Y | 13 | complete | complete | complete
Cluster Z | 48 | complete | complete |
4.4 Visualizing and unpacking findings
To show the visibility scores in a more intuitive manner, a network
visualization graph of the top-800 most visible websites is shown in
Figure 4. I visualize the nine search engine variants (shown as text
boxes at the periphery) and the 800 most visible websites (shown as
nodes in the middle). The two-mode network thus indicates the overall
likelihood that a given search engine variant recommends a website
shown in the middle. Pointing from one search engine variant node to
one website node, each arrow represents the total visibility score
contributed by that search engine variant to that website, with its
width proportional to the visibility score: wider arrows indicate
higher visibility scores. Similarly, the area of a node is
proportional to the sum of the visibility scores that a website
receives from all search engine variants, allowing easy comparison of
which websites are more visible.
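The mapping from visibility scores to arrow widths and node areas described above can be sketched as follows. The two rows of scores are copied from Table 4; the function names, the `max_width` and `scale` parameters, and the choice to drop zero-score edges are assumptions made for illustration, not details of how Figure 4 was actually drawn.

```python
import math

VARIANTS = ["Baidu_CN", "Yahoo_CN", "Google_CN", "Yahoo_SG", "Google_SG",
            "Google_TW", "Google_HK", "Yahoo_HK", "Yahoo_TW"]

# Visibility scores copied from two rows of Table 4.
SCORES = {
    "zh.wikipedia.org": [34.30, 24.46, 272.37, 254.00, 304.15,
                         586.50, 611.39, 833.95, 721.01],
    "people.com.cn": [14.54, 20.97, 23.19, 11.43, 23.82,
                      18.14, 16.00, 17.81, 13.39],
}


def edges(scores):
    """One directed edge (variant, website, score) per positive score."""
    return [(variant, site, score)
            for site, row in scores.items()
            for variant, score in zip(VARIANTS, row)
            if score > 0]


def arrow_widths(edge_list, max_width=10.0):
    """Arrow width proportional to the visibility score on each edge."""
    top = max(score for _, _, score in edge_list)
    return {(variant, site): max_width * (score / top)
            for variant, site, score in edge_list}


def node_radii(scores, scale=1.0):
    """Radius chosen so that node AREA is proportional to the total score."""
    return {site: math.sqrt(scale * sum(row) / math.pi)
            for site, row in scores.items()}


edge_list = edges(SCORES)
widths = arrow_widths(edge_list)
radii = node_radii(SCORES)
```

Scaling the radius by the square root of the total keeps the drawn area, rather than the diameter, proportional to a website's summed visibility.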
Note that the visibility scores are distributed quite unevenly, and
thus only the top 20 websites are marked with their respective ranking
numbers. User-generated encyclopedias are the most visible
websites (node 1: Chinese Wikipedia, node 2: Baidu Baike, node
3: Hudong). Moreover, Chinese Wikipedia (1) is highly visible to
almost all variants except Yahoo_CN and Baidu_CN, while
Baidu Baike (2) is highly visible in Baidu_CN, Google_CN and
Google_SG, and moderately so in Google_HK.
Based on the previous clustering results, two red dashed lines are
also drawn in Figure 4, roughly indicating three areas. Positioned in
the middle are the search engine variants in Cluster B, because of
their converging patterns of strong ties with most websites. The two
red dashed lines also place the search engine variants in Cluster A to
the left and those in Cluster C to the right, indicating diverging
effects because of the presence of weak ties. This explains why
Cluster A and Cluster C are shown adjacent to Cluster B, but not
adjacent to each other. The visualization is thus consistent with the
findings shown in Table 5.
The findings can also be unpacked for specific search engine
variants. Using the same method, an additional 500 Chinese names of
Fortune 500 companies were added to the selection of 2,500 search
queries, producing a second dataset in 2012 (Liao, 2013a). The
following paragraphs unpack this second dataset for two search engine
variants in mainland China: Google_CN (see Table 6) and Baidu_CN
(see Table 7).
The top-20 websites for each category of search queries on
Google_CN, as shown in Table 6, indicate that Baidu Baike ranks at
the top in almost all categories. Wikipedia.org is a close second
here for Google_CN, supporting the general observation that search
engines favour user-generated encyclopedias. These particular
findings also provide some counter-evidence against the idea that
Google as a specific company favours Wikipedia as a website,
because Google_CN actually favours Baidu Baike more than
Chinese Wikipedia, as clearly shown in Table 6.
The findings for Baidu_CN in Table 7 show even greater dominance
by Baidu Baike: it tops all seven categories, and its share of the
visibility scores is much more concentrated than in the results for
Google_CN (see Table 6). In addition, the ranking position of
hudong.com seems to support the unfair-competition accusation made
by Hudong's CEO against Baidu (Yang, 2011). Depending on the type of
search query, hudong.com is ranked by Google_CN from 3rd to 9th (see
Table 6). In contrast, Hudong Baike is not even among the top 20 for
many categories of the sampled queries on Baidu Search. Indeed, if
Google's SERPs can serve as an independent third party in the
competition between Baidu Baike and Hudong, Google does not make
Hudong almost invisible as Baidu does.
Figure 4. Delineating the boundaries of geo-linguistic settings based on SERPs.
Hence, if users in mainland China use Google Search instead of
Baidu Search, Chinese Wikipedia will become as visible to them as
Baidu Baike.
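The concentration contrast between Table 6 and Table 7 can be checked directly from the shares held by the top-ranked website in each of the seven query categories. The sketch below copies the rank-1 percentages from the two tables; comparing rank-1 shares and their means is a simple summary chosen here for illustration, not a measure used in the paper, and it assumes the seven columns appear in the same order in both tables.

```python
# Share (%) of visibility scores held by the top-ranked website in each of
# the seven query categories, copied from row 1 of Table 6 and Table 7.
GOOGLE_CN_TOP = [47.65, 25.08, 36.44, 37.28, 28.98, 27.89, 27.99]
BAIDU_CN_TOP = [75.74, 64.17, 73.28, 81.56, 57.53, 69.54, 61.90]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def higher_in_every_category(a, b):
    """True if every share in `a` exceeds the share at the same position in `b`."""
    return all(x > y for x, y in zip(a, b))


summary = {
    "Google_CN mean top share": mean(GOOGLE_CN_TOP),
    "Baidu_CN mean top share": mean(BAIDU_CN_TOP),
    "Baidu_CN higher in all categories":
        higher_in_every_category(BAIDU_CN_TOP, GOOGLE_CN_TOP),
}

for label, value in summary.items():
    print(label, value)
```

Under these assumptions the top-ranked website on Baidu_CN holds a larger share than the top-ranked website on Google_CN in every category, which is the pattern the text describes as greater concentration.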
5. DISCUSSION
By systematically analysing the SERPs collected across four major
Chinese-speaking regions, this paper shows that patterns of merging
and diverging do exist. This is achieved by calculating visibility
scores as the equivalent of "social ties" between search engine
variants on the one hand and top-ranking websites on the other. Both
the network visualization and the blockmodeling outcomes show that
geo-linguistic factors make Chinese-language SERPs diverge on
certain websites while converging on others. In particular, of the
nine search engine variants, the first group that diverges from the
rest contains the search engine variants designed for mainland China
(Baidu_CN, Yahoo_CN and Google_CN). The second group
contains Yahoo Search for Taiwan and Hong Kong (Yahoo_HK
and Yahoo_TW).
The findings suggest that the major online boundary in the Chinese
Internet is drawn first along the line of regional difference, with all
mainland Chinese search engine settings sharing similar SERPs
among themselves, but not with the others to the same degree, as
shown in Figure 4. Another boundary is drawn for Yahoo Taiwan
and Yahoo Hong Kong at the other end. The latter results are
relatively easy to explain because Yahoo Search by default
prioritizes local content, with other geo-linguistic variant options
listed for users in the web interface, e.g. "search the
traditional Chinese-character-written web pages" or "search the
global websites".
In contrast, it is relatively difficult to provide purely technical
explanations of why the three mainland Chinese settings do not share
as much with other settings in terms of the corresponding SERPs. It
is likely that many of the websites absent from the SERPs in the
three mainland Chinese settings are those that are not politically
welcome in mainland China. Note that the first two columns in
Table 4 represent Baidu_CN and Yahoo_CN, both of which
consistently have weak ties with several of the top 100 websites.
These two search engine variants are also the only two that filter
SERPs for users in mainland China. Note also that the third column
in Table 4 represents Google_CN. While it is clustered with
Baidu_CN and Yahoo_CN, it has more strong ties with the top 100
websites, suggesting that its results are less divergent.
The findings seem to suggest that users from mainland China who use
only Baidu_CN and Yahoo_CN will have a substantial number of
otherwise highly visible websites overlooked in, or even missing
from, their daily search experiences. These include websites such as
YouTube and Facebook, which have been reported as blocked in
mainland China. They also include the websites of government and
education institutions in Taiwan and Hong Kong: gov.tw, gov.hk,
edu.tw and edu.hk. In other words, the SERPs of the three mainland
Chinese variants seem to diverge from these websites. In contrast,
the websites of government and education institutions in mainland
China, gov.cn and edu.cn, are still relatively visible for almost all
other search engine variants except the by-default-local Yahoo_TW
and Yahoo_HK. Thus, the patterns of merging and diverging seem to
reflect the cultural and political complications of the
Chinese-language internet. While the offline boundary between Hong
Kong and Taiwan seems to be overcome, that between mainland China
and Hong Kong seems to be reinforced. Although the SERP data may
not perfectly reflect what users actually read and click, it
nonetheless indicates a general probabilistic tendency substantiated
by industry data.
Table 6 Results for Google_CN
Query categories covered by the seven columns: Fortune500; The Cambridge Encyclopedia of China; Top 10 Search Terms (Google and Baidu); Best Film/Popular Music (China, Hong Kong, Taiwan); Modern Concepts (shared with modern Japanese); Notable People; Potentially sensitive terms
Ranking | website and share of visibility scores, one cell per query category
1 | 47.65% | 25.08% | 36.44% | 37.28% | wikipedia.org 28.98% | 27.89% | mbalib.com 27.99%
2 | wikipedia.org 25.36% | wikipedia.org 12.94% | wikipedia.org 15.33% | wikipedia.org 24.13% | 26.82% | wikipedia.org 25.14% | 16.67%
3 | hudong.com 8.74% | sina.com.cn 12.06% | sina.com.cn 9.46% | hudong.com 11.00% | hudong.com 7.66% | hudong.com 9.63% | fortunechina.com 13.65%
4 | sina.com.cn 2.58% | 6.67% | douban.com 5.00% | mbalib.com 3.55% | sina.com.cn 7.10% | sina.com.cn 7.17% | wikipedia.org 8.74%
5 | ifeng.com 2.03% | 163.com 6.01% | 4.45% | sina.com.cn 3.18% | xinhuanet.com 4.81% | sohu.com 3.59% | 4.09%
6 | artxun.com 1.33% | sohu.com 5.86% | hudong.com 3.60% | people.com.cn 2.66% | people.com.cn 4.03% | people.com.cn 3.34% | qkankan.com 3.79%
7 | soso.com 1.30% | hudong.com 4.27% | sohu.com 3.33% | 2.64% | 3.39% | xinhuanet.com 2.95% | sina.com.cn 3.62%
8 | zdic.net 1.13% | youku.com 4.26% | youku.com 3.14% | hc360.com 1.61% | ifeng.com 3.30% | youku.com 2.74% | ifeng.com 3.59%
9 | tiexue.net 1.07% | xinhuanet.com 3.29% | 163.com 3.09% | sohu.com 1.50% | 163.com 2.93% | 2.45% | hudong.com 3.41%
10 | cncn.com 1.06% | ifeng.com 2.78% | mtime.com 2.14% | 163.com 1.46% | sohu.com 2.30% | iciba.com 1.94% | gold678.com 3.20%
11 | xinhuanet.com 1.04% | douban.com 2.47% | youtube.com 1.77% | hexun.com 1.44% | weibo.com 1.71% | 163.com 1.90% | 163.com 2.43%
12 | artx.cn 1.03% | people.com.cn 2.31% | 1ting.com 1.63% | ifeng.com 1.43% | youtube.com 1.55% | ifeng.com 1.72% | ciipp.com 1.29%
13 | people.com.cn 0.96% | hexun.com 1.85% | weibo.com 1.58% | studa.net 1.26% | boxun.com 1.25% | 360doc.com 1.50% | sohu.com 1.15%
14 | youku.com 0.84% | huanqiu.com 1.59% | m1905.com 1.56% | 3edu.net 1.05% | hexun.com 0.78% | youtube.com 1.45% | egouz.com 1.12%
15 | 163.com 0.83% | youtube.com 1.57% | iqiyi.com 1.50% | 39.net 1.04% | renren.com 0.62% | sogou.com 1.25% | bitauto.com 1.11%
16 | sohu.com 0.73% | yahoo.com 1.51% | sogou.com 1.39% | edu.cn 1.02% | edu.tw 0.60% | tianya.cn 1.12% | people.com.cn 0.96%
17 | 0.63% | gov.tw 1.45% | tudou.com 1.35% | jrj.com.cn 1.00% | china.com.cn 0.58% | laonanren.com 1.03% | zol.com.cn 0.93%
18 | edu.tw 0.60% | iqiyi.com 1.43% | ifeng.com 1.27% | chinaacc.com 0.97% | libertytimes.com.tw 0.55% | hexun.com 0.89% | hexun.com 0.88%
19 | edu.cn 0.54% | weibo.com 1.32% | xiami.com 1.07% | xinhuanet.com 0.95% | twitter.com 0.53% | soso.com 0.81% | yup.cn 0.72%
20 | 5156edu.com 0.54% | tudou.com 1.27% | pptv.com 0.91% | youku.com 0.83% | yahoo.com 0.52% | cfdd.org.cn 0.76% | google.cn 0.66%
Table 7 Results for Baidu_CN
Query categories covered by the seven columns: Fortune500; The Cambridge Encyclopedia of China; Top 10 Search Terms (Google and Baidu); Best Film/Popular Music (China, Hong Kong, Taiwan); Modern Concepts (shared with modern Japanese); Notable People; Potentially sensitive terms
Ranking | website and share of visibility scores, one cell per query category
1 | 75.74% | 64.17% | 73.28% | 81.56% | 57.53% | 69.54% | 61.90%
2 | wikipedia.org 6.20% | youku.com 4.79% | youku.com 6.66% | wikipedia.org 2.41% | wikipedia.org 7.48% | wikipedia.org 5.30% | mbalib.com 7.62%
3 | hudong.com 1.98% | sina.com.cn 4.59% | iqiyi.com 2.57% | sina.com.cn 2.16% | 6.12% | sina.com.cn 3.38% | fortunechina.com 7.13%
4 | sina.com.cn 1.94% | 4.13% | douban.com 2.30% | 2.05% | sina.com.cn 5.00% | 3.23% | sina.com.cn 3.20%
5 | youku.com 1.86% | sohu.com 3.05% | tudou.com 1.91% | youku.com 1.59% | ifeng.com 2.82% | youku.com 2.17% | ifeng.com 2.27%
6 | soso.com 1.64% | iqiyi.com 2.73% | sina.com.cn 1.65% | xinhuanet.com 1.14% | people.com.cn 2.52% | sohu.com 1.73% | fx678.com 1.91%
7 | 1.61% | 163.com 2.32% | weibo.com 1.61% | www.gov.cn 1.10% | sohu.com 2.46% | xinhuanet.com 1.68% | zol.com.cn 1.73%
8 | ifeng.com 1.18% | tudou.com 1.91% | 1.55% | edu.cn 0.89% | xinhuanet.com 2.31% | 163.com 1.50% | wikipedia.org 1.73%
9 | douban.com 1.13% | xinhuanet.com 1.53% | xunlei.com 1.48% | ifeng.com 0.80% | 163.com 1.84% | tianya.cn 1.47% | 1.60%
10 | tiexue.net 0.89% | douban.com 1.28% | mtime.com 1.07% | sohu.com 0.78% | soso.com 1.68% | people.com.cn 1.40% | bitauto.com 1.53%
11 | weather.com.cn 0.88% | ifeng.com 1.24% | letv.com 0.78% | people.com.cn 0.74% | weibo.com 1.52% | hexun.com 1.27% | 163.com 1.38%
12 | edu.cn 0.61% | renren.com 1.19% | m1905.com 0.78% | douban.com 0.60% | uname.cn 1.40% | soso.com 1.17% | qkankan.com 1.37%
13 | xilu.com 0.59% | letv.com 1.14% | 163.com 0.73% | 163.com 0.59% | renren.com 1.36% | douban.com 1.04% | gongchang.com 1.05%
14 | xinhuanet.com 0.58% | weibo.com 0.97% | verycd.com 0.68% | rayli.com.cn 0.59% | kaixin001.com 1.32% | tudou.com 0.89% | ticarefree.cn 1.04%
15 | 163.com 0.58% | wikipedia.org 0.97% | sohu.com 0.55% | hao123.com 0.57% | douban.com 0.97% | bitauto.com 0.88% | soso.com 0.86%
16 | guoxue.com 0.57% | zol.com.cn 0.93% | 1ting.com 0.53% | jrj.com.cn 0.50% | youku.com 0.85% | ifeng.com 0.73% | yingjiesheng.com 0.83%
17 | 360buy.com 0.52% | xunlei.com 0.80% | pptv.com 0.50% | huanqiu.com 0.49% | 360buy.com 0.78% | sensagent.com 0.70% | autohome.com.cn 0.74%
18 | qidian.com 0.51% | taobao.com 0.80% | ku6.com 0.48% | iqiyi.com 0.48% | www.gov.cn 0.73% | hudong.com 0.66% | xgo.com.cn 0.73%
19 | tudou.com 0.51% | huanqiu.com 0.74% | yinyuetai.com 0.48% | bankcomm.com 0.47% | edu.cn 0.73% | yangbihu.com 0.65% | eastmoney.com 0.70%
20 | sohu.com 0.50% | 4399.com 0.71% | wikipedia.org 0.42% | chinaacc.com 0.46% | hudong.com 0.58% | tiexue.net 0.61% | people.com.cn 0.68%
6. CONCLUSION
The findings, visualized and analysed using network analysis
techniques, clearly indicate strong localization effects on the
gatekeeping function of search engines, based on data covering
over 97% of the search engine market in four Chinese-speaking
regions. The findings also show that major user-generated
encyclopedias such as Baidu Baike and Chinese Wikipedia
dominate the SERPs with high rankings and visibility scores.
Because the geo-linguistic factors coincide with the different
cultural and political situations of these Chinese-speaking regions,
different localization variants produce divergent outcomes for
high-ranking encyclopedias and other websites, thereby indicating
strong effects of "network gatekeeping" by search engines in
exercising the gatekeeping bases of "display" and "localization"
(Barzilai-Nahon, 2008).
In addition, by examining the overall patterns of SERPs, I have
demonstrated the merging and diverging effects contributed by
search engine providers and by regional and language settings.
Different combinations of such provider and geo-linguistic
information lead to different "search engine variants". Nine major
search engine variants, covering four regions with Chinese-speaking
majority populations, are identified for the Chinese-language
internet. For a selected set of search queries covering major
Chinese cultural and political topics, I have found that the SERPs
converge on a specific type of website (i.e. user-generated
encyclopedias) and that some search engine variants converge more
on Baidu Baike while others converge more on Chinese Wikipedia.
The merging and diverging patterns are further analysed by both
network visualization and network analysis (blockmodeling analysis
of two-mode networks). The different patterns indicate that both
"nationalization" of a specific kind (i.e. mainland China) and
"trans-nationalization" (i.e. Hong Kong and Taiwan) can be
achieved through the different gatekeeping options offered by
various search engine variants.
The results show that the SERPs are more likely to converge when
geo-linguistic preferences are similar. For example, the SERPs
diverge the most when users choose different Chinese characters
(i.e. simplified Chinese versus traditional Chinese). It is then
particularly intriguing that all Hong Kong variant results converge
more with Taiwanese variant ones and much less with mainland
Chinese variants, even though Hong Kong is much closer to mainland
China geographically, politically and administratively. In addition,
Chinese Wikipedia is much more visible in these regions than in
mainland China. Though the findings here cannot separate the
geo-linguistic factors from cultural and political ones, the
converging and diverging patterns alone are important findings
for Chinese-internet research and Wikipedia research.
There are of course obvious limitations to the findings presented
above. First, the selection of search queries, while significantly
larger than in previous social scientific research on
Chinese-language search engines (Jiang & Akhtar, 2011), is still
limited. Second, due to limitations of space, this paper has not yet
fully unpacked the different findings for different categories of
search queries. Third, only standard Mandarin Chinese terms are used
for this research, overlooking other possibilities such as written
Cantonese queries (Chau, Fang, & Yang, 2007). Fourth, and not least,
only the default setting for each localized search engine is
analysed.
While the dataset presented may be limited in the scope of selected
search queries, time and search engine variants, I have
demonstrated the usefulness and viability of examining the merging
and diverging patterns produced by the search engine variants, each
of which corresponds to a segment of the search engine market. For
instance, the method can help online linguistics research by
analysing different SERP outcomes for regions that use a shared
writing system but with regional variants, such as the difference
between Egyptian Arabic and Maghrebi Arabic. As another example,
these geo-linguistic factors can be said to constitute one of the
most important online "situations" for online media, as defined by
medium theorists in the tradition of media ecology (Meyrowitz,
1986, 1994), because these factors set the patterns of access.
According to a statistical report by the Data Center of China
Internet, during the first half of 2010 the content produced by
amateur Chinese Internet users surpassed that produced by
professional websites (Liao, 2013b; Qiang, 2010). Thus
user-generated content by Chinese Internet users is expected to have
influenced user-generated encyclopedias directly and SERPs
indirectly. While this study has not yet addressed the relationship
among search engines, user-generated content and user-generated
encyclopedias, the findings here seem to suggest similar
geographic and linguistic dynamics. The clear outcome of
"network gatekeeping", identified through Chinese search engine
variants and their respective preferred encyclopedias, may point to
a larger online context for Chinese Internet users across regions.
For future research, it will be useful to examine how geographic
and linguistic factors may influence the network gatekeeping
processes inside user-generated encyclopedias (Liao, 2009). It is
likely that they also exercise the gatekeeping bases of "display" and
"localization" as search engines do.
The overall method can be systematically extended to other
contexts. Search engine variants can be chosen for research on
almost all the other languages in the world, including languages
with transnational adoption such as Arabic, Hindi, Tamil, English,
Spanish and Portuguese. Researchers can thus further interpret the
merging and diverging SERP outcomes for research questions that
are relevant to global, transnational or inter-cultural
communication on the one hand, and for another set of questions in
human-computer interaction and information systems on the other.
Also, the focus on examining geo-linguistic factors as important
variables for understanding search engines can contribute to the
development of geo-linguistic analysis of the Web (Liao & Petzold,
2011; Petzold & Liao, 2011). The method can also be adopted for
market and industry applications when geo-linguistic identifiers are
central (DePalma, 2002; Dunne, 2006).
In conclusion, the proposed method has potential for a wider
range of market and academic applications. The theoretical
implications may be extended to other websites or information
systems that produce or curate different outcomes based on the
geographic and linguistic preferences (or configurations) of users.
It highlights the role of geo-linguistic parameters as media "access
codes", or set patterns of access to information as articulated by
medium theorists for TV research (Meyrowitz, 1986, 1994), or the
Morris, M., & Ogan, C. (2002). The Internet as Mass Medium. In
D. McQuail (Ed.), McQuail's reader in mass communication
theory (pp. 134–145). London: SAGE.
Nguyen, C. (2011, March). Search Engine Market share by country.
Chandler Nguyen Digital Marketing Blog. Retrieved December
1, 2011, from http://www.chandlernguyen.com/2011/03/search-
engine-market-share-by-country-mar-2011.html
Nielsen Online. (2008). Wikipedia U.S. Web Traffic Grows 8,000
Percent In Five Years, Driven By Search. New York: Nielsen
Online. Retrieved from
http://news.softpedia.com/news/Wikipedia-Traffic-Mostly-
from-Google-85703.shtml
Petzold, T., & Liao, H.-T. (2011). Geo-linguistic analysis of the
World Wide Web: The use of cartograms and network analysis
to understand linguistic development in Wikipedia. In D. Araya,
Y. Breindl, & T. J. Houghton (Eds.), Nexus: New Intersections
in Internet Research (pp. 55–75). New York: Peter Lang.
Petzold, T., Liao, H.-T., Hartley, J., & Potts, J. (2012). A world map
of knowledge in the making: Wikipedia's inter-language linkage
as a dependency explorer of global knowledge accumulation.
Leonardo: Art, Science and Technology, 45(3), 284–284.
doi:10.1162/LEON_a_00376
PricewaterhouseCoopers. (2011). IAB Internet Advertising
Revenue Report. New York; DC: The Interactive Advertising
Bureau. Retrieved from http://www.iab.net/AdRevenueReport
Qiang, X. (2010, July 23). User-generated content online now
50.7% of total. China Daily. Beijing. Retrieved from
http://www.chinadaily.com.cn/business/2010-
07/23/content_11042851.htm
Rogers, R., & Sendijarevic, E. (2012). Neutral or National Point of
View? A Comparison of Srebrenica articles across Wikipedia's
language versions. In Wikipedia Academy: Research and Free
Knowledge (#wpac2012). Berlin. Retrieved from
http://wikipedia-
academy.de/2012/w/images/8/89/3_Paper_Richard_Rogers_E
mina_Sendijarevic.pdf
Russell, J. (2011). Why Yahoo! –not Google– rules Taiwan's
webspace. Asian Correspondent. Retrieved December 1, 2011,
from http://asiancorrespondent.com/55695/focus-on-taiwan-
where-yahoo-not-google-rules-the-countrys-webspace/
Segev, E. (2008). Search Engines and Power: A Politics of Online
(Mis-) Information. text. Retrieved November 19, 2011, from
http://www.webology.org/2008/v5n2/a54.html
SEMPO. (2011). SEMPO State of Search Marketing Report 2011.
SEMPO Institute. Retrieved from
http://econsultancy.com/uk/reports/sempo-state-of-search
Silverwood-Cope, S. (2012, February 8). Wikipedia: Page one of
Google UK for 99% of searches. Intelligent Positioning Blog.
Retrieved from
http://www.intelligentpositioning.com/blog/2012/02/wikipedia
-page-one-of-google-uk-for-99-of-searches/
Slingshot SEO. (2011). Google & Bing Click-Through Rates
(White paper). Retrieved from
http://www.slingshotseo.com/resources/white-papers/google-
ctr-study/
Spindler, S. (2010). Online Marketing: How to Increase
International Sales with Search Engine Optimisation. GRIN
Verlag.
StatCounter. (2011). Top 5 Search Engines in China/Hong
Kong/Singapore/Taiwan from Nov 2010 to Nov 2011.
StatCounter Global Stats. Retrieved December 1, 2011, from
http://gs.statcounter.com/#search_engine-CN-monthly-
201011-201111
Sunstein, C. R. (2002). Fragmentation and Cybercascades. In
Republic.Com. Princeton University Press.
The Cambridge encyclopedia of China. (1991) (2nd ed.).
Cambridge [England]; New York: Cambridge University Press.
Goldsmith, J., & Wu, T. (2006). Who Controls the Internet?
Illusions of a Borderless World. Oxford University Press.
Varian, H. R. (2007). The Economics of Internet Search. Presented
at the Angelo Costa lecture, Rome. Retrieved from
http://people.ischool.berkeley.edu/~hal/Papers/2007/costa-
lecture.pdf
Vaughan, L., & Thelwall, M. (2004). Search engine coverage bias:
evidence and possible causes. Information Processing &
Management, 40(4), 693–707.
Vaughan, L., & Zhang, Y. (2007). Equal Representation by Search
Engines? A Comparison of Websites across Countries and
Domains. Journal of Computer-Mediated Communication,
12(3). Retrieved from
http://jcmc.indiana.edu/vol12/issue3/vaughan.html
Warncke-Wang, M., Uduwage, A., Dong, Z., & Riedl, J. (2012). In
Search of the Ur-Wikipedia: Universality, Similarity, and
Translation in the Wikipedia Inter-language Link Network.
Retrieved from
http://www.grouplens.org/system/files/p3wikisym2012.pdf
Yang, Y. (2011, February 25). China's "Wikipedia" Submits
Complaint about Baidu. Economic Observer News, 508, 28.
Young, R. D. (2011, August 10). Top Google Ranking Captures
18.2% of Clicks. Search Engine Watch (#SEW). Retrieved
December 2, 2011, from
http://searchenginewatch.com/article/2100616/Top-Google-
Ranking-Captures-18.2-of-Clicks-Study
Zhao, S., & Baldauf, R. B. J. (2007). Planning Chinese Characters:
Reaction, Evolution or Revolution? Springer.