The document summarizes research that analyzes the topology of the JavaScript package ecosystem using topological data analysis (TDA). Key findings from the TDA include:
1) The number of package dependencies is a strong feature in the topology that separates packages.
2) Packages more likely to be used within the ecosystem are located separately from those meant for external application usage.
3) Comparison with archetypal analysis identified archetypes of packages based on common keywords and example packages for each archetype.
Extracting Insights from the Topology of the JavaScript Package Ecosystem
1. Extracting Insights from the Topology of the JavaScript Package Ecosystem
Authors: Nuttapon Lertwittayatrai+, Raula Gaikovina Kula*, Saya Onoue*, Hideaki Hata*, Arnon Rungsawang+, Pattara Leelaprute+, Kenichi Matsumoto*
+ Kasetsart University, Thailand
* Nara Institute of Science and Technology, Japan
2. Software Ecosystem?
Lungu*: a collection of software systems which are developed and co-evolve in the same environment.
*Lungu, Mircea (2009). Reverse Engineering Software Ecosystems (Ph.D.). University of Lugano.
5. Complex Characteristics of Packages as Vectors
name: npm
version: 5.6.0
license: Artistic-2.0
keywords: install,modules,package manager,package.json
name: browserify
version: 14.5.0
license: MIT
keywords: browser,require,commonjs,commonj-esque,bundle,npm,javascript
name: express
version: 4.16.2
license: MIT
keywords: express,framework,sinatra,web,rest,restful,router,app,api
6. Problem: How to analyze the high-dimensional characteristics of packages?
http://setosa.io/ev/principal-component-analysis/
2 vectors: name, version
3 vectors: name, version, license
4+ vectors: name, version, license, keywords, author, ...
7. Shape of the Ecosystem
http://slideplayer.com/slide/238747/
8. Topological Data Analysis (TDA): Shape of Data
Topology is a major area of mathematics concerned with the spatial properties of objects that are preserved under deformation.
TDA is the result of a concerted effort to adapt topological methods to various applied problems, one of which is the study of large and high-dimensional datasets.
P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, and G. Carlsson, Extracting insights from the shape of complex data using topology, Scientific Reports (Nature), 2013.
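To make the idea of "shape of data" concrete, here is a minimal sketch of a Mapper-style construction, the kind of TDA graph described by Lum et al. It is an illustration under stated assumptions, not the implementation used in this study: the function names (`mapper`, `single_linkage`), the parameters (`n_intervals`, `overlap`, `eps`), and the choice of a distance-threshold clustering are all hypothetical.

```python
import math
from itertools import combinations


def single_linkage(points, members, eps):
    """Group point indices into connected components of the
    'distance <= eps' graph (a simple stand-in for a clustering step)."""
    remaining = set(members)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {j for j in remaining
                    if math.dist(points[current], points[j]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters


def mapper(points, filter_fn, n_intervals=4, overlap=0.25, eps=1.5):
    """Return (nodes, edges). Each node is a set of point indices;
    an edge links two nodes that share at least one point."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    # Fall back to a unit length when all filter values are equal.
    length = (hi - lo) / n_intervals or 1.0
    nodes = []
    for i in range(n_intervals):
        # Cover the filter range with overlapping intervals.
        start = lo + i * length - overlap * length
        end = lo + (i + 1) * length + overlap * length
        members = [j for j, v in enumerate(values) if start <= v <= end]
        # Cluster the points that fall inside this interval.
        nodes.extend(single_linkage(points, members, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges
```

For example, calling `mapper(points, lambda p: p[0])` uses the first coordinate as the filter. Points that form one continuous chain yield connected nodes, while well-separated groups yield disconnected nodes, which is the kind of separation the later slides refer to.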
9. TDA Example: NBA Players
P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, and G. Carlsson, Extracting insights from the shape of complex data using topology, Scientific Reports (Nature), 2013.
10. Shape of the Ecosystem
https://firebearstudio.com/blog/the-most-popular-packages-for-bower-and-npm.html
To uncover insights and extract patterns of existing packages.
Datasets gathered from software ecosystems are vastly high-dimensional, noisy and are generally challenging ...
11. TDA to Understand the Shape of Application vs. npm Packages
A set of keywords that were likely to be related to either applications (GitHub strong) or the npm package ecosystem (npm strong).

GitHub strong    npm strong
gruntplugin      util
gulpplugin       array
express          buffer
react            string
authenticate     file
E. Wittern, P. Suter, and S. Rajagopalan, A look at the dynamics of the JavaScript package ecosystem, in Proceedings of the 13th International Workshop on Mining Software Repositories - MSR '16. New York, New York, USA: ACM Press, 2016, pp. 351-361.
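As a rough illustration of how the two keyword sets above could be used, the following sketch labels a package by the set its keywords overlap more. The keyword sets are copied from the slide; the function name `keyword_leaning` and the majority-count rule are assumptions made for this example, not the procedure used by Wittern et al. or by this study.

```python
# Keyword sets copied from the slide's table.
GITHUB_STRONG = {"gruntplugin", "gulpplugin", "express", "react", "authenticate"}
NPM_STRONG = {"util", "array", "buffer", "string", "file"}


def keyword_leaning(keywords):
    """Label a package by which keyword set it overlaps more.

    The tie-breaking rule ("undetermined") is an assumption of this sketch.
    """
    normalized = {k.strip().lower() for k in keywords}
    github_hits = len(normalized & GITHUB_STRONG)
    npm_hits = len(normalized & NPM_STRONG)
    if github_hits > npm_hits:
        return "GitHub strong"
    if npm_hits > github_hits:
        return "npm strong"
    return "undetermined"
```

With the keywords of express from slide 5, `keyword_leaning` returns "GitHub strong", because "express" is the only keyword that appears in either set.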
12. Extract Feature Vector (1/2)
f1 Author: Name of the person who built this package.
f2 Author Domain: Email domain of the person who built this package.
f3 License: The license tells people and organizations how they are permitted to use the published package.
package.json
13. Extract Feature Vector (2/2)
f4 Tagged Keywords: An array of strings that helps people discover the package, as it is listed in npm search.
f5 Version Released: The version, together with the name, forms an identifier that is assumed to be completely unique.
f6 Number of Dependencies: The number of package dependencies, each mapped to a version range.
package.json
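The six features above can be read from a parsed package.json. The sketch below shows one way to do that; the function name `extract_features`, the output keys, and the sample manifest are hypothetical, and the handling of the `author` field (either an object or the "Name <email> (url)" shorthand that npm accepts) is a simplification.

```python
import json


def extract_features(manifest):
    """Pull features f1-f6 out of a parsed package.json dictionary."""
    author = manifest.get("author") or {}
    if isinstance(author, str):
        # npm also accepts the shorthand "Name <email> (url)".
        name, _, rest = author.partition("<")
        email = rest.partition(">")[0]
        author = {"name": name.partition("(")[0].strip(),
                  "email": email.strip()}
    email = author.get("email", "")
    return {
        "f1_author": author.get("name", ""),
        "f2_author_domain": email.rpartition("@")[2] if "@" in email else "",
        # Assumed to be a plain license string such as "MIT".
        "f3_license": manifest.get("license", ""),
        "f4_keywords": list(manifest.get("keywords", [])),
        "f5_version": manifest.get("version", ""),
        "f6_num_dependencies": len(manifest.get("dependencies", {})),
    }


# Hypothetical manifest, used only to exercise the function.
sample = json.loads("""
{
  "name": "demo-package",
  "version": "1.2.3",
  "license": "MIT",
  "author": "Ada Example <ada@example.org>",
  "keywords": ["web", "api"],
  "dependencies": {"left-pad": "^1.0.0", "lodash": "~4.17.0"}
}
""")
features = extract_features(sample)
```

The version is kept as the raw string here; how the study turns it into a numeric value is not stated on these slides.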
14. TDA High Dimension Mapping
[Figure: pipeline from the dataset through feature processing to TDA.]
Dataset: package.json of 151,000 packages
Features: f1 (Author), f2 (Author Domain), f3 (License), f4 (Keywords), f5 (Version), f6 (No. dependencies)
Processing steps shown: word2vec, estimate version, count dependencies, Vector Space Model (VSM)
TDA tool: Knotter
https://github.com/rosinality/knotter
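To show what mapping packages into a vector space can look like, here is a minimal sketch that encodes keywords as term counts and appends the dependency count. It is a deliberately simplified stand-in: the slide names word2vec for keywords, which is replaced here by a plain bag-of-words encoding, and the author, license, and version features are omitted. The names `build_vocabulary` and `to_vector` and the input record layout are hypothetical.

```python
def build_vocabulary(packages):
    """Assign each distinct keyword a fixed column index."""
    vocab = sorted({kw for pkg in packages for kw in pkg["keywords"]})
    return {kw: index for index, kw in enumerate(vocab)}


def to_vector(package, vocab):
    """Encode one package as keyword counts plus its dependency count."""
    vector = [0.0] * (len(vocab) + 1)
    for kw in package["keywords"]:
        if kw in vocab:
            vector[vocab[kw]] += 1.0
    # Last column: number of dependencies (feature f6).
    vector[-1] = float(package["num_dependencies"])
    return vector


# Hypothetical records, used only to exercise the functions.
packages = [
    {"keywords": ["web", "api"], "num_dependencies": 3},
    {"keywords": ["api", "util"], "num_dependencies": 0},
]
vocab = build_vocabulary(packages)
vectors = [to_vector(pkg, vocab) for pkg in packages]
```

Every distinct keyword adds a column, so the dimensionality grows with the size of the ecosystem, which is the high-dimensionality problem raised on slide 6.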
17. Results
Packages that are more likely to be used within the ecosystem are located separately from packages meant for application usage outside the ecosystem.
18. Comparison with Archetypal Analysis
Archetype 1 (A1) has packages that contain keywords such as web, plugin, test, http, express, node, api and server.
Archetype 2 (A2) has packages that contain keywords like html, gulpplugin, css, javascript and gulp.
Archetype 3 (A3) has fewer packages compared to the other two archetypes.