Extracting Insights from the
Topology of the JavaScript
Package Ecosystem
Authors : Nuttapon Lertwittayatrai+ Raula Gaikovina Kula*
Saya Onoue* Hideaki Hata*
Arnon Rungsawang+ Pattara Leelaprute+
Kenichi Matsumoto*
+ Kasetsart University, Thailand
* Nara Institute of Science and Technology, Japan
Software Ecosystem?
Lungu*: a collection of software systems, which are
developed and co-evolve in the same environment.
*Lungu, Mircea (2009). Reverse Engineering Software Ecosystems (Ph.D.). University of Lugano.
https://www.npmjs.com/ accessed 2017-12-02
It's Big...
Analysis of the ecosystem...
Complex Characteristics of
Packages as Vectors
name: npm
version: 5.6.0
license: Artistic-2.0
keywords: install,modules,package manager,package.json
name: browserify
version: 14.5.0
license: MIT
keywords: browser,require,commonjs,commonj-esque,bundle,npm,javascript
name: express
version: 4.16.2
license: MIT
keywords: express,framework,sinatra,web,rest,restful,router,app,api
Problem: How to analyze high-dimensional
package characteristics?
http://setosa.io/ev/principal-component-analysis/
2 vectors: name, version
3 vectors: name, version, license
4+ vectors: name, version, license, keywords, author, ...
Shape of the Ecosystem
http://slideplayer.com/slide/238747/
Topological Data Analysis (TDA)
• Topology is a major area of mathematics that studies
spatial properties preserved under deformations of objects.
• TDA is the result of a concerted effort to adapt topological
methods to various applied problems, one of which is the study
of large and high-dimensional datasets.
Shape of Data
P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson,
and G. Carlsson, "Extracting insights from the shape of complex data using topology," Scientific Reports (Nature), 2013.
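To make the "shape of data" idea concrete, here is a toy sketch of the Mapper construction that Lum et al. build on: cover the range of a filter function with overlapping intervals, cluster the points that fall in each interval, and connect clusters that share points. Everything in it (the names `mapper` and `single_linkage`, the parameters `n_intervals`, `overlap` and `eps`, and the sample data) is an assumption made for illustration; it is not the implementation used in the paper or in any TDA tool.

```python
import math
from itertools import combinations


def single_linkage(points, indices, eps):
    """Group indices into clusters; two points are linked when within eps."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, filter_fn, n_intervals=4, overlap=0.25, eps=1.0):
    """Return (nodes, edges) of a 1-D Mapper graph (illustrative only)."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Each interval is widened on both sides so neighbours overlap.
        start = lo + k * length - overlap * length
        end = lo + (k + 1) * length + overlap * length
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage(points, members, eps))
    # Two nodes are connected when their clusters share at least one point.
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


# Hypothetical data: ten points on a line, filtered by their x coordinate.
line = [(float(x), 0.0) for x in range(10)]
nodes, edges = mapper(line, lambda p: p[0], n_intervals=3)
```

On this sample the graph is a simple chain of three nodes, which mirrors the fact that the underlying data is a line segment.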
NBA TDA Players example
P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, and G. Carlsson,
"Extracting insights from the shape of complex data using topology," Scientific Reports (Nature), 2013.
Shape of the ecosystem
https://firebearstudio.com/blog/the-most-popular-packages-for-bower-and-npm.html
• To uncover insights and extract patterns of existing packages.
• Datasets gathered from software ecosystems are vastly
high-dimensional, noisy, and generally challenging ...
TDA to understand shape of
Application vs. npm Packages
• A set of keywords that were likely to be related to either
applications (GitHub strong) or the npm package
ecosystem (npm strong).

GitHub strong   npm strong
gruntplugin     util
gulpplugin      array
express         buffer
react           string
authenticate    file

E. Wittern, P. Suter, and S. Rajagopalan, "A look at the dynamics of the JavaScript package ecosystem," in
Proceedings of the 13th International Workshop on Mining Software Repositories - MSR '16. New York, New
York, USA: ACM Press, 2016, pp. 351-361.
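As a rough illustration of how the two keyword lists could be used to label a package, the sketch below counts how many of a package's keywords fall in each list. The function name `keyword_leaning` and the simple majority rule are assumptions made for this example; they are not the method of Wittern et al. or of this study.

```python
# Keyword lists taken from the slide above.
GITHUB_STRONG = {"gruntplugin", "gulpplugin", "express", "react", "authenticate"}
NPM_STRONG = {"util", "array", "buffer", "string", "file"}


def keyword_leaning(keywords):
    """Return 'github', 'npm', or 'neutral' by counting list membership.

    Hypothetical rule: whichever list matches more keywords wins.
    """
    kws = {k.lower() for k in keywords}
    github_hits = len(kws & GITHUB_STRONG)
    npm_hits = len(kws & NPM_STRONG)
    if github_hits > npm_hits:
        return "github"
    if npm_hits > github_hits:
        return "npm"
    return "neutral"


# Example call with the keywords listed for express earlier in the deck.
express_leaning = keyword_leaning(
    ["express", "framework", "sinatra", "web", "rest",
     "restful", "router", "app", "api"]
)
```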
Extract Feature Vector (1/2)
• f1 Author: Name of the person who built the package.
• f2 Author Domain: Email domain of the person who built
the package.
• f3 License: Tells people how the publisher of the package
permits them to use it.
Source: package.json
Extract Feature Vector (2/2)
• f4 Tagged Keywords: An array of strings that helps
people discover the package, as it's listed in npm search.
• f5 Version Released: The version, together with the name,
forms an identifier that is assumed to be completely unique.
• f6 Number of Dependencies: The number of package
dependencies, each mapped to a version range.
Source: package.json
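The six features can be read straight from a parsed package.json. The following is a minimal sketch under stated assumptions: the helper names `parse_author` and `extract_features`, the output keys, and the sample package are all hypothetical, and only the common string and object forms of the `author` field are handled.

```python
import re
from typing import Any


def parse_author(author: Any) -> tuple[str, str]:
    """Return (name, email) from either form of the package.json author field."""
    if isinstance(author, dict):
        return author.get("name", ""), author.get("email", "")
    if isinstance(author, str):
        # Shorthand form: "Name <email> (url)"
        email = re.search(r"<([^>]+)>", author)
        name = re.split(r"[<(]", author, maxsplit=1)[0].strip()
        return name, email.group(1) if email else ""
    return "", ""


def extract_features(pkg: dict[str, Any]) -> dict[str, Any]:
    """Collect f1..f6 from a parsed package.json dictionary."""
    name, email = parse_author(pkg.get("author"))
    domain = email.rsplit("@", 1)[1].lower() if "@" in email else ""
    return {
        "f1_author": name,
        "f2_author_domain": domain,
        "f3_license": pkg.get("license", ""),
        "f4_keywords": list(pkg.get("keywords") or []),
        "f5_version": pkg.get("version", ""),
        "f6_num_dependencies": len(pkg.get("dependencies") or {}),
    }


# Hypothetical package.json content, invented for this example.
sample = {
    "name": "demo-package",
    "version": "1.2.3",
    "license": "MIT",
    "author": "Jane Doe <jane@example.com> (https://example.com)",
    "keywords": ["demo", "util"],
    "dependencies": {"dep-a": "^1.0.0", "dep-b": "~2.1.0"},
}
features = extract_features(sample)
```

Missing fields fall back to empty values, so packages with sparse metadata still yield a complete feature record.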
TDA High-Dimensional Mapping
Dataset: package.json files of 151,000 packages
f1 (Author), f2 (Author Domain), f3 (License) -> Vector Space Model (VSM)
f4 (Keywords) -> word2vec
f5 (Version) -> estimate version
f6 (No. dependencies) -> count dependencies
All features -> Knotter (TDA)
https://github.com/rosinality/knotter
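One way to picture the step from package metadata to numeric vectors is a one-hot Vector Space Model for the categorical features plus a raw dependency count. The sketch below assumes that f1 to f3 are the features handled by the VSM; the function names, field names and records are hypothetical, and the word2vec and version-estimation steps are left out because the slides do not describe them.

```python
def build_vocabulary(records, fields):
    """Assign one vector position to every (field, value) pair seen."""
    vocab = {}
    for rec in records:
        for field in fields:
            vocab.setdefault((field, rec[field]), len(vocab))
    return vocab


def encode(rec, vocab, fields):
    """One-hot encode the categorical fields, then append the dependency count."""
    vec = [0.0] * (len(vocab) + 1)
    for field in fields:
        idx = vocab.get((field, rec[field]))
        if idx is not None:
            vec[idx] = 1.0
    vec[-1] = float(rec["num_dependencies"])
    return vec


# Hypothetical records, invented for this example.
FIELDS = ("author", "author_domain", "license")
records = [
    {"author": "alice", "author_domain": "example.com",
     "license": "MIT", "num_dependencies": 3},
    {"author": "bob", "author_domain": "example.org",
     "license": "MIT", "num_dependencies": 0},
]
vocab = build_vocabulary(records, FIELDS)
vectors = [encode(rec, vocab, FIELDS) for rec in records]
```

Vectors of this kind are what a TDA tool would take as input; both records share the position for the MIT license and differ everywhere else.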
Results
The number of package dependencies is a strong
feature in the topology.
Results
Packages that are more likely to be used within the ecosystem
are located separately from packages meant for application
usage outside the ecosystem.
Comparison with Archetypal
Analysis
• Archetype 1 (A1) has packages that contain keywords such as web,
plugin, test, http, express, node, api and server.
• Archetype 2 (A2) has packages that contain keywords such as html,
gulpplugin, css, javascript and gulp.
• Archetype 3 (A3) has fewer packages than the other two
archetypes.
Comparison with Archetypal
Analysis
Archetype  Identified Packages
A1         tar-parse, turtle-run, marked-sanitized, haversort, bmxplayjs
A2         statsd-influxdb-backend, ardeidae, demo-blog-system, git-ssb-web, social-media-resolver
A3         stream-viz, programify, polyclay-couch, meshblu-core-task-check-update-device-is-valid, apidoc-almond