SQLGitHub features a SQL-like syntax that allows you to:
Query information about an organization as a whole.
You may also think of it as a better, enhanced frontend layer built on top of GitHub's RESTful API.
1 of 20
Download to read offline
More Related Content
SQLGitHub - Access GitHub API with SQL-like syntaxes
2. Servo
A new browser engine written from scratch in Rust by Mozilla
Original Project: Write an Android frontend for servo
Issues, Issues and issues
Zero knowledge about mobile development and Rust
Outdated documentation
Very poorly supported on Android
Tried approaching in various ways, but to no avail
Made build and packaging guide and reported issues
3. SQLGitHub Motivation (I)
Who is stealing all my easy bugs/issues?
There are very limited tools for managing GitHub organizations
Open-source organizations are usually understaffed
The Servo project on GitHub for example,
contains 129 repositories
managed by practically 1 person.
4. SQLGitHub Background
Organization: Organizations are shared accounts where businesses
and open-source projects can collaborate across many projects at
once.
Repository: A repository contains all of the project files, and stores
each file's revision history.
Commit: An individual change to a set of files.
Issue: Suggested improvements, tasks or questions
Pull Request: Proposed change
to a repository.
Repository
Organization
Repository Repository
Commits
Issues
Pull Requests
Commits
Issues
Pull Requests
Commits
Issues
Pull Requests
https://help.github.com/articles/github-glossary/
5. SQLGitHub Motivation (II)
Common questions/problems an organization admin include:
Obtain certain metrics of the organization in machine-friendly format for
post-processing (eg. KPI report)
Get the current list of projects hosted on GitHub
List of the most popular repositories in the organization
Get the list of issues closed (resolved) for the past 7 days
What are the critical issues that are still left open?
Who are the top contributors of the past month?
endless possible questions
6. Abstract
SQLGitHub features a SQL-like syntax that allows you to:
Query information about an organization as a whole.
You may also think of it as a better, enhanced frontend layer built
on top of GitHubs RESTful API
8. Introduction Use Case (I)
Get name and description from all the repos in apple.
select name, description from apple.repos
9. Introduction Use Case (II)
Get last-updated time and title of the issues closed in the past week (7
days) in servo listed in descending order of last-updated time.
select updated_at, title from servo.issues.closed.7 order by updated_at desc
10. Introduction Use Case (III)
Get top 10 most-starred repositories in servo.
select concat(concat("(", stargazers_count, ") ", name), ": ", description) from
servo.repos order by stargazers_count desc, name limit 10
11. Introduction Use Case (IV)
Get top 10 contributors in servo for the past month (30 days) based
on number of commits.
select login, count(login) from servo.commits.30 group by login order by
count(login) desc, login limit 10
12. Introduction Technology Stack
Python
re & regex, regular expression libraries
PyGithub (patched), an unofficial client library for GitHub API
prompt_toolkit, a library for building prompts
pygments, a library for syntax highlighting
13. Introduction (Simplified) Flow
Fetch data (from)
Filter by where conditions
Evaluate partial exprs
Group by group exprs
Order by order exprs
Evaluate select exprs
Fetch data with required fields from GitHub API
Evaluate where conditions and filter fetched data
Evaluate group exprs and other field exprs
Generate table groups by values of group exprs
Sort within and between tables
Evaluate select exprs
Filter by having conditions Evaluate having conditions and filter tables
15. Introduction Challenges (I)
Algorithm of parsing is almost identical to that of expression
evaluation waste of time
Lazy Parsing: Only parse clauses (eg. select, from, where) and comma-
separated fields
Comma-separated fields, strings and escape characters
Evaluate this: concat("[)"Stars"(: ", stargazers_count)
concat("[)"Stars"(: ", stargazers_count)
concat("[)"Stars"(: ", stargazers_count)
concat("[)"Stars"(: ", stargazers_count)
16. Introduction Challenges (II)
Extracting all relevant fields from expressions to fetch at once
select concat("[)"-> avg(stargazers_count)"(: ", stargazers_count -
avg(stargazers_count), "] ", name) from apple.repos where description
like "%library%" order by id
Algorithm: for each expression,
Remove all literal strings. Use r""(?:[^"]|.)*"" to match.
Find all possible tokens with r"([a-zA-Z_]+)(?:[^(a-zA-Z_]|$)".
For each token, check if its a predefined token (ie. part of SQL).
17. Introduction Challenges (III)
Expression Evaluation is really complicated
Regular (eg. concat, floor) and Aggregate functions (eg. max, min)
Have to evaluate an entire table at once
Nested functions (eg. sum(avg(field_a) + avg(field_b)))
Use recursive regex patterns to extract tokens r((?:(?>[^()]+|(?R))*))
Assign special precedence and insert extra logic in place
Operator Precedence
Modified 2-stack evaluation approach +
Finite State Machine + One-token Lookahead
18. Introduction Challenges (IV)
Pythons built-in sort is not customizable:
sorted(iterable, *, key=None, reverse=False)
order by requires sorting with multiple keys each with potentially
different reverse:
order by field_a desc, field_b asc, field_c, desc
Wrote custom sort that integrates better with the workflow
19. Future Directions
Improve SQL, MySQL compatibility
Extend to end users not just organizations
Migrate to the new GraphQL backend (GitHub API v4)
Integrate SQLGitHub directly on the server end (better efficiency
and perhaps better security!)
20. Acknowledgements
We would like to thank:
Shing Lyu, former software engineer at Mozilla Taiwan for the mentorship
Irvin Chen, Liaison of MozTW (Mozilla Taiwan Community) for
coordinating the program
Prof. Cheng-Chung Lin for organizing the program