際際滷

際際滷Share a Scribd company logo
Create own Excel with
JavaScript
Viktor Turskyi
CEO at WebbyLab
2019
Viktor Turskyi
 CEO and principal architect at
WebbyLab
 Open source developer
 More than 15 years of experience
 Delivered more than 60 projects of
different scale
 Did projects for 5 companies from
Fortune 500 list
Business task
Business logic is in Excel and you need to code
it in your app and run in browser and mobile.
This business logic is complex mathematical
models.
Ok. Send us the file.
Model details
 2 mln cells
 400k formulas
 1 mln Excel functions
 50 sheets
 Computation chains of 20-30k of cells
How to create a high performance excel engine in java script
Demo of original source file
Requirements
 High performance (<2s full recompute)
 Small file size (suitable for work in
browser)
 Offline work in browser
 Work on server
 Offline work on tablets (iOS, Android)
How to create a high performance excel engine in java script
Decision: write own excel in
JavaScript
What we want?
How to create a high performance excel engine in java script
Is JS performant enough for
mathematical computations?
Performance testing (100k times,
large math formula AST)
Any ideas how to do this?
It is like creating a compiler
Components
Extractor
How to read data from XLS file?
 Extract values
 Extract formulas
 Extract sheet names
 Extract cells/ranges names
 Nodejs libraries
 Ruby libraries
 Python libraries
 Perl libraries
 PHP libraries
We tried (everything did not work)
Run Excel as OLE Object
Communicate with Excel via VBA methods
What did work for us?
Preprocessor
What next? Preprocess all data
1. Parse all raw data
2. Parse and normalize formulas
3. Parse and normalize references
4. Optimize size
How to create a high performance excel engine in java script
How to create a high performance excel engine in java script
FormulaParser: What to parse?
1. Operators priority
2. Infix/prefix operators
3. Constants
4. Functions
5. Cell references
6. Range references
7. Named ranges
=IF($F$36 + $AF128 <= 101; SUMPRODUCT(
($S128:OFFSET($S128;$F$36-1;0)) *
($AG$55:OFFSET($AG$55;$F$36-1;0)) *
('Sheet25'!BY84:OFFSET('Sheet25'!BY84;$F$36-1;0) +
'Sheet25'!BY194:OFFSET('Sheet25'!BY194; $F$36-1; 0) ) );
SUMPRODUCT( ($S128:$S$155) *
($AG$55:OFFSET($AG$55;100-$AF128;0)) * ('Sheet25'!BY84:BY$111 +
'Sheet25'!BY194:BY$221) ) )
Formula example
Mistake 1: Trying to write own parser
from scratch
Own parser
1. Complex
2. A lot of time
3. Expensive
90% of the work is the same as
writing a parser for programming
language
Good solution - ANTLR
1. Parser generator based on Grammars
(including JS)
2. Lexer and Parser
3. Emits AST (Abstract Syntax Tree)
4. The fastest and the most powerful
http://www.antlr.org/
Formulas examples
Formula: '=1+2*3'
JS AST: [ '+', 1, [ '*', 2, 3] ]
Formula: '=A1+B1'
JS AST: [+, ['=', 0, 0, 0], ['=', 0, 1, 0] ]
Formula: =SUM(B5:B100, 42)'
JS AST: [ 'SUM', [ 'RANGE', 0, 1, 4, 1, 99 ], 42 ]
Model Runner
Model Runner (simplified)
Components
 LocalRunner(Engine) - works with model,
processes all cells dependencies
 Formula Evaluator - computes one
formula
 Address Parser - parses address in
runtime
 Functions - Excel functions
implementation
How to create a high performance excel engine in java script
Real engine usage
Implementation of EXCEL functions
 One function - one module.
 No side effects
 Use dependency injection
 Test test test test (excel functions often
does not work as documented)
Call example:
SQRT([ 9 ]) returns 3
SUM([2, [5, 6, 7, 9], 1 ]) returns 30
Mistake 2: passing ranges as arrays
SUM([A1, B1:B4, C1]) returns 30
SUM([2, [5, 6, 7, 9], 1 ]) returns 30
Range abstraction is very important
(avoid unnecessary data copying)
SUM( [ [ 21, 22, 23, 31, 32, 33 ] ] );
SUM( [ new ArrayRange([21, 22, 23, 31, 32, 33]) ] );
SUM( [ new ModelRange(model, B2:C4 ) ] );
2+2
How to create a high performance excel engine in java script
How to create a high performance excel engine in java script
=A1:A10+B1:B10
(does not work)
=SUMPRODUCT(A1:A10+B1:B10; C1:C10)
Works
Implementation of ModelRunner
A1=1
A2=A1+1
A3=A1+A2
Cell 1 influences A2 and A3
Cell A2 influences 3
What we want?
We can represent dependencies in
form of directed acyclic graph (DAG)
A1=1;
A2=A1+2;
A3=A1+A2;
Now we can recompute
dependent cells on
changes
Mistake 3: reay on synthetic models
too much
Test model with million cells -
2 seconds for recompute
Real model with million cells
- 1 hour for recompute
Reason: we recompute the
same cells several times
You can sort your dependency graph
with topological sort
Each cell will be calculated only one time
We did it: it worked for test files but did not
work on real models. Why?
Reason: Our graph is more than 10k nodes
deep. We got stack overflow (JS limits call
stack to 10k frames).
What to do
Do not use recursion, traverse graph manually
with own stack.
Real model results:
No toposort - 1 hour
With toposort - 6 seconds
Optimization
Benchmark => tune => benchmark => tune =>
benchmark => tune => benchmark => tune etc
Read a lot about v8 internals
Benchmark => tune => benchmark => tune =>
benchmark => tune => benchmark => tune etc
We did it! Whats next?
OFFSET breaks everything
OFFSET(reference, rows, cols, [height], [width])
=OFFSET(D3,3,-2,1,1) - displays the value in
cell B6
OFFSET breaks everything
=OFFSET(D3, 3, -2) - displays the value in cell
B6
A2 = A1+OFFSET(D3, 3, -2).
Does A2 depend on B6? Do we have any
problem with it?
OFFSET breaks everything
=OFFSET(D3, 3, -2) - displays the value in cell
B6
A2 = A1+OFFSET(D3, RAND(), RAND()).
Which cell does A2 depend on?
Solution: Alternative Runner
implementation
Build graphs
dynamically and cache
them for different
OFFSET args
Demo of how engine works
Conclusion
 Dependency injection (and SOLID) everywhere
 Make everything modular.
 You will need a lot of tests. There are tons of
edge cases in Excel behavior.
 Measure performance on real models.
 You need to have some sort of automatic model
tester.
 Create convenient debug tools (you will spent a
lot of time debugging)
 Understand how V8 works
Telegram: @JABASCRIPT
Viktor Turskyi
viktor@webbylab.com
@koorchik @koorchik
https://webbylab.com

More Related Content

How to create a high performance excel engine in java script