Easteros, new forecasting analytic platform
Jack Wang
Amazon forecasting
History
• No friendly gateways to access historical forecasting snapshots (input, interim, output, etc.)
• No friendly gateways to submit ad-hoc queries (troubleshooting) and new algorithms
• SLA ETLs are hard to launch and maintain
2
Architecture
[Diagram: Cloud-based data warehouse; Hadoop (EMR) clusters; EASTEROS: Router service; EASTEROS: Analytic Portal / CLI]
3
Why Easteros?
• Simple gateways for job submission and monitoring
  - Access to each snapshot of a pipeline run
• Separate the big data software stack from users (analysts, scientists, retail in-stock managers)
4
Easteros: Router service
• Users' perspective
  - RESTful service to run Hive and Hadoop jobs
  - Automatically selects the proper EMR cluster based on cluster load
  - Users don't need to set up and maintain clusters
  - Sophisticated users can provide cluster configs
  - Job logs can be checked periodically (flushed to S3 every 5 minutes)
5
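The slide says only that the router picks a cluster "based on cluster load". A minimal Python sketch of one plausible policy follows, assuming a hypothetical per-cluster slot count as the load metric; the `ClusterStatus` type, the `select_cluster` helper, and the cluster ids are illustrative, not Easteros's actual code. It picks the least-loaded cluster in the requested region and reports when none has spare capacity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterStatus:
    """Hypothetical load snapshot for one EMR cluster, as a router might track it."""
    cluster_id: str
    region: str               # e.g. "NA" or "CN"
    running_jobs: int
    max_concurrent_jobs: int  # assumed capacity limit; must be > 0

    @property
    def load(self) -> float:
        # Fraction of job slots in use; 1.0 means the cluster is full.
        return self.running_jobs / self.max_concurrent_jobs


def select_cluster(clusters: list[ClusterStatus], region: str) -> ClusterStatus | None:
    """Return the least-loaded cluster in `region` that still has a free slot.

    None means every candidate is full, which is where a router would
    fall back to spinning up a new cluster.
    """
    candidates = [c for c in clusters if c.region == region and c.load < 1.0]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.load)


clusters = [
    ClusterStatus("j-EXAMPLE-A", "NA", running_jobs=8, max_concurrent_jobs=10),
    ClusterStatus("j-EXAMPLE-B", "NA", running_jobs=3, max_concurrent_jobs=10),
    ClusterStatus("j-EXAMPLE-C", "CN", running_jobs=1, max_concurrent_jobs=4),
]
print(select_cluster(clusters, "NA").cluster_id)  # j-EXAMPLE-B
```

Returning None instead of raising leaves the "no capacity" decision to the caller, which fits the SDE-side goal of spinning up new clusters automatically.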
Easteros: Router service
• SDE perspective
  - Spin up new clusters automatically
  - Override site-specific Hive/Hadoop configurations
6
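One way to read "override site-specific Hive/Hadoop configurations", together with the point that sophisticated users can provide cluster configs, is as layered settings: defaults, then per-site overrides, then user-supplied values, with later layers winning. The sketch below assumes that layering order. The property names `hive.exec.parallel` and `mapreduce.job.reduces` are real Hive/Hadoop settings, but the values, the site table, and the helper names are hypothetical.

```python
from __future__ import annotations

# Illustrative defaults: real property names, placeholder values.
DEFAULT_CONF = {
    "hive.exec.parallel": "false",
    "mapreduce.job.reduces": "10",
}

# Hypothetical per-site overrides for the NA and CN deployments.
SITE_OVERRIDES = {
    "NA": {"mapreduce.job.reduces": "50"},
    "CN": {"hive.exec.parallel": "true"},
}


def effective_conf(site: str, user_conf: dict[str, str] | None = None) -> dict[str, str]:
    """Layer defaults < site overrides < user-supplied configs (later layers win)."""
    conf = dict(DEFAULT_CONF)
    conf.update(SITE_OVERRIDES.get(site, {}))
    conf.update(user_conf or {})
    return conf


def to_hive_set_statements(conf: dict[str, str]) -> list[str]:
    # Render the merged settings as Hive "SET key=value;" statements,
    # sorted by key so the output is deterministic.
    return [f"SET {k}={v};" for k, v in sorted(conf.items())]


merged = effective_conf("NA", {"hive.exec.parallel": "true"})
print(to_hive_set_statements(merged))
# ['SET hive.exec.parallel=true;', 'SET mapreduce.job.reduces=50;']
```

Prepending such `SET` statements to a submitted HQL script applies the settings to that Hive session only, without editing the cluster's own configuration files.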
[Diagram: EASTEROS with DynamoDB metadata; flows labeled Configure, Spin up, and Submit Query/Algo; Synamo Archiver]
7
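The diagram says only that Easteros keeps its metadata in DynamoDB. As a hedged illustration of what such metadata might track, the sketch below models each job as a flat, DynamoDB-style item keyed by job id, with an assumed lifecycle (SUBMITTED, RUNNING, SUCCEEDED, FAILED). The attribute names, the states, and the helpers are hypothetical, and a plain dict stands in for the table.

```python
from __future__ import annotations

from enum import Enum


class JobState(Enum):
    SUBMITTED = "SUBMITTED"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# Assumed lifecycle: terminal states have no outgoing transitions.
TRANSITIONS: dict[JobState, set[JobState]] = {
    JobState.SUBMITTED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: set(),
}

# In-memory stand-in for a DynamoDB table whose partition key is "job_id".
job_table: dict[str, dict[str, str]] = {}


def put_new_job(job_id: str, cluster_id: str, priority: str) -> None:
    """Record a freshly submitted job; all attribute names are illustrative."""
    job_table[job_id] = {
        "job_id": job_id,
        "cluster_id": cluster_id,
        "priority": priority,
        "state": JobState.SUBMITTED.value,
    }


def advance(job_id: str, new_state: JobState) -> None:
    """Move a job to `new_state`, rejecting transitions the lifecycle forbids."""
    item = job_table[job_id]
    current = JobState(item["state"])
    if new_state not in TRANSITIONS[current]:
        raise ValueError(f"{job_id}: cannot go from {current.value} to {new_state.value}")
    item["state"] = new_state.value


put_new_job("job-001", "j-EXAMPLE-B", "NORMAL")
advance("job-001", JobState.RUNNING)
advance("job-001", JobState.SUCCEEDED)
print(job_table["job-001"]["state"])  # SUCCEEDED
```

Against a real table, the guard in `advance` would usually become a conditional write (a DynamoDB `ConditionExpression` on the current state) so that two workers cannot move the same job at once.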
Easteros: service call
Request to the Easteros REST API:
• HQL or jars (uploaded to S3)
• Command args
• Job priority
8
Easteros: service call
Response from Easteros:
• Job id
• Job logs
• Result file
9
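The call returns a job id, the job's logs, and a result file. The sketch below assumes, hypothetically, that the response is JSON and that logs and results come back as S3 locations (consistent with logs being flushed to S3). The field names `job_id`, `log_uri`, and `result_uri` and the helper names are illustrative.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResponse:
    job_id: str
    log_uri: str     # assumed S3 prefix where logs are flushed
    result_uri: str  # assumed S3 location of the result file


def parse_job_response(body: str) -> JobResponse:
    """Parse a hypothetical JSON response body into a JobResponse."""
    data = json.loads(body)
    return JobResponse(job_id=data["job_id"], log_uri=data["log_uri"],
                       result_uri=data["result_uri"])


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split "s3://bucket/key/path" into ("bucket", "key/path")."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


r = parse_job_response(
    '{"job_id": "job-001", '
    '"log_uri": "s3://example-bucket/logs/job-001/", '
    '"result_uri": "s3://example-bucket/results/job-001.tsv"}'
)
print(split_s3_uri(r.result_uri))  # ('example-bucket', 'results/job-001.tsv')
```

The resulting bucket and key can be passed to an S3 client (for example boto3's `get_object(Bucket=..., Key=...)`) to download the result file.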
10
Jobs received in the NA and CN clusters.
More and more ETLs are being migrated to Easteros.
Ad-hoc queries are quite stable.
11
Acknowledgements
Thanks to:
• Rauser, John
• Touloumtzis, Michael
• Bol, Colleen
12

2014-04-easteros
