Apache Bigtop packages the Hadoop ecosystem into RPM and DEB packages. It provides a foundation for commercial Hadoop distributions and services. Bigtop includes a build toolchain, a packaging framework, Puppet deployment scripts, and an integration test framework. Bigtop 1.4, the next release, is due in early April 2019 and brings Integration Test Framework 2.0, a smoke test CI matrix, more built-in test coverage, and package version bumps; AArch64 has been officially supported since v1.3.0. Future work focuses on core big data components such as Spark and Flink, Kubernetes and cloud support, and more integrations.
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
1. AArch64 and Apache Bigtop
- Empowering Big Data Everywhere
Evans Ye
Jun He
2. Evans Ye - Intro
Member of the Apache Software Foundation
- Spread the Apache Way
- Mentorship
Apache Bigtop PMC member, Committer, former VP
- About to introduce
Director of Taiwan Data Engineering Association (TDEA)
- Promote OSS, big data related technology
- Hold conferences, workshops, meetups
3. What is Apache Bigtop?
Package Hadoop ecosystem to RPM/DEB artifacts
Purely open source Hadoop Distribution
9. Bigtop Toolchain
A set of Puppet recipes to install required libraries, build tools
To prepare a bigtop build environment:
- Java
10. Containerized build infra
Immutable build environment
Super friendly for porting
- Prepare aarch64 images
- Try build on docker
- Fix compatibility issues
11. Bigtop Package
Framework to build Hadoop ecosystem components into RPM/DEB packages
Two ways:
- Release tarball -> build -> (patch) -> package
- Git branch/commit -> build -> (patch) -> package
How to:
- $ ./gradlew hadoop-pkg-ind
Why patch?
- Lots of compatibility issues
- Say Spark works well with Hive and Oozie, but has no luck with Zeppelin
- We focus on the entire distribution
12. Bigtop Puppet & Test
Bigtop Puppet:
- A set of Puppet recipes to deploy Hadoop ecosystem components
Bigtop Test
- Bigtop Test Framework
- Test utilities for writing tests in Java/Groovy
- Bigtop Smoke Test
- Bunch of built-in smoke tests (quick diagnosis)
- Bigtop Integration Test
- Bunch of built-in integration tests (coverage)
- Bigtop Package Test
- Designed to find bugs in the packages before they are deployed
14. Bigtop Sandbox
Bigtop stack built as image to be easily consumed
- How to:
- Quick start environment
- Handy image for applications to do integration test
15. Bigtop Integration Test Framework 2.0
Full support for building and testing inside Docker, with a one-stop, seamlessly integrated entry point at ./gradlew
- Package
- $ ./gradlew spark-pkg-ind repo-ind
- Deploy & Test
- $ ./gradlew docker-provisioner
- Build -> Deploy -> Test lifecycle in one stop
- $ ./gradlew spark-pkg-ind repo-ind docker-provisioner
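The one-stop lifecycle above can be sketched as a small helper that only prints the ./gradlew command line for a given set of components, so it can be reviewed before running. This is a minimal sketch: the function name bigtop_lifecycle_cmd is hypothetical, the task names follow the <component>-pkg-ind pattern shown on the slides, and passing several components in one invocation is an assumption rather than something the slides confirm.

```shell
#!/bin/sh
# Print (do not run) the ./gradlew command for a Build -> Deploy -> Test cycle.
# Assumption: each component maps to a "<component>-pkg-ind" task, followed by
# the repo-ind and docker-provisioner tasks shown on the slides.
bigtop_lifecycle_cmd() {
    if [ "$#" -eq 0 ]; then
        echo "usage: bigtop_lifecycle_cmd COMPONENT..." >&2
        return 1
    fi
    cmd="./gradlew"
    for component in "$@"; do
        cmd="$cmd ${component}-pkg-ind"
    done
    printf '%s repo-ind docker-provisioner\n' "$cmd"
}

# Reproduces the single-component command from the slide.
bigtop_lifecycle_cmd spark
```

Printing the command instead of executing it keeps the sketch safe to try outside a Bigtop checkout.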
16. Bigtop Integration Test Framework 2.0
- Build directly from branch or commit hash:
- $ ./gradlew allclean kafka-pkg-ind
- Advantages:
- Developers can quickly evaluate the result
- Code that breaks integration can be discovered earlier in development
17. Apache Bigtop: v1.4
Timeline: Upcoming Early April, 2019!
- Integration Test Framework 2.0
- one-stop seamlessly integrated entry at ./gradlew to build and test inside docker
- Smoke Test CI Matrix goes online
- https://ci.bigtop.apache.org/view/Test/job/Bigtop-trunk-smoke-tests
- Version bumps
- Hadoop 2.8.5, Spark 2.2.3, Kafka 2.1.1, Flume 1.9.0, Alluxio 1.8.1
- More built-in test coverage
- Hive, Flink, Giraph
- A lot of improvements and bug fixes!
- 100 JIRAs resolved
18. Jun He - Intro
Apache Bigtop PMC member, Committer
- Now you get it ...
Lead of Enterprise Workloads Team in Arm OSS Group
- Enable and optimize Data Science/Storage stacks on Arm64
- Contribute to build a diverse software ecosystem
19. Apache Bigtop on AArch64
[Timeline figure; dates shown: 2016/4, 2017/3]
- First try on
- Added to build env
- AArch64 CI
- v1.2.1 released with a lot of AArch64 related patches merged
- v1.3.0 released with AArch64 officially added to support
20. What we learned so far
- Dependency issues
- Native binaries: protobuf, phantomjs,
- Jars with native binaries embedded: leveldb-jni, ignite-shmem, jffi,
- Version mismatch: slf4j, log4j, log4j2,
- Cyclic references take a lot of effort to fix
- Tests are important
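One way to spot the "jars with native binaries embedded" problem early is to scan a jar's entry list for native libraries and check whether any of them looks like an AArch64 build. The sketch below is illustrative only: the function name scan_native_entries is hypothetical, and matching "aarch64" or "arm64" in the entry path is a heuristic assumed for this example, since jars lay out their native files in different ways.

```shell
#!/bin/sh
# Read jar entry paths on stdin (one per line), list embedded native
# libraries, and report whether any of them looks like an AArch64 build.
# Assumption: the architecture is recognisable from the entry path.
scan_native_entries() {
    natives=$(grep -E '\.(so|dll|dylib|jnilib)$' || true)
    if [ -z "$natives" ]; then
        echo "no native libraries found"
        return 0
    fi
    printf '%s\n' "$natives"
    if printf '%s\n' "$natives" | grep -Eiq 'aarch64|arm64'; then
        echo "AArch64 variant present"
    else
        echo "AArch64 variant missing"
        return 1
    fi
}
```

Assuming Info-ZIP unzip is installed, the entry list of a jar (some-library.jar is a placeholder name) could be fed in with: unzip -Z1 some-library.jar | scan_native_entries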
21. Where is Big Data heading?
There will be more and more big data tools and integrations on the cloud
- Lots of money goes into cloud vendors' pockets
K8S is taking over the whole industry, including big data
- HDFS on K8S, Spark on K8S, Flink on K8S, etc.
- One single platform for OLTP, OLAP, ML/AI
More focus on user experience (can do -> perform well -> easy to use)
- NewSQL
- More user-friendly APIs
22. Apache Bigtop: Future Roadmap
Focus on components that maximize the core value of big data
- Processing: Spark, Flink, Hive
- Storage: Hadoop, Kafka
- NoSQL: HBase, Cassandra
Cloud / K8S native support (operators) for build, deploy, and test
Embrace cloud (AWS/GCP/Azure) and introduce more integrations
24. Questions?
Dev Mailing lists
Wiki page
CI page
Jira link
Linaro Collaborate page
Contact details:
Evans Ye: evansye@apache.org
Jun He: jun.he@arm.com