We needed a bridge between the real-time tier, where we used Couchbase, and the batch tier, built on Hadoop. For lack of something better, we built our own: Couchdoop, an open-source Hadoop connector for Couchbase. Our presentation will discuss best practices for creating a Hadoop connector for a NoSQL database. We will talk about the challenges we encountered while developing Couchdoop and share how we tuned it for performance. Together with Bigstep, we ran performance benchmarks on our technology, which show how much throughput can be squeezed from a Hadoop connector.
Călin Andrei Burloiu - Connecting Hadoop with Couchbase: Engineering for performance
11. Two-tier Architecture
Real-time Tier (Couchbase)
• Detects user intent
• Gives next best recommendation or deal
Data Bridge (Couchdoop)
Batch Tier (Hadoop)
• Recommends products
User events
Recommendations
17. Exporting Data
Hadoop (Machine Learning) produces Recommendations; Couchdoop exports them as documents such as:
{
  "user": "Rudy",
  "recommendations": [
    ["Ibanez Acoustic Guitar", 450],
    ["Guitar Tuner", 120],
    ["Sound Mixer", 30]
  ]
}
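As a rough illustration of the document shape on this slide, the following Python sketch turns one user's batch output into a key and a JSON value ready to be written to a key-value store. It is not Couchdoop code: the function name `to_document` and the `recs::<user>` key scheme are assumptions made for the example, and the slide does not say what the number next to each product represents, so it is passed through unchanged.

```python
import json


def to_document(user, recommendations):
    """Build a (key, JSON string) pair for one user's recommendations.

    `recommendations` is a list of (product_name, number) pairs; the meaning
    of the number is not stated on the slide, so it is kept as-is.
    """
    key = f"recs::{user}"  # hypothetical key scheme, not taken from the slides
    doc = {
        "user": user,
        # Each entry becomes a two-element JSON array, as on the slide.
        "recommendations": [[name, number] for name, number in recommendations],
    }
    return key, json.dumps(doc, ensure_ascii=False)


key, value = to_document(
    "Rudy",
    [("Ibanez Acoustic Guitar", 450), ("Guitar Tuner", 120), ("Sound Mixer", 30)],
)
print(key)
print(value)
```

In a real job, a step like this would run once per output record of the batch computation, and the resulting pair would be handed to whatever component performs the write.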
18. Updating Data
Hadoop (Machine Learning) produces Recommendations; Couchdoop updates documents such as:
{
  "user": "Rudy",
  "recommendations": [
    ["Ibanez Acoustic Guitar", 450],
    ["Guitar Tuner", 120],
    ["Sound Mixer", 30]
  ]
}