Search and index structure concept for the next major version of Fastcatsearch.
## Push documents and search right away
* No full indexing
* Add documents at any time, even while searching
* No dedicated indexing node; the master node takes that role
* The master node indexes each document first, then passes it on to the other nodes
* Every node indexes its documents independently
* The master node checks the indexing integrity of the other nodes in the cluster
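
The push flow in the list above can be sketched roughly as follows. This is a minimal in-process illustration, not Fastcatsearch code: the `Node` and `MasterNode` classes, the `push` and `inconsistent_nodes` methods, and the checksum-based integrity check are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one cluster node."""
    name: str
    index: dict = field(default_factory=dict)    # term -> list of doc ids
    doc_ids: list = field(default_factory=list)  # every doc id this node indexed

    def index_document(self, doc_id, text):
        # Each node builds its own postings independently.
        for term in set(text.lower().split()):
            self.index.setdefault(term, []).append(doc_id)
        self.doc_ids.append(doc_id)

    def search(self, term):
        # The document is searchable as soon as index_document returns.
        return list(self.index.get(term.lower(), []))

    def checksum(self):
        # Assumption: integrity is compared via a digest of indexed doc ids.
        joined = ",".join(str(d) for d in sorted(self.doc_ids))
        return hashlib.sha256(joined.encode()).hexdigest()


class MasterNode(Node):
    """Hypothetical master: indexes first, then forwards to the other nodes."""

    def __init__(self, name, others):
        super().__init__(name)
        self.others = others

    def push(self, doc_id, text):
        self.index_document(doc_id, text)       # master indexes first
        for node in self.others:
            node.index_document(doc_id, text)   # then passes the doc on

    def inconsistent_nodes(self):
        # Names of nodes whose indexed doc set differs from the master's.
        expected = self.checksum()
        return [n.name for n in self.others if n.checksum() != expected]
```

A real cluster would forward documents over the network and handle failures; the sketch only shows the ordering (master first, then the other nodes) and one possible way to compare indexing state.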
Realtime search engine concept
1. Real-time search engine
Index/search concept
http://fastcatsearch.org
Sang Song
swsong@websqrd.com
3. Indexing Overview
Diagram flow: Document -> Memory hash posting -> Minor flush posting -> Major flush posting -> Segment #1 ... Segment #N
Re-do log file: A re-do log is written while documents are put into the memory posting. Once the memory posting has been flushed safely, the re-do log file is removed.
Sync flush: The memory posting continuously syncs its data to the sync posting at idle time.
Minor flush: When the memory posting exceeds its size limit, it is flushed to a minor flush posting (a temp posting).
Major flush: 5 minor flush postings are appended to a major flush posting (also a temp posting).
Segment: 5 major flush postings go to 1 segment.
* Each minor flush posting file size is about 200MB, and major flush posting file size is about 1GB. Each segment size is about 1GB.
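
The flush cascade on this slide can be sketched as below. It is only an illustration under stated assumptions: `TieredIndexer` and `merge_postings` are hypothetical names, the memory limit is counted in documents rather than file size, the re-do log is an in-memory list standing in for the log file, and the sync posting is left out.

```python
from collections import defaultdict

MINORS_PER_MAJOR = 5    # from the slide: 5 minor flush postings -> major flush posting
MAJORS_PER_SEGMENT = 5  # from the slide: 5 major flush postings -> 1 segment


def merge_postings(postings):
    """Merge several term -> doc-id-list maps into one map."""
    merged = defaultdict(list)
    for posting in postings:
        for term, doc_ids in posting.items():
            merged[term].extend(doc_ids)
    return {term: sorted(ids) for term, ids in merged.items()}


class TieredIndexer:
    """Hypothetical sketch of memory -> minor -> major -> segment flushing."""

    def __init__(self, memory_limit_docs=2):
        # Assumption: the limit is a document count, not a byte size.
        self.memory_limit_docs = memory_limit_docs
        self.memory = defaultdict(list)  # memory hash posting
        self.memory_docs = 0
        self.redo_log = []               # stand-in for the re-do log file
        self.minors = []                 # minor flush postings
        self.majors = []                 # major flush postings
        self.segments = []

    def add(self, doc_id, text):
        self.redo_log.append((doc_id, text))  # log while adding to memory
        for term in set(text.lower().split()):
            self.memory[term].append(doc_id)
        self.memory_docs += 1
        if self.memory_docs >= self.memory_limit_docs:
            self._minor_flush()

    def _minor_flush(self):
        self.minors.append(dict(self.memory))
        self.memory.clear()
        self.memory_docs = 0
        self.redo_log.clear()  # flushed safely, so the log is no longer needed
        if len(self.minors) == MINORS_PER_MAJOR:
            self.majors.append(merge_postings(self.minors))
            self.minors = []
        if len(self.majors) == MAJORS_PER_SEGMENT:
            self.segments.append(merge_postings(self.majors))
            self.majors = []
```

With `memory_limit_docs=1`, every added document triggers a minor flush, so the 25th document produces the first segment and leaves the minor and major lists empty.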
4. Searching Overview
A search runs against every posting tier:
Search memory posting
Search minor temp posting
Search major temp posting
Search segment posting
Diagram flow: per-tier results (Memory, Minor, Major, Segment) -> Aggregate -> Ranking -> Result -> Document -> Search Result
Other diagram label: Real-time
* The sync posting is not used when searching, because the memory posting holds the same data as the sync posting.
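
The search path on this slide can be sketched as follows: query every tier, aggregate the hits, rank them, then fetch the documents. The function names `search_tiers` and `fetch_documents` are hypothetical, and the ranking rule (number of query-term matches, ties broken by doc id) is an assumption, since the slide does not say how ranking works.

```python
from collections import Counter


def search_tiers(query, memory, minors, majors, segments):
    """Search all posting tiers and return doc ids ordered by score."""
    terms = query.lower().split()
    scores = Counter()
    # The sync posting is skipped; the memory posting holds the same data.
    for posting in [memory, *minors, *majors, *segments]:
        for term in terms:
            for doc_id in posting.get(term, []):
                scores[doc_id] += 1  # aggregate hits across tiers
    # Assumed ranking: more matched terms first, then lower doc id.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked]


def fetch_documents(doc_ids, store):
    """Look up the stored documents for the ranked doc ids."""
    return [store[doc_id] for doc_id in doc_ids]
```

Each tier is represented here as a plain `term -> [doc ids]` dictionary, so a document indexed a moment ago in the memory posting is returned alongside documents that already live in segments.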