Real-time search engine 
Index/search concept 
http://fastcatsearch.org 
Sang Song 
swsong@websqrd.com
Concept
Push documents and search them right away.
- No full indexing
- Documents can be added at any time, even while searching
- No dedicated indexing node; the master node does the indexing
- The master node indexes each document first and then sends it to the other nodes
- Every node indexes its documents independently
- The master node checks the indexing integrity of the other nodes in the cluster
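A minimal Python sketch of the cluster flow described above. It assumes a toy whitespace tokenizer and an in-process list of nodes in place of real network transport. `Node`, `MasterNode`, `add_document`, and `check_integrity` are hypothetical names, and comparing a SHA-256 fingerprint of each node's index is one assumed way to check integrity, not necessarily how fastcatsearch does it.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical sketch: names and the fingerprint-based integrity check are assumptions.

@dataclass
class Node:
    """A cluster node that indexes the documents it receives on its own."""
    name: str
    index: dict = field(default_factory=dict)  # doc_id -> set of terms

    def index_document(self, doc_id: str, text: str) -> None:
        # Each node tokenizes and indexes independently (naive whitespace tokenizer).
        self.index[doc_id] = set(text.lower().split())

    def fingerprint(self) -> str:
        # Order-independent digest of everything this node has indexed.
        h = hashlib.sha256()
        for doc_id in sorted(self.index):
            h.update(doc_id.encode())
            h.update(" ".join(sorted(self.index[doc_id])).encode())
        return h.hexdigest()


class MasterNode(Node):
    def __init__(self, name: str, replicas: list):
        super().__init__(name)
        self.replicas = replicas

    def add_document(self, doc_id: str, text: str) -> None:
        # 1. The master indexes the document first ...
        self.index_document(doc_id, text)
        # 2. ... then tosses it to the other nodes, which index it themselves.
        for node in self.replicas:
            node.index_document(doc_id, text)

    def check_integrity(self) -> dict:
        # Compare every node's fingerprint with the master's own.
        expected = self.fingerprint()
        return {node.name: node.fingerprint() == expected for node in self.replicas}


if __name__ == "__main__":
    a, b = Node("node-a"), Node("node-b")
    master = MasterNode("master", [a, b])
    master.add_document("1", "Real-time search engine")
    print(master.check_integrity())  # {'node-a': True, 'node-b': True}
```

Because every node tokenizes the raw document itself, the master only ships documents, not index data; the integrity check then catches a node whose index has drifted.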
Indexing: Overview
Documents are put into the memory hash posting.
A re-do log is written while documents are put into the memory posting. Once the memory posting has been flushed safely, the re-do log file is removed.
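The re-do log rule above can be sketched as follows. This is a minimal illustration, assuming a JSON-lines log file, a plain dict of term to doc ids as the memory hash posting, and a JSON file as the flushed posting; the class and method names are hypothetical. The log entry is written and fsynced before the document enters memory, so unflushed documents can be replayed after a crash, and the log is deleted only after the flushed file is safely in place.

```python
import json
import os
import tempfile
from collections import defaultdict


class MemoryPosting:
    """Hypothetical in-memory hash posting (term -> doc ids) guarded by a re-do log."""

    def __init__(self, redo_path: str):
        self.redo_path = redo_path
        self.postings = defaultdict(list)
        self._replay()  # recover documents that were logged but never flushed

    def _apply(self, doc_id: str, text: str) -> None:
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)

    def _replay(self) -> None:
        if os.path.exists(self.redo_path):
            with open(self.redo_path, encoding="utf-8") as f:
                for line in f:
                    rec = json.loads(line)
                    self._apply(rec["id"], rec["text"])

    def put(self, doc_id: str, text: str) -> None:
        # Write the re-do log entry durably, then put the document in memory.
        with open(self.redo_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"id": doc_id, "text": text}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(doc_id, text)

    def flush(self, out_path: str) -> None:
        # Write to a temp file and atomically rename it into place ...
        tmp = out_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.postings, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, out_path)
        # ... and only after the flush is safe, remove the re-do log.
        if os.path.exists(self.redo_path):
            os.remove(self.redo_path)
        self.postings.clear()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    mp = MemoryPosting(os.path.join(workdir, "redo.log"))
    mp.put("1", "fast search")
    mp.flush(os.path.join(workdir, "minor-0.json"))
```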
Sync flush: the memory posting continuously syncs its data to the sync posting at idle time.
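One way to read "sync at idle time" is sketched below, assuming the engine tracks which terms changed since the last sync and copies only those once no write has arrived for a short idle period. `SyncedMemoryPosting`, `sync_if_idle`, and the `idle_seconds` threshold are hypothetical, and a plain dict stands in for the sync posting; in practice a background thread would call `sync_if_idle()` periodically (not shown).

```python
import threading
import time


class SyncedMemoryPosting:
    """Hypothetical memory posting that copies changed terms to a sync posting when idle."""

    def __init__(self, idle_seconds: float = 0.5):
        self.memory = {}        # term -> list of doc ids
        self.sync_posting = {}  # stand-in for the sync posting
        self.dirty = set()      # terms changed since the last sync
        self.idle_seconds = idle_seconds
        self.last_write = time.monotonic()
        self.lock = threading.Lock()

    def add(self, term: str, doc_id) -> None:
        with self.lock:
            self.memory.setdefault(term, []).append(doc_id)
            self.dirty.add(term)
            self.last_write = time.monotonic()

    def sync_if_idle(self) -> int:
        """Copy dirty terms if no write happened recently; return how many were synced."""
        with self.lock:
            if time.monotonic() - self.last_write < self.idle_seconds:
                return 0  # still busy indexing; try again later
            for term in self.dirty:
                self.sync_posting[term] = list(self.memory[term])
            synced = len(self.dirty)
            self.dirty.clear()
            return synced
```

After a sync the two postings hold the same data, which matches the later note that the memory posting mirrors the sync posting.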
Minor flush: when the memory posting exceeds its size limit, it is flushed to a minor flush posting (a temp posting).
Major flush: every 5 minor flush postings are appended to a major flush posting (also a temp posting).
Segment: every 5 major flush postings go into 1 segment (Segment #1, Segment #2, Segment #3, ..., Segment #N).
* Each minor flush posting file is about 200MB, each major flush posting file is about 1GB, and each segment is about 1GB.
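The flush cascade can be simulated with a small counter-driven class. This sketch assumes the memory limit is counted in postings rather than bytes (the slides give sizes of about 200MB and 1GB), and it concatenates postings where a real engine would merge sorted posting lists per term; `FlushCascade` and `memory_limit` are hypothetical names. The factors of 5 come from the slide.

```python
from dataclasses import dataclass, field

MINORS_PER_MAJOR = 5    # 5 minor flush postings are appended to a major flush posting
MAJORS_PER_SEGMENT = 5  # 5 major flush postings go into 1 segment


@dataclass
class FlushCascade:
    memory_limit: int  # hypothetical limit, counted in postings instead of bytes
    memory: list = field(default_factory=list)
    minors: list = field(default_factory=list)
    majors: list = field(default_factory=list)
    segments: list = field(default_factory=list)

    def add(self, posting) -> None:
        self.memory.append(posting)
        if len(self.memory) >= self.memory_limit:
            self._minor_flush()

    def _minor_flush(self) -> None:
        # Memory posting exceeded its limit: move it to a new minor flush posting.
        self.minors.append(self.memory)
        self.memory = []
        if len(self.minors) == MINORS_PER_MAJOR:
            self._major_flush()

    def _major_flush(self) -> None:
        # Append the minor postings into one major posting (simple concatenation here).
        self.majors.append([p for minor in self.minors for p in minor])
        self.minors = []
        if len(self.majors) == MAJORS_PER_SEGMENT:
            self._make_segment()

    def _make_segment(self) -> None:
        # Combine the major postings into a new segment.
        self.segments.append([p for major in self.majors for p in major])
        self.majors = []
```

With `memory_limit=2`, the first 50 postings produce 25 minor flushes, 5 major flushes, and exactly one segment.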
Searching: Overview
A search runs over every layer of the index, from the newest data to the oldest:
- Search the memory posting (real-time data)
- Search the minor temp postings
- Search the major temp postings
- Search the segment postings
The results from the memory, minor, major, and segment postings are aggregated and ranked, and the matching documents are returned as the search result.
* The sync posting is not used for searching, because the memory posting holds the same data as the sync posting.
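The search-and-aggregate step can be sketched as below. It assumes each layer is a dict of term to doc ids, uses a toy score of one point per matching query term instead of real ranking, and skips the sync posting as the slide states. `search_layer` and `search` are hypothetical names.

```python
import heapq


def search_layer(layer: dict, terms: list) -> dict:
    """Toy scoring: +1 for every query term a document matches in this layer."""
    scores = {}
    for term in terms:
        for doc_id in layer.get(term, ()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return scores


def search(layers: dict, query: str, k: int = 10) -> list:
    terms = query.lower().split()
    total = {}
    # Search memory, minor, major and segment postings; the sync posting is never read.
    for name in ("memory", "minor", "major", "segment"):
        for doc_id, score in search_layer(layers.get(name, {}), terms).items():
            total[doc_id] = total.get(doc_id, 0) + score
    # Aggregate and rank: highest score first, ties broken by doc id.
    return heapq.nsmallest(k, total.items(), key=lambda kv: (-kv[1], kv[0]))
```

Using `heapq.nsmallest` keeps only the top `k` hits instead of sorting every match, which matters once the segment layer is large.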
Searching: Segment
The memory hash posting and the temp postings together form the dynamic segment. Finished segments (Segment #1, Segment #2, ..., Segment #N-1, Segment #N) are divided into stable segments and a temporary segment.
[Figure: over time, a searcher runs a first search and a next search against stable segments #1 to #3, the dynamic segment, and a temporary segment.]
- First search: search the whole index.
- Next search: search the updated index only.
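One possible reading of "search the whole index first, then only the updated index" is sketched here, assuming stable segments never change so their hits for a term can be cached after the first search, while the dynamic segment is re-read on every search. `Searcher`, its cache, and the `segments_read` counter are hypothetical; this is an illustration of the idea, not fastcatsearch's actual searcher. A real implementation would also have to drop cached hits whenever a new segment becomes stable.

```python
class Searcher:
    """Hypothetical searcher that caches stable-segment hits between searches."""

    def __init__(self, stable_segments: list, dynamic_segment: dict):
        self.stable = stable_segments    # list of dicts term -> doc ids; assumed immutable
        self.dynamic = dynamic_segment   # dict term -> doc ids; keeps changing
        self.cache = {}                  # term -> hits from stable segments
        self.segments_read = 0           # counts segment lookups, for illustration

    def _lookup(self, segment: dict, term: str) -> list:
        self.segments_read += 1
        return list(segment.get(term, []))

    def search(self, term: str) -> list:
        if term not in self.cache:
            # First search: search the whole index and remember the stable-segment hits.
            hits = []
            for seg in self.stable:
                hits.extend(self._lookup(seg, term))
            self.cache[term] = hits
        # Every search re-reads only the dynamic segment, i.e. the updated index.
        return self.cache[term] + self._lookup(self.dynamic, term)
```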
