Performance Tuning In Ranker.com
Agenda


 Introduction to Ranker
 Performance Challenges In Ranker
 Performance Tuning Strategies
 Conclusion
Introduction To Ranker


Ranker is a social site and platform that is, in essence, an operating system for lists.
Ranker makes it easy, fun, and social for users to rank things (anything) via a
Netflix-style drag-and-drop interface and a huge backend database. Everything in the
system is an object, so that we can aggregate individual lists and answer the "wisdom
of crowds" question: what is the best ___?

Ranker is fully distributable. For example, a travel blog can embed Ranker on its
site so that its users can easily rank their own favorite golf destinations. This gives
the blog a sticky interactive tool, as well as valuable content showcasing a continually
updated ranking of its community's consensus picks for golf spots.

The Ranker platform is flexible enough to be used for publishing, social networking,
shopping, polling, even organization.
Performance Challenges In Ranker



The Ranker application has to deal with two main performance issues:

1. Data Volume - One of the biggest challenges in building Ranker, compared to
regular web applications, is the volume of data that has to be managed. Ranker
deals with close to 10 million topics, most of which have been obtained from Freebase.
Freebase uses a custom RDF store to persist and retrieve its data. Ranker, however,
needs to achieve the same performance levels using a relational database.

2. Traffic - Ranker deals with a lot of social and entertaining content. This results in
traffic spikes, where it is not uncommon to get a huge number of visitors in a very short
span of time. We have often seen around 40,000 visits to a single page within
one hour, before traffic subsides to a more reasonable volume.
Performance Tuning Strategies


The performance challenges explained in the previous slide are
handled using the following methods:

 Caching
 Database De-normalization
 Hardware
 Delayed Calculation/Aggregation
 Event Based Post-Processing
 Search Indexing
Caching



Ranker implements caching using the open source Ehcache framework. Caching is
applied at various depths within the application. Some parts of the
application only cache backend objects that are time-consuming to load, while
other parts cache the entire request by storing the generated HTML in
the cache.

Caching in Ranker is also well integrated with the custom CMS that is used to configure
various pages in the application. The CMS allows us to specify different cache
expiration times for each block of each configured page in Ranker.
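
The per-block expiration idea can be sketched as follows. This is only an illustration in Python, not Ranker's Ehcache configuration: the `TTLCache` class, the `BLOCK_TTL_SECONDS` table and the block names are assumptions made for the example.

```python
import time


class TTLCache:
    """Tiny in-memory cache with a per-entry time-to-live (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, ttl_seconds, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive load
        value = loader()
        self._entries[key] = (now + ttl_seconds, value)
        return value


# Hypothetical per-block expiry settings, standing in for values a CMS might store.
BLOCK_TTL_SECONDS = {"top_lists": 300, "recent_comments": 30}


def render_block(cache, block_name, render):
    """Return cached HTML for a page block, re-rendering only after its TTL expires."""
    return cache.get_or_load(block_name, BLOCK_TTL_SECONDS[block_name], render)
```

A slow-changing block can be given a long expiry and a fast-changing one a short expiry, so only the blocks that need it are re-rendered often.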
Database De-normalization



Since the Ranker traffic patterns indicate that a huge percentage of activity on the site
is for "reads" and a much smaller percentage for "writes", de-normalizing the database
provides huge performance benefits for the application.

Database de-normalization often involves duplicating data across tables in order to
avoid expensive joins in the SQL queries. It therefore adds a lot of overhead when
editing or deleting the de-normalized entities: a single user action might require the
application to update multiple locations. It is also very prone to
causing bugs when programmers are not aware of all the places where the
data is duplicated. Hence de-normalization has been used very cautiously, and only
when none of the other approaches is applicable.
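
The trade-off can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `users` and `lists` tables and the `author_name` column are hypothetical and are not Ranker's schema; the point is that the read needs no join, while the write has to touch every copy of the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- author_name duplicates users.name so that list pages need no join.
    CREATE TABLE lists (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES users(id),
        author_name TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO lists VALUES (10, 'Best golf destinations', 1, 'alice');
""")


def list_page_row(conn, list_id):
    """Read path: a single-table lookup, no join against users."""
    return conn.execute(
        "SELECT title, author_name FROM lists WHERE id = ?", (list_id,)
    ).fetchone()


def rename_user(conn, user_id, new_name):
    """Write path: update every copy of the name in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        conn.execute(
            "UPDATE lists SET author_name = ? WHERE author_id = ?",
            (new_name, user_id),
        )
```

Forgetting the second `UPDATE` in `rename_user` is exactly the kind of bug described above: the page would keep showing the old name.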
Hardware


Using better hardware is often a much simpler and cheaper option than investing a lot
of time in improving the performance of some parts of the application. We have made
sure that we have the most suitable hardware for the systems that are being built,
based on the amount of memory and processing power needed.

Ranker also uses hardware load balancers to distribute the load across multiple web
servers. This makes a huge difference when there is a spike in traffic, as mentioned in
the Performance Challenges In Ranker section.

Here is one situation where coding for better hardware made a huge difference to
performance: one of the background processes in Ranker needed to make around 9
million queries to complete its job. Later we realized that by loading all the
data into memory in one shot, we could reduce the number of queries to a few
thousand. However, this would require us to store around 3GB of data in memory. Hence it
made more sense to get systems with bigger memory capacity. This change made
the process about 20 times faster.
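
The difference between the two access patterns can be sketched at toy scale. The `items` table and both functions below are assumptions made for illustration, using Python's built-in `sqlite3` module; they do not reproduce the actual background process.

```python
import sqlite3


def build_db(n_items):
    """Create a small in-memory table to run both access patterns against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, votes INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)", ((i, i % 7) for i in range(n_items))
    )
    return conn


def total_votes_per_item_queries(conn, item_ids):
    """One query per item: the query count grows with the number of items."""
    queries = 0
    total = 0
    for item_id in item_ids:
        row = conn.execute(
            "SELECT votes FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        queries += 1
        total += row[0]
    return total, queries


def total_votes_bulk(conn, item_ids):
    """Load the table into a dict once, then answer every lookup from memory."""
    votes_by_id = dict(conn.execute("SELECT id, votes FROM items"))
    queries = 1
    total = sum(votes_by_id[item_id] for item_id in item_ids)
    return total, queries
```

Both functions return the same total; the bulk version trades memory (the whole table held in a dict) for far fewer round trips to the database.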
Delayed Calculation/Aggregation

Ranker uses a large number of small batch programs that perform
calculations using complex algorithms at regular intervals. This allows
us to pre-calculate scores for lists and items and hence avoid performing
the calculations every time data is retrieved or stored. The tricky part in
using this technique is choosing the right amount of pre-calculation.
Too much pre-calculation results in a large number of results to
store, while too little of it results in a lot of calculation while
loading the data.

For example, this technique is used to calculate the most interesting lists
in each domain in Ranker. The algorithm to identify the interesting lists
uses a lot of factors, like number of views, number of votes, etc., and is
executed once a day. In this case, instead of determining the most
interesting lists in each domain, we only determine a universal score for
each list. This score can be used to get the most interesting lists in any
domain.
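
A minimal sketch of the universal-score approach, in Python. The weights and the scoring formula are hypothetical (the real algorithm uses more factors and is not described in these slides); what the sketch shows is that the batch step stores one number per list, and any per-domain ranking is derived from it at read time.

```python
import heapq

# Hypothetical weights; the real formula is not given in the slides.
VIEW_WEIGHT = 1.0
VOTE_WEIGHT = 5.0


def universal_score(views, votes):
    """Combine the factors into a single domain-independent score."""
    return VIEW_WEIGHT * views + VOTE_WEIGHT * votes


def run_daily_batch(lists):
    """Batch step: compute one score per list and store it by list id."""
    return {lst["id"]: universal_score(lst["views"], lst["votes"]) for lst in lists}


def top_lists_in_domain(lists, scores, domain, n):
    """Read step: rank by the stored score; nothing is pre-computed per domain."""
    in_domain = [lst["id"] for lst in lists if lst["domain"] == domain]
    return heapq.nlargest(n, in_domain, key=scores.__getitem__)
```

Storing one score per list keeps the stored result small, while still avoiding the expensive scoring work when a page is loaded.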
Event Based Post-Processing


This is another form of Delayed Calculation. Some user actions in
Ranker require the system to perform complex or time-consuming
operations. For cases like these, Ranker uses an event-based
post-processing framework to perform these operations asynchronously.
This allows us to give the user a quick response and still perform
the time-consuming operations within a few seconds of the action. The only
disadvantage of this approach is the difficulty it causes in reporting
errors and failures to the user.

For example, when someone comments on a list, we need to notify the list
author through an email. Even though the list author should be notified
promptly, a delay of a few seconds is acceptable. Performing
this asynchronously allows us to give a quick response to the user
without making the user wait for the email to be sent.
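
The comment-notification example can be sketched with a queue and a background worker from Python's standard library. This is an assumption-laden illustration, not Ranker's framework: `post_comment`, `send_email` and the `sent_emails` list (standing in for a real mail gateway) are hypothetical names.

```python
import queue
import threading

events = queue.Queue()
sent_emails = []  # stands in for a real mail gateway
failures = []     # errors surface here, not in the user's response


def send_email(to, body):
    sent_emails.append((to, body))


def worker():
    """Process events in the background until a None sentinel arrives."""
    while True:
        event = events.get()
        try:
            if event is None:
                return
            send_email(event["author_email"], "New comment: " + event["text"])
        except Exception as exc:
            failures.append((event, exc))
        finally:
            events.task_done()


def post_comment(author_email, text):
    """Request path: enqueue the notification and return immediately."""
    events.put({"author_email": author_email, "text": text})
    return "comment saved"


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
```

The `failures` list makes the disadvantage visible: by the time sending fails, `post_comment` has already returned, so the error has to be logged or reported through some other channel.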
Search Indexing



Ranker uses popular indexing tools like Lucene and Solr to index all
searchable data. Using a search index provides huge performance benefits
while performing text based search in the application.

Different strategies are used to add data to the index. Entities that are
frequently created or changed in the system are added to the index through an
automated SQL query, which runs every 5 minutes. Other entities, like the
data obtained from Freebase, are updated through a program that is triggered
manually.
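
One common way to implement such a periodic query is to select only the rows changed since the last run. The sketch below assumes a hypothetical `lists` table with an `updated_at` column, and uses a plain dict in place of the search index; the slides do not say how the actual query is written.

```python
import sqlite3


def fetch_changed(conn, since):
    """Return rows modified after the given watermark, oldest first."""
    return conn.execute(
        "SELECT id, title, updated_at FROM lists "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()


def incremental_index(conn, index, since):
    """Copy changed rows into the index and return the new watermark."""
    for list_id, title, updated_at in fetch_changed(conn, since):
        index[list_id] = title  # a real system would send this to the search server
        since = max(since, updated_at)
    return since
```

A scheduler would call `incremental_index` every few minutes, passing in the watermark returned by the previous run.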

Solr allows us to search across a number of fields and also do so using
different weights for each type of field, without compromising on the speed of
the search.
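
The idea of per-field weights can be shown with a toy scorer. This is not how Solr computes relevance, and the weights are made up; in Solr the equivalent is usually expressed as field boosts, for example through the DisMax `qf` parameter.

```python
# Hypothetical weights: a match in the title counts three times as much.
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}


def score(doc, term):
    """Sum the weighted number of occurrences of the term in each field."""
    term = term.lower()
    return sum(
        weight * doc.get(field, "").lower().split().count(term)
        for field, weight in FIELD_WEIGHTS.items()
    )


def search(docs, term):
    """Return ids of matching documents, best score first (ties by id)."""
    scored = [(score(doc, term), doc["id"]) for doc in docs]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ranked if s > 0]
```

With these weights, a list with "golf" in its title ranks above one that only mentions it twice in the description.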
Conclusion



Making any change to the application to improve its speed and performance
always involves certain trade-offs. In Ranker, we have made
sure that we only make changes once they are analyzed well and we are ready
to handle all the side effects of the change. Changes often involve additional
effort in maintenance and environment setup. Some of them even require us to
acquire and maintain new servers, as in the case of search indexing and
background processes.

By choosing a variety of techniques to handle the different performance
problems in the application, Ranker has been able to deliver and scale to the
traffic as it becomes more popular.