1. Deconstructing Popularity Bias in Recommender Systems: Origins, Impacts, and Mitigation
Amit Jaspal
Trust & Responsibility in Recommendation Systems, WSDM 2025
2. Amit's Introduction
Thank you for the opportunity to speak!
Engineering Manager and Research Scientist at Meta, leading the e-commerce recommendations team
Building recommender and information systems in industry for the last 14 years
E-commerce recommendations at Meta
Video recommendations at Meta
Ads recommendations at Meta
Newsfeed recommendations at LinkedIn
Apache Solr at Cloudera
Hurricane search engine at D. E. Shaw
Research fellow at NCSA and TDIL Labs
3. What is Popularity Bias?
Popularity bias refers to the tendency of a recommender system to over-recommend popular items at the
expense of less popular ones. In other words, already-popular items get disproportionate exposure, while long-tail
items are under-represented.
Not a problem unique to recommender systems, but the dynamic, feedback-driven nature of recommender systems makes it worse.
Examples of Popularity Bias in other domains
Academic Research/Citations
Financial Markets/Stock Trading
Book Publishing/Best Seller Lists
Hiring and Job Portals
4. Sources of Popularity Bias in Recommender Systems
Inherent Audience Size Imbalance (Data Bias):
Some items are naturally more appealing to a broader audience.
Item popularity often follows a long-tail distribution inherently.
Even bias-free algorithms will see more interactions with these items.
Model Bias (Algorithmic Bias):
Machine learning models learn patterns from training data, including existing popularity biases.
Collaborative filtering and similar methods tend to amplify popularity signals.
Models may over-generalize from popular item interactions, leading to biased predictions.
Closed Feedback Loop (Systemic Bias):
Dynamic recommendation systems operate in a closed loop.
Recommendations influence user interactions, which become training data for future models.
This creates a feedback loop that can accumulate and exacerbate popularity bias over time.
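A toy simulation of this loop, as a sketch only (the item counts, k, and click rates below are arbitrary illustrative choices): items that reach the top-k collect new interactions, which keeps them in the top-k.
```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(10.0, size=100).astype(float) + 1  # initial interactions, roughly uniform
for step in range(20):
    top_k = np.argsort(counts)[-10:]    # the model "learns" to recommend the current head
    clicks = rng.poisson(5.0, size=10)  # only recommended items collect new interactions
    counts[top_k] += clicks             # feedback: clicks become the next round's training data
head_share = counts[np.argsort(counts)[-10:]].sum() / counts.sum()
print(f"top-10 items' share of interactions: {head_share:.0%}")  # grows far beyond the initial ~10%
```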
6. Why Does Popularity Bias Matter?
For Users:
Reduced novelty and serendipity - Recommendations become predictable and less engaging.
Limited personalization - Users may not discover items truly aligned with individual preferences, especially niche interests.
Decreased user satisfaction and trust in the system over time.
For Item Providers (Especially Long-Tail):
Reduced visibility and sales opportunities for less popular items.
Unfair competition - Popular items dominate regardless of quality or relevance to specific users, e.g., clickbait.
The item-side cold-start problem is aggravated.
System-Level:
Reinforcement loops - Bias can worsen over time due to feedback cycles.
The system behaves suboptimally, catering only to popular items on one side and, on the other, to users who are content engaging only with popular items.
7. Measuring Popularity Bias
Gini Coefficient
A statistical measure of inequality within a distribution, computed from the Lorenz Curve.
The Lorenz Curve is a graphical representation of inequality, showing the cumulative distribution of a resource (e.g., wealth, recommendation exposure) across a population.
The Gini Index is the ratio of the area between the Line of Equality and the Lorenz Curve to the total area under the Line of Equality (a minimal computation is sketched below).
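A minimal sketch of this computation, assuming only a vector of per-item exposure counts is available (the function name and the example numbers are illustrative):
```python
import numpy as np

def gini_coefficient(exposures):
    """Gini coefficient of item exposures: 0 = perfectly equal, near 1 = maximally concentrated."""
    x = np.sort(np.asarray(exposures, dtype=float))  # ascending order traces out the Lorenz Curve
    n = len(x)
    cum = np.cumsum(x)                               # unnormalized cumulative exposure
    # Closed form for the area between the Line of Equality and the Lorenz Curve,
    # normalized by the total area under the Line of Equality.
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

print(gini_coefficient([1000, 20, 10, 5, 1]))  # ~0.78: exposure concentrated on one item
print(gini_coefficient([10, 10, 10, 10, 10]))  # 0.0: perfectly equal exposure
```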
Recall breakdown by item popularity bucket, e.g., recall@k for head items vs. recall@k for tail items (sketched below).
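A minimal sketch of the bucketed metric, assuming hypothetical inputs: recs maps each user to a ranked list of recommended items, truth maps each user to held-out relevant items, and pop maps items to interaction counts; the 20% head cutoff is an arbitrary illustrative choice:
```python
def recall_at_k_by_bucket(recs, truth, pop, k=20, head_frac=0.2):
    """Recall@k computed separately for head (most popular) and tail items."""
    ranked = sorted(pop, key=pop.get, reverse=True)  # items, most popular first
    head = set(ranked[: max(1, int(len(ranked) * head_frac))])
    hits = {"head": 0, "tail": 0}
    totals = {"head": 0, "tail": 0}
    for user, relevant in truth.items():
        top_k = set(recs.get(user, [])[:k])
        for item in relevant:
            bucket = "head" if item in head else "tail"
            totals[bucket] += 1
            hits[bucket] += int(item in top_k)
    # A large head/tail gap in recall@k is a symptom of popularity bias.
    return {b: hits[b] / totals[b] if totals[b] else 0.0 for b in ("head", "tail")}
```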
8. Mitigation Strategies
Key Mitigation Goals:
Promote long-tail item visibility.
Improve fairness and diversity.
Maintain or improve recommendation accuracy (or minimize accuracy loss).
Categorization by Processing Stage:
Pre-processing: Modify training data before model training.
In-processing (Modeling): Integrate debiasing directly into the model training process.
Post-processing: Adjust recommendation lists after model prediction.
9. Mitigation Strategies - Pre & Post-processing
Pre-processing
Data Sampling: Down-sample popular-item interactions or up-sample long-tail interactions (sketched after this list).
Item Exclusion: Remove highly popular items from the training data or candidate pool (use with caution).
Balanced Dataset Creation: Aim for a more uniform distribution of item interactions in training data.
Data Augmentation: Enrich data with side information to provide more context beyond popularity.
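As a concrete illustration of the Data Sampling bullet above, a minimal down-sampling sketch, assuming interactions are (user, item) pairs; the per-item cap is a hypothetical tuning knob:
```python
import random

def downsample_popular(interactions, cap=1000, seed=42):
    """Keep at most `cap` interactions per item: trims the head, leaves the tail untouched."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)  # shuffle so the kept subset is a random sample
    kept, per_item = [], {}
    for user, item in shuffled:
        if per_item.get(item, 0) < cap:
            kept.append((user, item))
            per_item[item] = per_item.get(item, 0) + 1
    return kept
```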
Post-processing
Re-scaling (Score Adjustment): Adjust predicted scores based on item popularity (sketched after this list).
Re-ranking: Re-order the initial ranked list to promote less popular items.
Rank Aggregation / Slotting: Combine rankings from biased and debiased models.
Post-filtering: Remove top-k popular items from the final recommendation list.
False Positive Correction (FPC): Probabilistically correct scores based on past unclicked recommendations.
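A minimal sketch of the Re-scaling bullet above; the log-popularity penalty and the trade-off parameter alpha are illustrative assumptions, not a prescribed formula:
```python
import numpy as np

def rescale_scores(scores, item_counts, alpha=0.1):
    """Subtract a popularity penalty from predicted scores; larger alpha boosts the tail more."""
    penalty = alpha * np.log1p(np.asarray(item_counts, dtype=float))
    return np.asarray(scores, dtype=float) - penalty

# Re-ranking then amounts to re-sorting candidates by the adjusted scores;
# alpha is tuned to balance accuracy loss against added tail exposure.
```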
10. Mitigation Strategies - In-processing (Model-Level)
Causal Inference Methods:
Counterfactual reasoning - Estimate what recommendations would look like without the influence of popularity.
Model ranking as a cause-and-effect relationship to disentangle popularity from genuine user preference.
Reducing Memorization:
Remove item-ID features or apply large dropout rates.
Use metadata-based features to improve generalization.
Re-weighting Approaches:
Adjust item weights during training to balance popular and unpopular items.
Inverse Propensity Scoring (IPS) - Weight interactions inversely proportional to item popularity (see the sketch at the end of this slide).
Regularization-based Approaches:
Add regularization terms to the loss function to penalize popularity bias.
Encourage models to learn from less popular items.
Examples: Popularity-aware regularization, information neutrality regularization.
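A minimal sketch of IPS-style re-weighting, assuming empirical item frequency is used as the propensity estimate; the exponent eta and the weight clip are illustrative variance-control knobs:
```python
import numpy as np

def ips_weights(item_counts, eta=1.0, clip=100.0):
    """Per-interaction loss weights inversely proportional to estimated exposure propensity."""
    counts = np.asarray(item_counts, dtype=float) + 1.0  # add-one smoothing keeps weights finite
    propensity = counts / counts.sum()                   # popularity as a crude propensity estimate
    weights = 1.0 / np.power(propensity, eta)
    return np.minimum(weights / weights.mean(), clip)    # normalize, then clip to control variance

# During training, each example's loss is multiplied by its item's weight,
# so gradients from long-tail interactions count more than those from head items.
```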
11. Evaluation and Datasets
Offline Evaluation (Dominant Approach):
Static Split: Train/test split on historical data (snapshot view).
Dynamic/Longitudinal Split: Simulate dynamic system evolution over time.
Metrics: Combine accuracy metrics (NDCG, Recall) with bias-related metrics (Gini, Coverage); a coverage sketch follows below.
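For instance, catalog coverage (aggregate diversity) is simple to report alongside NDCG or Recall; a minimal sketch, assuming one top-k list per user:
```python
def catalog_coverage(top_k_lists, catalog_size):
    """Fraction of the catalog that appears in at least one user's top-k list."""
    recommended = set()
    for recs in top_k_lists:
        recommended.update(recs)
    return len(recommended) / catalog_size

# Example: 3 users, catalog of 10 items -> coverage 0.4
print(catalog_coverage([[1, 2], [2, 3], [1, 4]], 10))
```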
Online Evaluation (User Studies, A/B Tests):
A/B tests: Deploy debiasing methods in real-world systems and measure user behavior
(clicks, engagement).
User studies: Gather user perceptions, subjective feedback on debiased recommendations.
More resource-intensive but crucial for real-world validation.
Datasets
MovieLens, LastFM, BookCrossing, etc. - widely used benchmarks.
All exhibit skewed popularity distributions but vary in size, density, and bias levels.
12. Challenges in Addressing Popularity Bias
Accuracy vs. fairness trade-off
Reducing popularity bias can come at a cost to user experience (especially in the short term).
Careful tuning of parameters to manage this trade-off is critical.
Defining fairness goals
What constitutes a fair distribution of recommendations remains unclear.
Measurement
Pre-launch vs. post-launch inconsistency in metrics, because training data is shaped by the feedback loop.
Lack of multi-stakeholder evaluation of recommender systems in A/B tests.
Lack of measurement of long-term metrics (e.g., retention) versus short-term metrics (e.g., clicks and watch time).
14. References
[1] Abdollahpouri, H., Mansoury, M.: Multi-sided exposure bias in recommendation. In: Proceedings of the International Workshop on Industrial
Recommendation Systems in conjunction with ACM KDD 2020 (2020)
[2]Banerjee, A., Patro, G.K., Dietz, L.W., Chakraborty, A.: Analyzing near me services: potential for exposure bias in location-based retrieval.
In: 2020 IEEE International Conference on Big Data, pp. 36423651(2020)
[3]Boratto, L., Fenu, G., Marras, M.: Connecting user and item perspectives in popularity debiasing for collaborative recommendation. Inf.
Process. Manag. 58(1), 102387 (2021)
[4]Channamsetty, S., Ekstrand, M.D.: Recommender response to diversity and popularity bias in user profiles.In: Proceedings of the 13th
International FLAIRS Conference, pp. 657660 (2017)
[5] Chen, J., Dong, H., Wang, X., Feng, F., Wang, M., He, X.: Bias and debias in recommender system: a survey and future directions. ACM
Trans. Inf. Syst. 31, 139 (2020)
[6] Deldjoo, Y., Bellogin, A., Di Noia, T.: Explaining recommender systems fairness and accuracy through the lens of data characteristics. Inf.
Process. Manag. 58(5), 102662 (2021)
[7] Yalcin, E., Bilge, A.: Investigating and counteracting popularity bias in group recommendations. Inf. Pro-cess. Manag. 58(5), 102608 (2021)
[8] Yang, Y., Huang, C., Xia, L., Huang, C., Luo, D., Lin, K.: Debiased contrastive learning for sequential recommendation. In: Proceedings of
the ACM Web Conference 2023, WWW 23, pp. 10631073 (2023b)
[9] Zanon, A.L., da Rocha, L.C.D., Manzato, M.G.: Balancing the trade-off between accuracy and diversity in recommender systems with
personalized explanations based on linked open data. Knowl. Based Syst. 252, 109333 (2022)