Research Theme: Evaluation and Benchmarking
This research theme focuses on the design and development of new metrics, benchmarks, and approaches for evaluating information retrieval, AI, and other computing systems.
Keynotes, invited talks, and lectures
Shared task organization
Workshop organization
Publications
Judging the Judges: A Collection of LLM-Generated Relevance Judgements
Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, and Emine Yilmaz
Preprint, 2025
PDF | ArXiv
Towards Understanding Bias in Synthetic Data for Evaluation
Hossein A. Rahmani, Varsha Ramineni, Nick Craswell, Bhaskar Mitra, and Emine Yilmaz
In proc. ACM CIKM, 2025
Publication | PDF | ArXiv
LLM4Eval: Large Language Model for Evaluation in IR
Clemencia Siro, Hossein A. Rahmani, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, and Emine Yilmaz
In proc. ACM SIGIR, 2025
Publication | PDF
Tip of the Tongue Query Elicitation for Simulated Evaluation
Yifan He, To Eun Kim, Fernando Diaz, Jaime Arguello, and Bhaskar Mitra
In proc. ACM SIGIR, 2025
Publication | PDF | ArXiv
JudgeBlender: Ensembling Automatic Relevance Judgments
Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, and Bhaskar Mitra
In proc. ACM TheWebConf, 2025
Publication | PDF | ArXiv
SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval
Hossein A. Rahmani, Xi Wang, Emine Yilmaz, Nick Craswell, Bhaskar Mitra, and Paul Thomas
In proc. ACM TheWebConf, 2025
Publication | PDF | ArXiv
LLM4Eval@WSDM 2025: Large Language Model for Evaluation in Information Retrieval
Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, and Emine Yilmaz
In proc. ACM WSDM, 2025
Publication | PDF
Recall, Robustness, and Lexicographic Evaluation
Fernando Diaz, Michael D. Ekstrand, and Bhaskar Mitra
In ACM Transactions on Recommender Systems (TORS), 2025
Publication | PDF | ArXiv
Overview of the TREC 2024 Tip-of-the-Tongue Track
Jaime Arguello, Samarth Bhargav, Fernando Diaz, To Eun Kim, Yifan He, Evangelos Kanoulas, and Bhaskar Mitra
In proc. Text REtrieval Conference (TREC), 2025
Publication | PDF
LLMJudge: LLMs for Relevance Judgments
Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Paul Thomas, Charles L. A. Clarke, Mohammad Aliannejadi, Clemencia Siro, and Guglielmo Faggioli
In proc. LLM4Eval: The First Workshop on Large Language Models for Evaluation in Information Retrieval, ACM SIGIR, 2024
Publication | PDF | ArXiv
Proceedings of The First Workshop on Large Language Models for Evaluation in Information Retrieval (LLM4Eval 2024)
Clemencia Siro, Mohammad Aliannejadi, Hossein A. Rahmani, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, and Emine Yilmaz
Proceedings
Report on the 1st Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) at SIGIR 2024
Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, and Emine Yilmaz
In ACM SIGIR Forum, 2024
Publication | PDF | ArXiv
LLM4Eval: Large Language Model for Evaluation in IR
Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, and Emine Yilmaz
In proc. ACM SIGIR, 2024
Publication | PDF
Synthetic Test Collections for Retrieval Evaluation
Hossein A. Rahmani, Nick Craswell, Emine Yilmaz, Bhaskar Mitra, and Daniel Campos
In proc. ACM SIGIR, 2024
Publication | PDF | ArXiv
Large Language Models can Accurately Predict Searcher Preferences
Paul Thomas, Seth Spielman, Nick Craswell, and Bhaskar Mitra
In proc. ACM SIGIR, 2024
Publication | PDF | ArXiv
Towards Group-aware Search Success
Haolun Wu, Bhaskar Mitra, and Nick Craswell
In proc. ACM ICTIR, 2024
Publication | PDF | ArXiv
Learning to Extract Structured Entities Using Language Models
Haolun Wu, Ye Yuan, Liana Mikaelyan, Alexander Meulemans, Xue Liu, James Hensman, and Bhaskar Mitra
In proc. EMNLP, 2024
Publication | PDF | ArXiv
Overview of the TREC 2023 Tip-of-the-Tongue Track
Jaime Arguello, Samarth Bhargav, Fernando Diaz, Evangelos Kanoulas, and Bhaskar Mitra
In proc. Text REtrieval Conference (TREC), 2024
Publication | PDF
Overview of the TREC 2023 Deep Learning Track
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Hossein A. Rahmani, Daniel Campos, Jimmy Lin, Ellen M. Voorhees, and Ian Soboroff
In proc. Text REtrieval Conference (TREC), 2024
Publication | PDF | ArXiv
Overview of the TREC 2022 Deep Learning Track
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin, Ellen M. Voorhees, and Ian Soboroff
In proc. Text REtrieval Conference (TREC), 2023
Publication | PDF | ArXiv
Are We There Yet? A Decision Framework for Replacing Term-Based Retrieval with Dense Retrieval Systems
Sebastian Hofstätter, Nick Craswell, Bhaskar Mitra, Hamed Zamani, and Allan Hanbury
Preprint, 2022
PDF | ArXiv
Fostering Coopetition While Plugging Leaks: The Design and Implementation of the MS MARCO Leaderboards
Jimmy Lin, Daniel Campos, Nick Craswell, Bhaskar Mitra, and Emine Yilmaz
In proc. ACM SIGIR, 2022
Publication | PDF
Joint Multisided Exposure Fairness for Recommendation
Haolun Wu, Bhaskar Mitra, Chen Ma, Fernando Diaz, and Xue Liu
In proc. ACM SIGIR, 2022
Publication | PDF | ArXiv
Overview of the TREC 2021 Deep Learning Track
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Jimmy Lin
In proc. Text REtrieval Conference (TREC), 2022
Publication | PDF | ArXiv
MS MARCO Chameleons: Challenging the MS MARCO Leaderboard with Extremely Obstinate Queries
Negar Arabzadeh, Bhaskar Mitra, and Ebrahim Bagheri
In proc. ACM CIKM, 2021
Publication | PDF
MS MARCO: Benchmarking Ranking Models in the Large-Data Regime
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Jimmy Lin
In proc. ACM SIGIR, 2021
Publication | PDF | ArXiv
TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees, and Ian Soboroff
In proc. ACM SIGIR, 2021
Publication | PDF | ArXiv
Significant Improvements over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard
Jimmy Lin, Daniel Campos, Nick Craswell, Bhaskar Mitra, and Emine Yilmaz
In proc. ACM SIGIR, 2021
Publication | PDF | ArXiv
Tip of the Tongue Known-Item Retrieval: A Case Study in Movie Identification
Jaime Arguello, Adam Ferguson, Emery Fine, Bhaskar Mitra, Hamed Zamani, and Fernando Diaz
In proc. ACM CHIIR, 2021
Publication | PDF | ArXiv
Neural methods for effective, efficient, and exposure-aware information retrieval
Bhaskar Mitra
In ACM SIGIR Forum, 2021
Publication | PDF
Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval
Bhaskar Mitra
PhD thesis, University College London, 2021
Publication | PDF | ArXiv
Overview of the TREC 2020 Deep Learning Track
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, and Daniel Campos
In proc. Text REtrieval Conference (TREC), 2021
Publication | PDF | ArXiv
Evaluating Stochastic Rankings with Expected Exposure
Fernando Diaz, Bhaskar Mitra, Michael D. Ekstrand, Asia J. Biega, and Ben Carterette
In proc. ACM CIKM, 2020
🏆 Best Long Research Paper Nominee
Publication | PDF | ArXiv
ORCAS: 18 Million Clicked Query-Document Pairs for Analyzing Search
Nick Craswell, Daniel Campos, Bhaskar Mitra, Emine Yilmaz, and Bodo Billerbeck
In proc. ACM CIKM, 2020
Publication | PDF | ArXiv
On the Reliability of Test Collections for Evaluating Systems of Different Types
Emine Yilmaz, Nick Craswell, Bhaskar Mitra, and Daniel Campos
In proc. ACM SIGIR, 2020
Publication | PDF | ArXiv
Overview of the TREC 2019 Deep Learning Track
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Ellen M. Voorhees
In proc. Text REtrieval Conference (TREC), 2020
Publication | PDF | ArXiv
Benchmark for Complex Answer Retrieval
Federico Nanni, Bhaskar Mitra, Matt Magnusson, and Laura Dietz
In proc. ACM ICTIR, 2017
Publication | PDF | ArXiv
An Eye-tracking Study of User Interactions with Query Auto Completion
Katja Hofmann, Bhaskar Mitra, Filip Radlinski, and Milad Shokouhi
In proc. ACM CIKM, 2014
Publication | PDF