About me

I am an information retrieval researcher based in Montreal, Canada. I develop machine learning models and benchmarks for information retrieval, and study the societal implications of information access technologies. I am currently serving as the ACM SIGIR Secretary and as an Associate Editor for the ACM Transactions on Information Systems (TOIS) journal. I previously served as the ACM SIGIR Community Relations Coordinator (2022-2025). I have received several awards for my research, including two ACM SIGIR Early Career Researcher Awards (2024) for excellence in research and in community engagement. I received my Ph.D. in Computer Science from University College London under the supervision of Dr. Emine Yilmaz.

Featured work

Book Chapter: Sociotechnical Implications of Generative Artificial Intelligence for Information Access

Robust access to trustworthy information is a critical need for society with implications for knowledge production, public health education, and promoting informed citizenry in democratic societies. Generative AI technologies may enable new ways to access information and improve effectiveness of existing information retrieval systems, but we are only starting to understand and grapple with their long-term social implications. In this chapter, we present an overview of some of the systemic consequences and risks of employing generative AI in the context of information access. We also provide recommendations for evaluation and mitigation and discuss challenges for future research.

Book: An Introduction to Neural Information Retrieval

Neural models have been employed in many Information Retrieval scenarios, including ad-hoc retrieval, recommender systems, multi-media search, and even conversational systems that generate answers in response to natural language questions. This book provides a complete picture of neural information retrieval techniques that culminate in supervised neural learning-to-rank models, including deep neural network architectures that are trained end-to-end for ranking tasks. In reaching this point, we cover all the important topics, including the learning-to-rank framework and an overview of deep neural networks. We provide an accessible, yet comprehensive, overview of the state of the art in Neural Information Retrieval.

Invited Talk: Emancipatory Information Retrieval

Our world today is facing a confluence of several mutually reinforcing crises, each of which intersects with concerns of social justice and emancipation. This talk is a provocation for the role of computer-mediated information access in our emancipatory struggles. We define emancipatory information retrieval as the study and development of information access methods that challenge various forms of human oppression, a field that situates its activities within broader collective emancipatory praxis. The term "emancipatory" here signifies the moral concerns of universal humanization of all peoples and the elimination of oppression to create the conditions under which we can collectively flourish. We present an early framework of practices, projects, and design provocations for emancipatory IR to challenge the field of IR research to embrace humanistic values and commit to universal emancipation and social justice.

Benchmark: TREC Tip-of-the-Tongue Track

Tip-of-the-tongue (ToT) known-item retrieval is defined as "an item identification task in which the searcher has previously experienced an item but cannot recall a reliable identifier" (i.e., "It’s on the tip of my tongue…"). The TREC ToT track aims to develop IR systems that can successfully resolve ToT information needs. Progress in this area will likely benefit other IR systems that must deal with memory assistance, such as personal information management (PIM) systems (e.g., email re-finding).

Benchmark: TREC Deep Learning Track

The TREC Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision).

Benchmark: MS MARCO

MS MARCO is a collection of datasets focused on deep learning in search. The first dataset was a question answering dataset featuring 100,000 real Bing questions and a human-generated answer. Since then we have released a 1,000,000 question dataset, a natural language generation dataset, a passage ranking dataset, a keyphrase extraction dataset, a crawling dataset, and a conversational search dataset.

Workshop: Neural Information Retrieval (Neu-IR)

Before deep learning gained prominence in the information retrieval community, the Neu-IR (pronounced "New IR") workshops brought together an early community of IR researchers interested in deep learning methods. It served as a forum for academic and industrial researchers working at the intersection of information retrieval and machine learning to present new work and early results, compare notes on neural network toolkits, share best practices, and discuss the main challenges facing this line of research.

PhD Thesis: Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.