
Causal Probing for Dual Encoders

Publication: Contribution to book/report/anthology/conference proceedings > Conference paper > Research > Peer-reviewed

Abstract

Dual encoders are highly effective and widely deployed in the retrieval phase for passage and document ranking, question answering, or retrieval-augmented generation (RAG) setups. Most dual-encoder models use transformer models like BERT to map input queries and output targets to a common vector space that encodes semantic similarity. Despite their prevalence and impressive performance, little is known about the inner workings of dense encoders for retrieval. We investigate neural retrievers using the probing paradigm to identify well-understood IR properties that causally result in ranking performance. Unlike existing works that have probed cross-encoders to show query-document interactions, we provide a principled approach to probe dual encoders. Importantly, we employ causal probing to avoid correlation effects that might be artefacts of vanilla probing. We conduct extensive experiments on one such dual encoder (TCT-ColBERT) to check for the existence and relevance of six properties: term importance, lexical matching (BM25), semantic matching, question classification, and the two linguistic properties of named entity recognition and coreference resolution. Our layer-wise analysis shows important differences between re-rankers and dual encoders, establishing which tasks are not only understood by the model but also used for inference.

Original language: English
Title of host publication: CIKM 2024
Subtitle: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 2292-2303
Number of pages: 12
ISBN (electronic): 9798400704369
DOIs
Publication status: Published - 21 Oct 2024
Event: 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024 - Boise, United States
Duration: 21 Oct 2024 to 25 Oct 2024

Conference

Conference: 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Country/Territory: United States
City: Boise
Period: 21 Oct 2024 to 25 Oct 2024

ASJC Scopus subject areas

  • General Business, Management and Accounting
  • General Decision Sciences
