Hybrid Image Search at Scale: Lessons in Accuracy, Latency, and Cost

9:15 - François Gaillard, Adeo Services & Guilherme de Freitas Guitte, Adeo Services

  • they use Elastic (on GCP)
  • they use a multimodal approach: image encoding and text encoding
  • they use Vertex AI for image embeddings
  • the idea is to query with an image (e.g. of adapters and utils for gardening) and find the matching product
  • they evaluated lexical vs. a hybrid approach with kNN search
  • their catalog comprises around 10M docs
  • due to the images, they have volumes of up to 207 GB disk storage
  • they use BBQ (Better Binary Quantization) to shrink the stored image embeddings and reduce the disk storage volume
    • bringing it down from 207 GB to 53 GB
    • BBQ is very effective, because they lose almost no relevance while slimming storage space
  • their advice for reducing vector-search latency: reduce the embedding dimension
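The setup above can be sketched as Elasticsearch request bodies (as plain Python dicts); the field names ("title", "image_vector") and sizes are assumptions, while "bbq_hnsw" is the `dense_vector` index option Elasticsearch exposes for BBQ:

```python
def bbq_mapping(dims):
    """Index mapping storing image embeddings with Better Binary Quantization
    (BBQ), which cut their disk usage from 207 GB to 53 GB."""
    return {
        "properties": {
            "title": {"type": "text"},
            "image_vector": {
                "type": "dense_vector",
                "dims": dims,
                "index": True,
                "index_options": {"type": "bbq_hnsw"},
            },
        }
    }

def hybrid_query(text, image_embedding, k=10):
    """Request body combining a lexical match with kNN over the image vector."""
    return {
        "query": {"match": {"title": text}},
        "knn": {
            "field": "image_vector",
            "query_vector": image_embedding,
            "k": k,
            "num_candidates": 5 * k,  # widen the ANN candidate pool beyond k
        },
        "size": k,
    }
```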

Women of Search Present: AI Agents: From Hype to Reality

  • Companies like Klarna and Duolingo messed up by trying to replace customer-service workers with AI agents
  • "Human authenticity will become the new luxury"
  • "AI agents need search" - but that had goods and bads
  • they advocate that search for AI needs a different tuning than search for humans
  • AI search will:

    • perform way more searches
    • do longer and more detailed queries
    • the evaluation is rather: was the task completed successfully?
    • it's optimized for recall, context, and data openness

From LLM-as-a-Judge to Human-in-the-Loop: Rethinking Evaluation in RAG and Search

11:00 - Fernando Rejon Barrera, Zeta Alpha & Daniel Wrigley, OpenSource Connections

  • this is about eyeballing (a HITL approach):
    • it's part of the relevance workbench in OS
    • we use two search configurations and compare their result sets for the same queries to see the differences
      • we then see the unique and the common results in the two sets
  • the idea is to combine eyeballing and RAG evaluation
  • with the current approach, we query an LLM, which does a (vector) search and generates an answer
  • we could optimize the prompts and the (vector) search retrieval
  • they're leveraging pairwise comparison on results for LLMaaJ evaluation
  • an idea is to use the Elo rating system, like in chess tournaments
    • they run a "tournament between all agents"
  • introduced and demoed "Ragelo"
    • currently a simple streamlit app, where one can select agent configurations, benchmarks and then can collect answers
  • one can then evaluate pair- or pointwise
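The Elo tournament idea can be sketched like this; the K-factor, the starting rating, and the judge function are illustrative stand-ins, not Ragelo's actual API:

```python
from itertools import combinations

def elo_update(r_a, r_b, score_a, k=32.0):
    """One Elo update. score_a: 1.0 if A wins the pairwise judgment,
    0.5 for a tie, 0.0 if B wins; returns the updated ratings."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

def run_tournament(agents, judge):
    """Round-robin over all agent pairs; judge(a, b) returns the score
    from a's perspective (e.g. an LLM-as-a-Judge pairwise verdict)."""
    ratings = {a: 1000.0 for a in agents}
    for a, b in combinations(agents, 2):
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], judge(a, b))
    return ratings
```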

Beyond Keywords: Measuring Multimodal Search Quality

11:45 - Philippe Bouzaglou, Vectra

  • eyeballing embeddings is not possible, but embedding models are important for search relevance in vector search
  • though they are hard to evaluate
    • we usually just google whichever model makes the most "sense" (is newest, performs best on benchmarks)
    • current method to evaluate: install model => embed docs => run queries => measure and => repeat
      • that's good but slow
  • they propose a new method "document-query similarity evaluation"
  • flips the eval upside down
  • we take one document, get many queries and then get the cosine similarity scores between each query and the document
  • then we rank the queries by similarity and then we get an LLM judge (or human) to rank the queries
  • then we compare these
  • we could do this with a lot of models to see their differences
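A minimal sketch of the document-query similarity evaluation described above, using Spearman rank correlation as the comparison step (the talk only said "compare"; the correlation measure is my assumption). The vectors would come from whatever embedding model is under test:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_queries(doc_vec, query_vecs):
    """Rank query indices from most to least similar to the one document."""
    sims = [cosine(doc_vec, q) for q in query_vecs]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)

def spearman(ranking_a, ranking_b):
    """Spearman rank correlation between two rankings of the same items,
    e.g. the model's ranking vs. the LLM/human judge's ranking."""
    n = len(ranking_a)
    pos_a = {item: r for r, item in enumerate(ranking_a)}
    pos_b = {item: r for r, item in enumerate(ranking_b)}
    d2 = sum((pos_a[i] - pos_b[i]) ** 2 for i in pos_a)
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Running this per model and comparing the correlations shows their differences without a full index-and-query cycle.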

Future-Proofing E-commerce Search Architecture for Conversational Commerce and Beyond

13:30 - Jens Kürsten, OTTO GmbH & Co. KGaA

  • they use a hybrid approach of lexical and semantic search
  • they do a facet selection (classical filters) and then do rank fusion
  • functionally, they care about latency, maintainability, infra cost, etc.
  • they also did a gradual rollout
    • started with 0-result searches, then also high-volume queries, then low recall, until finally going for high recall
    • experimented a lot with different rank fusions, with only serving specific pages, only offering partial filters, etc.
    • they manage a latency of around 600 ms
  • OTTOs stack doesn't look that different from ours (it's just way more mature)
  • OTTO uses Solr with tweaks for lexical search
  • "Search stack better be:

    • API accessible
    • lightning fast
    • dead cheap
    • scalable to infinity"
  • they also do LLMaaJ:
    • they're simulating a customer
  • they reduced infra cost by 25%
  • His takeaways are:
    • "Hybrid search is hard, but worth the money"
    • "Make it work, make it right, make it fast", applicable at scale
  • There are a couple of teams working on search
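The rank-fusion step above can be sketched as Reciprocal Rank Fusion (RRF), a common way to merge lexical and semantic result lists; the talk did not name OTTO's exact fusion method, so treat this as illustrative:

```python
def rrf(result_lists, k=60):
    """Fuse several ranked lists of doc ids; k=60 is the constant
    suggested in the original RRF paper."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            # Documents near the top of any list get the largest contribution.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, it sidesteps the problem of lexical and semantic scores living on different scales.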

Smart Recall: Enhancing Local LLM Conversations with Embedding-Aware Context Retrieval

14:15 - Lucas Jeanniot, Eliatra

  • business reality sometimes might bring you into the situation of having to host LLMs yourself locally for privacy
  • challenges:
    • no persistent memory
    • context window filling up
    • expensive redundant searches
  • they leverage the OpenSearch conversational interface (he gave examples in Python)
  • focus on token management, so they're tracking the tokens (it's local, so they aim to keep it slim)
    • they have different strategies like remove examples, summarize middle, keep recent only, aggressive summary
  • the token limits they set apply to all generations
  • they encourage implementing fallbacks
  • they use a three-layer cache strategy: an embedding cache, a search-result cache, and a conversation-summary cache
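A minimal sketch of the three-layer cache idea; the class shape and key scheme are assumptions for illustration, not Eliatra's implementation:

```python
import hashlib

class ThreeLayerCache:
    """Three caches: embeddings, search results, conversation summaries."""

    def __init__(self):
        self.embeddings = {}   # text key -> vector
        self.results = {}      # query key -> search hits
        self.summaries = {}    # conversation id -> summary text

    @staticmethod
    def _key(text):
        # Hash the text so arbitrarily long inputs get fixed-size keys.
        return hashlib.sha256(text.encode()).hexdigest()

    def get_embedding(self, text, embed_fn):
        """Return a cached vector, calling the model only on a miss."""
        k = self._key(text)
        if k not in self.embeddings:
            self.embeddings[k] = embed_fn(text)
        return self.embeddings[k]

    def get_results(self, query, search_fn):
        """Return cached hits, avoiding expensive redundant searches."""
        k = self._key(query)
        if k not in self.results:
            self.results[k] = search_fn(query)
        return self.results[k]

    def get_summary(self, conversation_id, summarize_fn):
        """Return a cached summary to keep the context window slim."""
        if conversation_id not in self.summaries:
            self.summaries[conversation_id] = summarize_fn(conversation_id)
        return self.summaries[conversation_id]
```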

Commoditizing Inference: Why Your Query Language Should Speak AI

15:15 - Aurélien Foucret, Elastic

  • (elastic also offers an MCP server)
  • they propose an inference API
    • it'll be agnostic to different LLM provider formats
      • currently inference runs on the ML nodes => from now on it's no longer tied to the ML nodes of the cluster
      • also has better throughput
      • currently only available in one AWS region (as it's in tech preview)
  • they'll introduce a new Elastic Inference Service (EIS)
    • ES|QL gets new inference parameters

Hybrid Search: Lessons Learned

16:00 - Tom Burgmans, Wolters Kluwer & Mohit Sidana, Wolters Kluwer

  • Wolters Kluwer is an information service provider for experts in legal, business, tax, etc.
  • they use Solr for lexical search
  • they also use matryoshka embeddings with scalar quantization
  • they also struggled with throughput from 3rd party embedding models
  • they optimized indexing and tried to re-use vectors
    • if the core text is not changing; for that, they check whether the metadata was changed
  • they sum the scores of vector search & lexical search
  • "Blending lexical results with vector results is like mixing oil and water“

    • keyword matching is an exact match
    • embeddings are an approximation
    • a problem with hybrid: you can apply boosting to the whole lexical result set, but with vector search you can only boost the top k
  • they base the balance between vector and lexical search on the type of the query:
  • e.g. for citations they go lexical; for case nicknames they blend in some vector results, etc.
  • for case summaries they rely on vector search, likewise for typos, but they blend in some lexical results
  • their query understanding also considers history and context info
  • lots of challenges they face seem very similar to ours
  • vector search comes at a price -> one should definitely check how big the relevancy benefit is
  • cool idea they had: building a prototype interface that shows (with colors) the origin of each search result (hybrid vs lexical vs both) -> they then also implemented a slider showing the impact of the specific search type
  • hybrid search sometimes also leads to unexpected situations (e.g. a filter that shows aggregations: the aggregation count might change when the filter is applied, because the vector search might surface more documents than before)
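The "oil and water" problem comes from BM25 and cosine scores living on different scales. They sum the scores; a common refinement, sketched here as an assumption rather than Wolters Kluwer's exact method, is to min-max normalize each result set before a weighted sum:

```python
def normalize(scores):
    """Min-max normalize a doc-id -> score mapping into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # guard against all-equal scores
    return {doc: (s - lo) / span for doc, s in scores.items()}

def blend(lexical, vector, vector_weight=0.5):
    """Weighted sum of normalized lexical and vector scores.
    vector_weight could vary per query type, as in their approach
    (lexical for citations, more vector for case summaries, etc.)."""
    lex, vec = normalize(lexical), normalize(vector)
    docs = set(lex) | set(vec)
    fused = {d: (1 - vector_weight) * lex.get(d, 0.0)
                + vector_weight * vec.get(d, 0.0)
             for d in docs}
    return sorted(docs, key=fused.get, reverse=True)
```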

Key Conference Findings

  • Research the agentic relevance framework => this might have a big impact in the future
  • Discuss interleaved A/B testing => this could reduce A/B test times a lot
  • When we're live, we should increase our focus on query understanding