Keynote

9:15 - David Louis Hollembaek, Veeva Systems

  • worked for FAST Search, then Microsoft (after the acquisition)
  • he outlined a new ROI model emerging with AI agents: instead of automating only a few steps in the task chain, they enable automating the vast majority of them
  • Cost-LLM = (Cost of problem - cost of solution) (take this with a grain of salt)
  • the ROI barrier for "problems worth solving" is much lower now, due to more efficient automation via LLMs
  • LLM costs have decreased massively
    • David assumes the cost will decrease further

Public Qs:

  • "What (search) problems are still worth solving in times of GenAI?" -> his A: all of them :P

  • "Is security a hidden cost to AI?" -> his A: one should always bring the security context along; he advocates actively considering security as part of the functional requirements

From BM25 to Mixture-of-Encoders: Evaluations for Next-Gen Search and Retrieval Systems

10:00 - Filip Makraduli, Superlinked

  • Superlinked aims to enhance retrieval quality
  • Filip states that single vector embeddings have limits
  • mentions error correction based on the Hadamard matrix
    • any two distinct rows differ in 50% of their positions
  • leveraging a variety of encoders like text, timeseries, locations, images, etc. for different types of data
  • all of those are aggregated, weighted, cross-encoded into one unified embedding
  • e.g. they encode products with their popularity and description, enabling them to resolve queries like "very popular pants"
  • Superlinked validated their approach on a new (self-made) benchmark
    • it's based on the Wayfair dataset
    • they built their own because they found most existing benchmarks to be text-only
    • they benchmarked against BM25 and ColBERT
  • they're able to serve queries like "hotels around Kurfürstendamm, for 5 people, under $200 per night, with a score of more than 4.8 stars"
  • their repo is open, so their encoders can be looked up; they're also integrated into LangChain
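The multi-encoder idea above can be sketched roughly as follows. This is a minimal illustration, not Superlinked's actual API: each modality (text, popularity, etc.) gets its own encoder output, and the segments are normalized, weighted at query time, and concatenated into one unified embedding. All function and field names here are made up for the example.

```python
import numpy as np

def combine_encoders(parts: dict, weights: dict) -> np.ndarray:
    """Concatenate per-encoder embeddings (text, popularity, ...) into one
    unified vector, scaling each segment by its query-time weight."""
    segments = []
    for name, vec in parts.items():
        v = vec / (np.linalg.norm(vec) + 1e-9)  # normalize each modality
        segments.append(weights.get(name, 1.0) * v)
    return np.concatenate(segments)

# Toy "product" with a 3-d text embedding and a 1-d popularity encoding.
doc = combine_encoders(
    {"text": np.array([0.2, 0.9, 0.1]), "popularity": np.array([0.8])},
    weights={"text": 1.0, "popularity": 1.0},
)
# A query like "very popular pants" can upweight the popularity segment.
query = combine_encoders(
    {"text": np.array([0.3, 0.8, 0.2]), "popularity": np.array([1.0])},
    weights={"text": 1.0, "popularity": 2.0},
)
score = float(doc @ query)
```

Reweighting at query time is what makes "very popular pants" resolvable: the popularity segment contributes more to the dot product without re-encoding anything.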

Billion-scale hybrid retrieval in a single query

11:00 Marek Galovic, TopK, Inc.

  • TopK is a document database with native multi-vector, keyword, and faceted search — all in a single, composable query.
  • Statement: "Vector databases are the wrong abstraction for retrieval"
  • they built their own query engine "Reactor", combining dense and sparse vectors with lexical search (BM25), custom scoring, and efficient filtering
  • they use S3 for storage, EC2 for compute (abstracted by kubernetes)
  • they use a write-ahead log for strong consistency
  • reactor (the query engine):
    • a router routes queries to several (scalable) "executors", possibly O(100) nodes
    • they never need to copy data, since everything lives in buffers (-> zero-copy)
    • their queries get faster the more you filter -> kinda obvious, no?
  • the architecture (S3, etc.) leads to less cost in comparison to other vector searches
    • (sometime in the talk he mentioned 4k per month roughly)
  • they have a decoupled reranker engine
  • they slightly outperform on nDCG@10 for several benchmarks; but the improvement is rather small tbh
  • cool architecture nonetheless
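To make the "single composable query" idea concrete, here is a toy sketch of hybrid retrieval: filter first, then score only the survivors with a weighted mix of a dense similarity and a crude lexical score. This is not TopK's engine or API; the BM25 stand-in is simple token overlap, and all names are invented for the example.

```python
import numpy as np

# Toy corpus: each doc has a dense vector, a token set, and a facet for filtering.
docs = [
    {"id": 1, "vec": np.array([0.9, 0.1]), "tokens": {"hybrid", "search"}, "lang": "en"},
    {"id": 2, "vec": np.array([0.2, 0.8]), "tokens": {"vector", "search"}, "lang": "de"},
    {"id": 3, "vec": np.array([0.8, 0.3]), "tokens": {"keyword", "index"}, "lang": "en"},
]

def keyword_score(query_tokens, doc_tokens):
    # Crude lexical overlap standing in for BM25.
    return len(query_tokens & doc_tokens) / max(len(query_tokens), 1)

def hybrid_query(q_vec, q_tokens, lang, alpha=0.5, k=2):
    # Filter first, then score only surviving candidates; this is the
    # "queries get faster the more you filter" effect from the talk.
    candidates = [d for d in docs if d["lang"] == lang]
    scored = []
    for d in candidates:
        dense = float(q_vec @ d["vec"]) / (np.linalg.norm(q_vec) * np.linalg.norm(d["vec"]))
        lexical = keyword_score(q_tokens, d["tokens"])
        scored.append((alpha * dense + (1 - alpha) * lexical, d["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]

top = hybrid_query(np.array([1.0, 0.2]), {"hybrid", "search"}, lang="en")
```

The point is composability: filter, dense score, lexical score, and custom weighting all happen in one query path instead of stitching together separate systems.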

My Q's:

  • What data format are they storing the data in?
    • His A: their own file format, .bob ("bunch of buffers")
    • they had issues with Parquet:
      • no point reads
      • requires decoding to access just one column
    • in .bob, physically all data is stored in buffers
    • logically, rows are organized as columns, split into blocks
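The "rows as column buffers, split into blocks" layout can be sketched like this. This is only an illustration of the general idea (not the actual .bob format): fixed-size row blocks, one contiguous buffer per column per block, so a point read touches a single small buffer instead of decoding a whole file.

```python
import struct

BLOCK_SIZE = 2  # rows per block, kept tiny for illustration

def write_blocks(rows):
    """Lay rows out column-by-column inside fixed-size row blocks."""
    blocks = []
    for i in range(0, len(rows), BLOCK_SIZE):
        chunk = rows[i:i + BLOCK_SIZE]
        cols = {}
        for name in chunk[0]:
            # One contiguous buffer per column per block (little-endian f64).
            cols[name] = struct.pack(f"<{len(chunk)}d", *(r[name] for r in chunk))
        blocks.append(cols)
    return blocks

def point_read(blocks, row, col):
    """Read a single value: pick the block, then offset into one column buffer,
    without decoding the other columns (the Parquet pain point mentioned)."""
    block, offset = divmod(row, BLOCK_SIZE)
    buf = blocks[block][col]
    return struct.unpack_from("<d", buf, offset * 8)[0]

rows = [{"score": 0.5, "price": 10.0}, {"score": 0.9, "price": 20.0},
        {"score": 0.7, "price": 30.0}]
blocks = write_blocks(rows)
value = point_read(blocks, 2, "price")  # -> 30.0
```

With the data already sitting in such buffers, executors can scan or slice them directly, which is what makes the zero-copy claim plausible.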

Audience Q's:

  • "What's better for TopK, CPU or GPU?"
  • They currently use ARM CPUs in AWS, which are cheap and suffice for their current needs

How we scaled an internal GenAI platform at Bayer to over 60.000 users supporting hundreds of use cases

11:45 - Hendrik Hogertz, Bayer

  • skipped due to no-show
  • René took over with an improvised fishbowl session

Agentic search tuning: faster and better

13:30 - Stavros Macrakis, OpenSearch @ AWS, Daniel Wrigley @ OSC

  • short recap on progress on search (embeddings, ann, etc.)
  • OS 3.1 is supposed to release "any day now" - even in managed?
  • too much focus on precision with LTR and cross-encoders => doesn't address recall
  • he ran through the regular "manual" process (offline eval, A/B tests, etc.) => takes time

    proposed automatic process - leveraging interleaved A/B tests

    • HIL approach (the project is still in its early stages)
    • one agent orchestrator steers several agents: it takes a "complaint" and routes it to them; an agent can be a hypothesis generator, an offline eval agent, an online testing agent, or a deployment agent

  • the agents leverage several tools according to their field, e.g. LLMaaJ, metrics, KPIs etc.
  • first we'd generate a hypothesis, checking in with the user for verification
  • then run the offline eval => present the resulting metric changes to the human in the loop
  • if verified, we'll do online testing, once again checking with the user "online KPIs changed by X,Y", etc.
  • then you could potentially run your deploy CI/CD etc.
  • OS Team and OSC are currently kickstarting this work (small team is working on it)
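The staged flow above (hypothesis -> offline eval -> online test -> deploy, with HIL gates between stages) can be sketched as a simple orchestrator loop. This is my own illustration of the described pipeline, not their code; all agent stages are hypothetical placeholders returning canned values.

```python
def human_approves(prompt: str) -> bool:
    # Stand-in for the human-in-the-loop gate; a real system would ask a user.
    print(prompt)
    return True

def tune(complaint: str) -> str:
    """Orchestrator flow as described in the talk, with HIL checks between
    the stages. Each stage here is a placeholder for a real agent."""
    hypothesis = f"boost recall for: {complaint}"          # hypothesis agent
    if not human_approves(f"Test hypothesis? {hypothesis}"):
        return "rejected at hypothesis stage"
    offline_delta = {"ndcg@10": +0.03}                     # offline eval agent
    if not human_approves(f"Offline metrics changed: {offline_delta}"):
        return "rejected after offline eval"
    online_delta = {"ctr": +0.01}                          # online testing agent
    if not human_approves(f"Online KPIs changed: {online_delta}"):
        return "rejected after online test"
    return "deployed"                                      # deployment agent

result = tune("queries for 'winter jacket' return summer wear")
```

The design choice worth noting is that every transition is gated: the automation proposes, the human disposes, which keeps the loop safe while the project matures.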

Improving Relevance in RAG - Lessons learned from the LiveRAG challenge

14:15 - Matthias Krüger, OpenSource Connections

  • team did the LiveRAG challenge at SIGIR 2025
  • had a fixed dataset based on Common Crawl (tailored a bit)
  • used the Falcon3-10B model
  • used recall, precision and MRR
  • final eval via answer/ ground truth comparison

For retrieval they used a hybrid approach, evaluating recall@1k first, then narrowing the result list down in further steps

  • did reciprocal rank fusion@1k, then reranking@100, then filtering => why not filter first?
  • very simple bm25
  • MTEB retrieval task (they took Snowflake Arctic Embed)
  • used the USearch library
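Reciprocal rank fusion, the first step in their pipeline, is small enough to show in full. This is the standard RRF formula (each ranking contributes 1 / (k + rank) per document, with k commonly set to 60), applied to toy BM25 and dense rankings; the document IDs are made up.

```python
def rrf(rankings, k=60, top_n=None):
    """Reciprocal rank fusion: each input ranking contributes 1 / (k + rank)
    to a document's fused score; documents are re-sorted by the sum."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:top_n] if top_n else fused

bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]
fused = rrf([bm25_ranking, dense_ranking])  # -> ['d1', 'd3', 'd2', 'd4']
```

Because RRF only needs ranks, not comparable scores, it is a cheap way to fuse BM25 and dense results before the more expensive reranking@100 step.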

Harnessing AI to strengthen trustworthy information

15:15 - Lucian Precup, Adelean

  • they developed all.site, a search page which lets you filter by source and also directly shows which source a retrieved document is from -> what's the take?
  • used HyDE for query rewriting -> cf. the "RAG from scratch" webinar from LangChain
    • similar to a "more like this" query in Lucene
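HyDE (Hypothetical Document Embeddings) in a nutshell: instead of embedding the raw query, an LLM first writes a plausible answer passage, and that passage is embedded and used for retrieval. The sketch below stubs out the LLM and uses a toy character-bigram embedding purely for illustration; all names are invented.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy embedding: hashed character bigrams. A real system uses a model.
    v = np.zeros(16)
    for a, b in zip(text, text[1:]):
        v[(ord(a) * 31 + ord(b)) % 16] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def generate_hypothetical_doc(query: str) -> str:
    # Hypothetical stand-in for the LLM call that would write a plausible
    # answer passage for the query.
    return f"A detailed answer about {query}."

corpus = {
    "doc_a": "A detailed answer about trustworthy information sources.",
    "doc_b": "Unrelated text on cooking pasta quickly.",
}

def hyde_search(query: str) -> str:
    # Embed the generated passage instead of the raw query, then retrieve
    # by cosine similarity (vectors are unit-normalized).
    hypo_vec = embed(generate_hypothetical_doc(query))
    return max(corpus, key=lambda d: float(hypo_vec @ embed(corpus[d])))

best = hyde_search("trustworthy information sources")
```

The "more like this" comparison in the notes fits: both techniques retrieve with a document-shaped probe rather than a short keyword query.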

Lightning talks

  • made no notes, wanted to enjoy the session