How We Built Vector Search in the Cloud
Wednesday, December 6th at 10 am PT / 1 pm ET
Rockset fully integrates similarity indexing into its search and analytics database, enabling engineers to scale AI applications to thousands of users.
In this talk, Chief Architect Tudor Bosman and engineer Daniel Latta-Lin share how they built a distributed similarity index using FAISS-IVF that is memory-efficient and supports immediate insertion and recall. They delve into the implementation details, including how Rockset supports:
- Real-time updates: Rockset supports inserts, updates and deletes of vectors and metadata. It’s built on RocksDB, an open-source embedded storage engine designed for mutability. When a vector is inserted or modified, Rockset uses FAISS to compute its Voronoi cell, then adds or updates the vector’s closest centroid and residual value in the search index. New data is reflected in searches within milliseconds.
- Hybrid search with SQL: Rockset stores and indexes vectors alongside text, JSON and time series data, and it uses the search index and the similarity index in parallel. FAISS identifies the K nearest centroids to the target vector; the search index then filters results by both those centroids and the metadata terms in a single step, an approach known as single-stage filtering.
- Separation of indexing and search: With compute-compute separation, similarity indexing of vectors does not affect search performance. Ingestion and indexing run on different virtual instances (clusters) from search, so performance stays predictable as you scale.
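To make the real-time update path more concrete, here is a toy Python sketch of a mutable IVF-style index. It is not Rockset's or FAISS's code: plain Python lists stand in for FAISS, a dict stands in for a key-value store such as RocksDB, and the names `MutableIVFIndex`, `upsert`, `delete`, `cell_members` and `reconstruct` are hypothetical. It illustrates the idea of assigning each vector to its nearest centroid (its Voronoi cell), storing the residual (vector minus centroid) under a (centroid, document) key, and removing the old entry on update, since a modified vector may land in a different cell.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid whose Voronoi cell contains vec."""
    return min(range(len(centroids)), key=lambda c: squared_distance(vec, centroids[c]))


class MutableIVFIndex:
    """Hypothetical toy IVF index; a dict stands in for a key-value store."""

    def __init__(self, centroids: List[List[float]]) -> None:
        self.centroids = centroids
        # (centroid_id, doc_id) -> residual; keys grouped by centroid mimic IVF lists
        self.postings: Dict[Tuple[int, str], List[float]] = {}
        # doc_id -> centroid_id, so an update or delete can find the old entry
        self.assignment: Dict[str, int] = {}

    def upsert(self, doc_id: str, vec: Sequence[float]) -> None:
        """Insert a new vector or replace an existing one."""
        self.delete(doc_id)  # the new value may fall into a different Voronoi cell
        cell = nearest_centroid(vec, self.centroids)
        residual = [x - c for x, c in zip(vec, self.centroids[cell])]
        self.postings[(cell, doc_id)] = residual
        self.assignment[doc_id] = cell

    def delete(self, doc_id: str) -> None:
        cell: Optional[int] = self.assignment.pop(doc_id, None)
        if cell is not None:
            del self.postings[(cell, doc_id)]

    def cell_members(self, cell: int) -> List[str]:
        return sorted(d for (c, d) in self.postings if c == cell)

    def reconstruct(self, doc_id: str) -> List[float]:
        """Recover the stored vector as centroid + residual."""
        cell = self.assignment[doc_id]
        return [c + r for c, r in zip(self.centroids[cell], self.postings[(cell, doc_id)])]


# Usage: two cells, one insert that is later moved by an update, one delete.
index = MutableIVFIndex([[0.0, 0.0], [10.0, 10.0]])
index.upsert("a", [1.0, 2.0])   # lands in cell 0
index.upsert("b", [9.0, 11.0])  # lands in cell 1
index.upsert("a", [8.0, 8.0])   # update moves "a" to cell 1
index.delete("b")
print(index.cell_members(1))    # ['a']
```

In a sorted key-value store such as RocksDB, putting the centroid id first in the key would keep each cell's entries contiguous, so a query could scan just the cells it probes. Storing residuals rather than raw vectors is what lets IVF variants compress them (for example with product quantization); this sketch keeps them uncompressed for readability.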
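The single-stage filtering described under hybrid search can be illustrated with a small Python sketch. This is a hypothetical illustration, not Rockset's implementation: each document's Voronoi cell is indexed as just another term (`("centroid", id)`) in a toy inverted index, so restricting to the K nearest centroids and applying metadata predicates become one posting-list union and intersection, and only the surviving documents are ranked by exact distance. Full vectors are stored instead of residuals for simplicity, and the names `build_inverted_index`, `hybrid_search`, `k_centroids` and `top_n` are assumptions made for this example.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Doc = Tuple[List[float], Dict[str, str]]  # (vector, metadata)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_inverted_index(centroids: List[List[float]], docs: Dict[str, Doc]) -> Dict[tuple, Set[str]]:
    """Map each term to the set of doc ids that contain it (a toy search index)."""
    index: Dict[tuple, Set[str]] = defaultdict(set)
    for doc_id, (vec, meta) in docs.items():
        cell = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        index[("centroid", cell)].add(doc_id)  # the Voronoi cell is indexed like any other term
        for field, value in meta.items():
            index[(field, value)].add(doc_id)
    return index


def hybrid_search(query: Sequence[float], centroids: List[List[float]], docs: Dict[str, Doc],
                  index: Dict[tuple, Set[str]], metadata_terms: List[tuple],
                  k_centroids: int, top_n: int) -> List[str]:
    # 1. Pick the K centroids closest to the query vector.
    nearest = heapq.nsmallest(k_centroids, range(len(centroids)),
                              key=lambda c: sq_dist(query, centroids[c]))
    # 2. One filtering step: union of those cells' postings, intersected with each metadata term.
    candidates: Set[str] = set().union(*(index.get(("centroid", c), set()) for c in nearest))
    for term in metadata_terms:
        candidates &= index.get(term, set())
    # 3. Rank only the survivors by exact distance to the query.
    return heapq.nsmallest(top_n, sorted(candidates), key=lambda d: sq_dist(query, docs[d][0]))


centroids = [[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]]
docs = {
    "a": ([1.0, 0.0], {"lang": "en"}),
    "b": ([0.0, 1.0], {"lang": "fr"}),
    "c": ([9.0, 9.0], {"lang": "en"}),
    "d": ([11.0, 10.0], {"lang": "en"}),
    "e": ([19.0, 1.0], {"lang": "en"}),
}
index = build_inverted_index(centroids, docs)
# Probe the 2 nearest cells (0 and 1) and keep only English documents:
# "b" fails the metadata filter and "e" sits in a cell that is not probed.
print(hybrid_search([2.0, 2.0], centroids, docs, index, [("lang", "en")], k_centroids=2, top_n=2))
# ['a', 'c']
```

Because the metadata predicate is applied while candidates are selected, matching documents in the probed cells are never discarded just because non-matching neighbors ranked ahead of them, a common drawback of filtering only after the vector search. Probing more centroids (a larger K) raises recall at the cost of scanning more postings.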