Luigi Fugaro
Unleashing the Power of
Vector Search in .NET
Join at slido.com
#1041068
Agenda
★ The Data Balance
★ Turning Data Into Vectors
★ Enter ML Embeddings
★ Redis as Vector Database
★ Redis OM .NET for Vector and more
★ Demo – Live Coding... dotnet run
The Data Balance
Growth
Around 80% of the data generated by organizations is unstructured
IDC Report 2023 - https://www.box.com/resources/unstructured-data-paper
The Data Balance
Growth by data type:
Unstructured: no inherent structure, many degrees of freedom ~ Docs, PDFs, images, audio, video
Quasi-Structured: erratic patterns/formats ~ Clickstreams
Semi-Structured: there's a discernible pattern ~ Spreadsheets / XML / JSON
Structured: schema/defined data model ~ Database
The Data Balance
How to deal with unstructured data?
Common approaches were labeling and tagging
They are labor-intensive, subjective, and error-prone
What are the common approaches to
deal with Unstructured Data?
Turning Data
into Vectors !!
Turning Data into Vectors
What is a Vector?
Numeric representation of something in N-dimensional space
using floating-point numbers
Can represent anything... entire documents, images, video, audio
Quantifies features or characteristics of the item
More importantly... they are comparable
Enter ML Embeddings
Using Machine Learning
for Feature Extraction
Enter ML Embeddings
Machine Learning/Deep Learning has leaped forward in the last decade
ML models outperform humans in many tasks nowadays
CV (Computer Vision) models excel at detection/classification
LLMs (Large Language Models) have advanced rapidly
Enter ML Embeddings
Feature Engineering
Enter ML Embeddings
Automated Feature Engineering
ML models extract latent features
ML embeddings capture the gray areas between features
The process of generating embeddings is called vectorizing
What are Vector Embeddings?
Why Vectors?
Creating and Storing them
Vectors
Visually: Semantic Relationship | Syntactic Relationship
Source: https://jalammar.github.io/illustrated-word2vec
"King" (a 50-dimensional word vector)
[ 0.50451, 0.68607, -0.59517, -0.022801, 0.60046, -0.13498, -0.08813, 0.47377, -0.61798, -0.31012,
  -0.076666, 1.493, -0.034189, -0.98173, 0.68229, 0.81722, -0.51874, -0.31503, -0.55809, 0.66421,
  0.1961, -0.13495, -0.11476, -0.30344, 0.41177, -2.223, -1.0756, -1.0783, -0.34354, 0.33505,
  1.9927, -0.04234, -0.64319, 0.71125, 0.49159, 0.16754, 0.34344, -0.25663, -0.8523, 0.1661,
  0.40102, 1.1685, -1.0137, -0.21585, -0.15155, 0.78321, -0.91241, -1.6106, -0.64426, -0.51042 ]
Vectors can be operated upon
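The "King" vector is real but hard to read. This minimal C# sketch shows what "operated upon" means through the classic word-analogy arithmetic (king - man + woman). It uses tiny 3-dimensional vectors whose values are made up purely for illustration. Real word vectors come from a trained model such as word2vec or GloVe. The sketch picks the nearest remaining word by cosine similarity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy 3-dimensional "word vectors". The numbers are made up for illustration only;
// real embeddings (like the 50-dimensional "King" vector) come from a trained model.
var words = new Dictionary<string, double[]>
{
    ["king"]  = new[] { 0.9, 0.8, 0.1 },
    ["man"]   = new[] { 0.5, 0.1, 0.1 },
    ["woman"] = new[] { 0.5, 0.1, 0.9 },
    ["queen"] = new[] { 0.9, 0.8, 0.9 },
    ["apple"] = new[] { 0.1, 0.9, 0.2 },
};

// "king" - "man" + "woman", computed component by component.
double[] target = Add(Subtract(words["king"], words["man"]), words["woman"]);

// The answer is the remaining word whose vector points in the most similar direction.
string answer = words.Keys
    .Where(w => w is not ("king" or "man" or "woman"))
    .OrderByDescending(w => Cosine(words[w], target))
    .First();

Console.WriteLine($"king - man + woman is closest to: {answer}");

static double[] Add(double[] a, double[] b) => a.Zip(b, (x, y) => x + y).ToArray();

static double[] Subtract(double[] a, double[] b) => a.Zip(b, (x, y) => x - y).ToArray();

static double Cosine(double[] a, double[] b)
{
    double dot = a.Zip(b, (x, y) => x * y).Sum();
    return dot / (Math.Sqrt(a.Sum(x => x * x)) * Math.Sqrt(b.Sum(x => x * x)));
}
```

With real embeddings the result is only approximately "queen". The arithmetic works because a trained model tends to encode relationships such as gender as roughly consistent directions in the vector space.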
Vectorizing
Generating Vector Embeddings
for your Data
Vectorizing
1. Choose an Embedding Method
2. Clean and preprocess the data as needed
3. Train/Refine the embedding model
4. Generate Embeddings
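The four steps can be made concrete with a deliberately simple stand-in. The sketch below is not an ML model and is only an illustration: it uses feature hashing of word counts (the "hashing trick") as the embedding method, so step 3 (train/refine) does not apply. It still shows preprocessing, fixed-size vector generation, and why normalized vectors can be compared. The 64-dimension size and the FNV-1a hash are arbitrary choices for the example.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Step 1 (method): feature hashing. Each token is hashed into one of `dims` buckets.
// This is a toy, non-learned stand-in for a real embedding model.
const int Dims = 64;

// Step 2 (preprocess): lowercase and split on anything that is not a letter or digit.
static string[] Preprocess(string text) =>
    Regex.Split(text.ToLowerInvariant(), "[^a-z0-9]+")
         .Where(token => token.Length > 0)
         .ToArray();

// Deterministic 32-bit FNV-1a hash (string.GetHashCode() is randomized per process in .NET).
static uint Fnv1a(string s)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(s))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619);
    }
    return hash;
}

// Step 4 (generate): count tokens per bucket, then L2-normalize to unit length.
// Step 3 (train/refine) has no equivalent here because hashing learns nothing.
static float[] Embed(string text, int dims)
{
    var vector = new float[dims];
    foreach (string token in Preprocess(text))
        vector[(int)(Fnv1a(token) % (uint)dims)] += 1f;

    float norm = MathF.Sqrt(vector.Sum(x => x * x));
    if (norm > 0)
        for (int i = 0; i < dims; i++) vector[i] /= norm;
    return vector;
}

// For unit-length vectors the dot product equals the cosine similarity.
static float Dot(float[] a, float[] b) => a.Zip(b, (x, y) => x * y).Sum();

float[] shirt1 = Embed("Red cotton T-shirt for men", Dims);
float[] shirt2 = Embed("Men's red T-shirt, cotton", Dims);
float[] bag = Embed("Leather handbag", Dims);
Console.WriteLine($"shirt1.shirt2 = {Dot(shirt1, shirt2):F3}, shirt1.bag = {Dot(shirt1, bag):F3}");
```

A learned model would replace `Embed`, while the surrounding steps (preprocess, generate, normalize, compare) keep the same shape. Unlike a real model, hashing only captures shared words, not meaning. For example, "sneakers" and "trainers" would look unrelated.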
Vectorizing
Better Models, better Vectors
Embeddings can capture the semantics of complex data
Option #1: Use a pre-trained model
Option #2: Train your own model with custom data
Vector similarity is a downstream tool to analyze embeddings
Vectorizing
Similarity Metrics
● Measure "closeness" between vectors in multi-dimensional space
● Enable efficient similarity search in vector databases
● Improve relevance and precision of search results
Vectorizing
Similarity/Distance Metrics
Cosine Similarity
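Cosine similarity compares the direction of two vectors and ignores their length: cos(θ) = (A · B) / (‖A‖ ‖B‖). Here is a minimal C# implementation, shown with made-up toy vectors.

```csharp
using System;

// Toy 3-dimensional embeddings (made-up values, for illustration only).
float[] shirt  = { 0.9f, 0.1f, 0.3f };
float[] tshirt = { 0.8f, 0.2f, 0.35f };
float[] boots  = { 0.1f, 0.9f, 0.6f };

Console.WriteLine($"shirt vs t-shirt: {CosineSimilarity(shirt, tshirt):F3}");
Console.WriteLine($"shirt vs boots:   {CosineSimilarity(shirt, boots):F3}");

// cos(θ) = (A · B) / (‖A‖ · ‖B‖): 1 = same direction, 0 = orthogonal, -1 = opposite.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot   += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }

    if (normA == 0 || normB == 0)
        throw new ArgumentException("Cosine similarity is undefined for a zero vector.");

    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}
```

Because only direction matters, scaling a vector leaves the score unchanged. This is why embeddings are often normalized to unit length before indexing.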
Vectorizing
VSS in Redis
Redis as Vector Database
Which algorithms can be used to calculate
similarity/distance metrics?
VSS in Redis
Index and query vector data stored as BLOBs in Redis Hashes or as arrays in JSON documents
3 distance metrics: Euclidean (L2), Inner Product (IP), and Cosine
2 indexing methods: HNSW and Flat
Pre-filtering queries with GEO, TAG, TEXT, or NUMERIC fields
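This sketch puts two details of the slide into code: the BLOB layout and the three metrics. It is an illustration, not Redis client code. First, it packs a float vector into 4 little-endian bytes per component. That is how clients commonly send FLOAT32 vectors for Hash fields; in JSON documents the vector is simply an array of numbers. Second, it computes L2, IP, and COSINE as distances, where smaller means closer. It follows the formulas given in the Redis vector documentation: IP as 1 - A·B and COSINE as 1 - cosine similarity. Check the docs for your Redis version for the exact values the server reports.

```csharp
using System;
using System.Buffers.Binary;
using System.Linq;

float[] query = { 0.6f, 0.8f };
float[] doc   = { 0.8f, 0.6f };

byte[] blob = ToFloat32Blob(query); // 2 components x 4 bytes = 8 bytes
float[] roundTrip = FromFloat32Blob(blob);
Console.WriteLine($"BLOB: {BitConverter.ToString(blob)} -> [{string.Join(", ", roundTrip)}]");
Console.WriteLine($"L2 = {L2(query, doc):F4}, IP = {InnerProductDistance(query, doc):F4}, COSINE = {CosineDistance(query, doc):F4}");

// FLOAT32 vector -> raw bytes: 4 bytes per component, little-endian.
static byte[] ToFloat32Blob(float[] vector)
{
    var bytes = new byte[vector.Length * sizeof(float)];
    for (int i = 0; i < vector.Length; i++)
        BinaryPrimitives.WriteSingleLittleEndian(bytes.AsSpan(i * sizeof(float)), vector[i]);
    return bytes;
}

// Raw little-endian bytes -> FLOAT32 vector.
static float[] FromFloat32Blob(byte[] bytes)
{
    var vector = new float[bytes.Length / sizeof(float)];
    for (int i = 0; i < vector.Length; i++)
        vector[i] = BinaryPrimitives.ReadSingleLittleEndian(bytes.AsSpan(i * sizeof(float)));
    return vector;
}

// Euclidean distance: sqrt(Σ (a_i - b_i)^2)
static double L2(float[] a, float[] b) =>
    Math.Sqrt(a.Zip(b, (x, y) => ((double)x - y) * ((double)x - y)).Sum());

// Inner-product distance: 1 - a·b (mainly meaningful for normalized vectors)
static double InnerProductDistance(float[] a, float[] b) =>
    1 - a.Zip(b, (x, y) => (double)x * y).Sum();

// Cosine distance: 1 - (a·b) / (‖a‖‖b‖)
static double CosineDistance(float[] a, float[] b)
{
    double dot = a.Zip(b, (x, y) => (double)x * y).Sum();
    double norms = Math.Sqrt(a.Sum(x => (double)x * x)) * Math.Sqrt(b.Sum(x => (double)x * x));
    return 1 - dot / norms;
}
```

For unit-length vectors, IP and COSINE produce the same ranking, because the dot product then equals the cosine similarity. IP just skips the normalization work.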
Redis OM
How to implement the
solution
Redis OM
A Redis Framework
It's more than Object Mapping
Redis OM stands for Redis Object Mapping, a suite of libraries designed to
facilitate object-oriented interaction with Redis. It simplifies working with
Redis by abstracting direct commands into higher-level operations.
https://github.com/redis/redis-om-dotnet
Redis OM
More to come...
Redis OM
Why?
● Redis OM simplifies application development by abstracting Redis' complexity, increasing productivity, and enhancing code readability.
What?
◆ Redis OM enables building real-time applications, supporting:
● Model Data
● Perform CRUD Operations
● Index and Search Data
Redis OM
dotnet run --project ProductCatalog
Redis OM
Fashion Product Finder
Breakdown
■ A Product domain mapped to Redis JSON
■ [Indexed] decorated fields for text and numeric indexing
■ [ImageVectorizer] decorated field to generate image embeddings
■ [SentenceVectorizer] decorated field to generate text embeddings
■ Pre-filter the search if needed
■ Entity Streams to query for K nearest neighbors
■ Display results
Demo… plan B?
Model → Redis JSON: [Document]
using Redis.OM.Modeling;

namespace ProductCatalog.Model;

[Document(StorageType = StorageType.Json)]
public class Product
{
    // Other fields...
}
Make Product Searchable → [Indexed] decoration
using Redis.OM.Modeling;

namespace ProductCatalog.Model;

[Document(StorageType = StorageType.Json)]
public class Product
{
    [RedisIdField] [Indexed] public int Id { get; set; }
    // Other fields...
    [Indexed] public string Gender { get; set; }
    [Indexed] public int? Year { get; set; }
    // Other fields...
}
Decorate what you want to make searchable.
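Automatic Embedding Generation → [IVectorizer] decoration
using Redis.OM;
using Redis.OM.Modeling;
// [ImageVectorizer] and [SentenceVectorizer] ship in separate Redis OM vectorizer
// packages; import their namespaces as well.

namespace ProductCatalog.Model;

[Document(StorageType = StorageType.Json)]
public class Product
{
    // Other fields...

    // Embedding generated from the image at this URL; HNSW = approximate nearest neighbors
    [Indexed(Algorithm = VectorAlgorithm.HNSW, DistanceMetric = DistanceMetric.COSINE)]
    [ImageVectorizer] public Vector<string> ImageUrl { get; set; }

    // Embedding generated from the display-name text; FLAT = exact brute-force search
    [Indexed(Algorithm = VectorAlgorithm.FLAT, DistanceMetric = DistanceMetric.COSINE)]
    [SentenceVectorizer] public Vector<string> ProductDisplayName { get; set; }
}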
Searching with Fluent API
// _provider is assumed to be an injected RedisConnectionProvider
[HttpGet("byImage")]
public IEnumerable<CatalogResponse> ByImage([FromQuery] string url)
{
    var collection = _provider.RedisCollection<Product>();
    // The 15 products whose image embeddings are nearest to the embedding of the image at `url`
    var response = collection.NearestNeighbors(x => x.ImageUrl, 15, url);
    return response.Select(CatalogResponse.Of);
}

[HttpGet("byDescription")]
public IEnumerable<CatalogResponse> ByDescription([FromQuery] string description)
{
    var collection = _provider.RedisCollection<Product>();
    // The 15 products whose display-name embeddings are nearest to the embedding of `description`
    var response = collection.NearestNeighbors(x => x.ProductDisplayName, 15, description);
    return response.Select(CatalogResponse.Of);
}
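Redis OM translates calls like `NearestNeighbors` into FT.SEARCH queries. To show the general shape of a KNN query with and without a pre-filter, the sketch below builds only the query string. The field names, the `vec` parameter name, and the `score` alias are assumptions for illustration. This is not necessarily the exact query Redis OM generates, since it uses its own field paths and aliases. The pre-filter assumes `Gender` is indexed as a TAG field and `Year` as a NUMERIC field.

```csharp
using System;

// Shape of a KNN query for FT.SEARCH. The query vector is not embedded in the
// string: it is sent separately as a PARAMS value named "vec" (a FLOAT32 blob).
static string KnnQuery(string vectorField, int k, string preFilter = "")
{
    // "*" means no pre-filter: the KNN runs over every document in the index.
    string filter = string.IsNullOrWhiteSpace(preFilter) ? "*" : $"({preFilter})";
    return $"{filter}=>[KNN {k} @{vectorField} $vec AS score]";
}

string plain  = KnnQuery("ImageUrl", 15);
string hybrid = KnnQuery("ImageUrl", 15, "@Gender:{Men} @Year:[2012 2012]");

// Full command (vector bytes elided):
// FT.SEARCH <index> "<query>" PARAMS 2 vec <FLOAT32 blob> SORTBY score DIALECT 2
Console.WriteLine(plain);
Console.WriteLine(hybrid);
```

The pre-filter runs first, and KNN ranks only the documents that pass it, which is what makes the search "hybrid".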
Redis OM
The tools and techniques to unlock
the value in Unstructured Data have
evolved greatly...
Redis OM
Databases like Redis and frameworks
like Redis OM can help!
Redis OM
FEATURES
Storage: HASH | JSON
Indexing: HNSW (ANN) | Flat (KNN)
Distance: L2 | Cosine | IP
Search Types: KNN/ANN | Hybrid | Range | Full Text
Management: Real-time CRUD operations, aliasing, temp indices, and more
INTEGRATIONS
Ecosystem integrations
NEW REDIS ENTERPRISE 7.2 FEATURE
Scalable search and query for improved performance, up to 16X compared to previous versions
Redis as Vector Database
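Three of the listed search types can be sketched in memory. The code below uses a hypothetical four-item catalog with made-up 2-D embeddings. First it does what a Flat index does: exact, brute-force KNN over every vector. Then it runs a hybrid query (metadata pre-filter, then KNN) and a range query (everything within a cosine-distance radius). HNSW is not shown. It builds a graph to find approximate neighbors much faster, trading some recall for speed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mini-catalog with made-up 2-D embeddings.
var catalog = new List<(int Id, string Gender, int Year, float[] Embedding)>
{
    (1, "Men",   2012, new[] { 0.9f, 0.1f }),
    (2, "Women", 2012, new[] { 0.8f, 0.2f }),
    (3, "Men",   2011, new[] { 0.1f, 0.9f }),
    (4, "Men",   2012, new[] { 0.7f, 0.3f }),
};
float[] query = { 1f, 0f };

// KNN: exact top-k over the whole catalog.
var knn = FlatKnn(catalog.Select(p => (p.Id, p.Embedding)), query, k: 2);

// Hybrid: pre-filter on metadata first, then KNN over the survivors.
var hybrid = FlatKnn(catalog.Where(p => p.Gender == "Men").Select(p => (p.Id, p.Embedding)), query, k: 2);

// Range: every item within a distance radius, however many there are.
var inRange = catalog
    .Select(p => (p.Id, Distance: CosineDistance(p.Embedding, query)))
    .Where(r => r.Distance <= 0.1)
    .Select(r => r.Id)
    .ToList();

Console.WriteLine($"KNN: {string.Join(", ", knn.Select(r => r.Id))}");
Console.WriteLine($"Hybrid: {string.Join(", ", hybrid.Select(r => r.Id))}");
Console.WriteLine($"Range: {string.Join(", ", inRange)}");

// Flat (brute-force) search: score every candidate, sort, keep the k closest.
static List<(int Id, double Distance)> FlatKnn(IEnumerable<(int Id, float[] Embedding)> items, float[] target, int k) =>
    items.Select(p => (p.Id, Distance: CosineDistance(p.Embedding, target)))
         .OrderBy(r => r.Distance)
         .Take(k)
         .ToList();

static double CosineDistance(float[] a, float[] b)
{
    double dot = a.Zip(b, (x, y) => (double)x * y).Sum();
    double norms = Math.Sqrt(a.Sum(x => (double)x * x)) * Math.Sqrt(b.Sum(x => (double)x * x));
    return 1 - dot / norms;
}
```

Without the filter, the two nearest items are 1 and 2. Restricting to "Men" swaps item 2 for item 4. This is why pre-filtering happens before the KNN step rather than after it.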
Vector Similarity Search
Use Cases
VSS Use cases
Vector Similarity Search Use Cases
Question & Answering
Context retrieval for Retrieval Augmented Generation (RAG)
By pairing Redis Enterprise with Large Language Models (LLMs) such as OpenAI's ChatGPT, you can give the LLM access to external contextual knowledge.
● Enables more accurate answers and reduces model 'hallucinations'.
● An LLM combines text fragments in a (most often) semantically correct way.
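The retrieval half of RAG is a KNN search followed by prompt assembly. This is a minimal sketch under several assumptions. A hypothetical in-memory list of text chunks with made-up embeddings stands in for Redis and an embedding model. `BuildPrompt` and the prompt wording are illustrative. The LLM call itself is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical knowledge base: text chunks with made-up 3-D embeddings. In a real
// system the chunks and the question are embedded with the same model and stored in Redis.
var chunks = new List<(string Text, float[] Embedding)>
{
    ("Redis supports FLAT and HNSW vector indexes.", new[] { 0.9f, 0.1f, 0.0f }),
    ("HNSW finds approximate neighbors quickly.", new[] { 0.8f, 0.3f, 0.1f }),
    ("Our office is closed on public holidays.", new[] { 0.0f, 0.1f, 0.9f }),
};

string question = "Which vector indexes does Redis offer?";
float[] questionEmbedding = { 0.85f, 0.2f, 0.05f }; // made up; would come from the embedding model

// Retrieve the k = 2 chunks most similar to the question...
var context = chunks
    .OrderByDescending(c => Cosine(c.Embedding, questionEmbedding))
    .Take(2)
    .Select(c => c.Text)
    .ToList();

// ...and ground the LLM by placing them in the prompt (the LLM call itself is omitted).
string prompt = BuildPrompt(question, context);
Console.WriteLine(prompt);

static string BuildPrompt(string userQuestion, IEnumerable<string> contextChunks) =>
    "Answer the question using only the context below.\n\nContext:\n" +
    string.Join("\n", contextChunks.Select(text => "- " + text)) +
    $"\n\nQuestion: {userQuestion}";

static double Cosine(float[] a, float[] b)
{
    double dot = a.Zip(b, (x, y) => (double)x * y).Sum();
    double norms = Math.Sqrt(a.Sum(x => (double)x * x)) * Math.Sqrt(b.Sum(x => (double)x * x));
    return dot / norms;
}
```

The unrelated "office" chunk never reaches the prompt. Keeping off-topic text out of the context is how retrieval helps the model stay on topic.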
LLM Conversation Memory
The idea is to improve model quality and personalization through an adaptive memory.
● Persist all conversation history (memories) as embeddings in a vector database.
● A conversational agent checks for relevant memories to aid or personalize the LLM behaviour.
● Allows users to change topics seamlessly, without misunderstandings.
Semantic Caching
Because LLM completions are expensive, semantic caching helps reduce the overall cost of ML-powered applications.
● Use a vector database to cache input prompts.
● Cache hits are evaluated by semantic similarity.
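A semantic cache stores prompt embeddings next to their completions. It treats a new prompt as a hit when its embedding is close enough to a cached one. The sketch below uses a plain in-memory list instead of a vector database, made-up embeddings, and a hypothetical similarity threshold of 0.95. The right threshold depends on the embedding model and on how much paraphrase you are willing to treat as "the same question".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

#nullable enable

// In-memory stand-in for a vector index of cached prompts and their completions.
var cache = new List<(float[] Embedding, string Completion)>();
const double Threshold = 0.95; // minimum cosine similarity counted as a hit (tune per model)

// Toy embeddings standing in for the output of an embedding model.
Store(new[] { 0.9f, 0.1f, 0.0f }, "Redis supports FLAT and HNSW vector indexes.");

string? hit  = Lookup(new[] { 0.88f, 0.12f, 0.01f }); // paraphrased prompt: served from cache
string? miss = Lookup(new[] { 0.0f, 0.2f, 0.9f });    // unrelated prompt: null, so call the LLM
Console.WriteLine(hit ?? "(miss)");
Console.WriteLine(miss ?? "(miss)");

void Store(float[] promptEmbedding, string completion) => cache.Add((promptEmbedding, completion));

string? Lookup(float[] promptEmbedding)
{
    // Find the most similar cached prompt; reuse its completion only if it is close enough.
    var best = cache
        .Select(entry => (entry.Completion, Score: Cosine(entry.Embedding, promptEmbedding)))
        .OrderByDescending(entry => entry.Score)
        .FirstOrDefault();
    return best.Completion is not null && best.Score >= Threshold ? best.Completion : null;
}

static double Cosine(float[] a, float[] b)
{
    double dot = a.Zip(b, (x, y) => (double)x * y).Sum();
    double norms = Math.Sqrt(a.Sum(x => (double)x * x)) * Math.Sqrt(b.Sum(x => (double)x * x));
    return dot / norms;
}
```

If the threshold is too low, the cache may serve an answer meant for a different question. If it is too high, almost every paraphrase misses the cache and the LLM is called anyway.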
Redis resources
Additional resources for learning about Redis
Redis Launchpad (launchpad.redis.com): central place to find example apps that are built on Redis
Redis University (university.redis.com): free online courses taught by Redis experts
Developers Portal (developer.redis.com): create a database, code your application, explore your data
Redis Certification (university.redis.com/certification): professional certification program for developers
Which programming languages does Redis OM
currently support?
.NET Conference 2024
Grazie (Thank you)
Questions?
