Since the rise of ChatGPT, the general public has realized that generative artificial intelligence (GenAI) could potentially transform our lives. The availability of large language models (LLMs) has also changed how developers build AI-powered applications and has led to the emergence of various new developer tools. Although vector databases have been around since long before ChatGPT, they have become an integral part of the GenAI technology stack because they can address some of LLMs' key limitations, such as hallucinations and a lack of long-term memory.
This article first introduces vector databases and their use cases. Next, you will learn how vector databases are designed to help developers get started with building GenAI applications quickly. As a developer advocate at Weaviate, an open-source vector database, I will use Weaviate to demonstrate the relevant concepts as we go along. Finally, you will learn how vector databases can address the challenges enterprises face when moving these prototypes to production.
What are vector databases?
Vector databases store and provide access to structured and unstructured data, such as text and images, alongside their vector embeddings. A vector embedding is a numerical representation of a data object as a long list of numbers that captures its semantic meaning. Vector embeddings are typically generated by machine learning models.
Similar objects are close to each other in vector space, so the similarity of two data objects can be calculated from the distance between their vector embeddings. This opens the door to a new type of search technique called vector search, which retrieves objects based on similarity. In contrast to traditional keyword-based search, this kind of semantic search provides a far more flexible way to find items.
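To make this concrete, here is a minimal sketch of similarity as cosine similarity between two embeddings. The vectors and values are made up for illustration; real embeddings have hundreds or thousands of dimensions and come from an embedding model:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (made up for illustration)
cat = [0.9, 0.8, 0.1, 0.0]
kitten = [0.85, 0.75, 0.2, 0.05]
car = [0.1, 0.0, 0.9, 0.8]

# Semantically similar objects score higher
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```

The same idea works with any distance metric (Euclidean, dot product); which one a database uses is a configuration choice.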
Many traditional databases support storing vector embeddings to enable vector search, but vector databases are AI-native: they are optimized to perform fast vector searches at scale. Because vector search requires calculating the distance between the search query and every data object, a traditional k-nearest-neighbors scan is computationally expensive. Vector databases instead use vector indexing to pre-organize the embeddings for faster retrieval at query time. This is what allows users to quickly search and retrieve similar objects at scale in production environments.
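To see why the naive approach gets expensive, here is a brute-force k-nearest-neighbor scan over toy data (names and values made up for illustration). It computes a distance to every stored vector, which is exactly the O(n)-per-query cost that approximate indexes like HNSW are designed to avoid:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_brute_force(query, vectors, k=2):
    # Scans every stored vector: O(n) distance computations per query.
    scored = sorted(vectors.items(), key=lambda kv: euclidean(query, kv[1]))
    return [name for name, _ in scored[:k]]

vectors = {
    "apple": [1.0, 0.95],
    "banana": [0.9, 1.0],
    "truck": [-1.0, -0.8],
}
print(knn_brute_force([1.0, 1.0], vectors))  # ['apple', 'banana']
```

A vector index trades a little accuracy (results are approximate) for sub-linear query time, which is what makes billion-scale search feasible.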
Use cases of vector databases
Traditionally, vector databases have been used in a variety of search applications. With the rise of ChatGPT, however, it has become apparent that vector databases can also enhance the capabilities of LLMs.
Natural-language search
Traditionally, vector databases were used to unlock natural-language search, which is robust to varied phrasing and typos. Vector search can be performed on any modality: images, video, audio, or a combination thereof. This enables a variety of powerful use cases for vector databases, even where traditional databases fall short.
For example, vector databases are used in recommendation systems, which are a special case of search. Stack Overflow also recently showed how it uses Weaviate to improve the customer experience with better search results.
Enhancing LLM capabilities
With the rise of LLMs, vector databases have shown that they can enhance LLM capabilities by acting as external memory. For example, businesses use customized chatbots as a first line of customer support, or as technical or financial assistants, to improve the customer experience. For such conversational AI to be successful, it must meet three criteria:
- It must generate human-like language and reasoning.
- It must remember what was said earlier in the conversation.
- It must be able to query factual information beyond its general knowledge.
A general-purpose LLM covers the first criterion, but it needs help with the other two. This is where vector databases come into play.
- Providing state to the LLM. LLMs are stateless: once an LLM is trained, its knowledge is frozen. You can fine-tune an LLM to extend its knowledge, but once fine-tuning is complete, the model is frozen again. Vector databases let you create and update information easily, effectively providing state to your LLM.
- Acting as an external knowledge base. LLMs like GPT-3 produce confident answers regardless of their factual accuracy. "Hallucinations", answers that are stated confidently but are counterfactual, can set in especially when you move outside of general knowledge into domain-specific areas whose relevant facts were not part of the training data. To deal with hallucinations, you can use vector search to retrieve relevant factual knowledge and pipe it into the LLM's context window. This technique is known as retrieval-augmented generation (RAG) and helps LLMs generate factually accurate results.
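The RAG flow described above can be sketched in a few lines of plain Python. Everything here is a hypothetical stand-in, not a specific library's API: `retrieve` fakes a vector search with hard-coded scores, and the resulting prompt would be passed to whatever LLM you use:

```python
def retrieve(query, top_k=3):
    # Hypothetical retrieval: scores are hard-coded for illustration.
    # A real implementation would embed the query and rank stored
    # documents by vector similarity in a vector database.
    knowledge_base = {
        "Weaviate is an open-source vector database.": 0.9,
        "HNSW is a graph-based vector index.": 0.7,
        "Bananas are yellow.": 0.1,
    }
    ranked = sorted(knowledge_base, key=knowledge_base.get, reverse=True)
    return ranked[:top_k]

def build_rag_prompt(query):
    # Pipe the retrieved facts into the LLM's context window.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_rag_prompt("What is Weaviate?")
# `prompt` now grounds the (hypothetical) LLM call in retrieved facts.
```

The key point is that the model answers from retrieved context rather than from its frozen training data, which is what keeps the output factual and up to date.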
Prototyping with vector databases
The ability to create prototypes quickly matters not only in hackathons but also in fast-paced business environments, where testing new ideas quickly leads to faster decision-making. As an integral part of the technology stack, vector databases should help accelerate the development of GenAI applications. This section describes how vector databases enable rapid prototyping by handling setup, vectorization, and search.
For this example, we’ll use Weaviate because it’s easy to get started and only requires a few lines of code (not to mention something we’re familiar with).
Easy setup
To enable rapid prototyping, vector databases are typically easy to set up with just a few lines of code. In this example, the setup consists of connecting the Weaviate client to a vector database instance. If you use embedding models or LLMs from providers such as OpenAI, Cohere, or Hugging Face, you specify the API key in this step to enable their integration.
```python
import weaviate

client = weaviate.Client(
    url="<your-weaviate-instance-url>",
    additional_headers={
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"
    }
)
```
Automatic vectorization
Vector databases store and query vector embeddings generated from embedding models. This means that you must vectorize your data (manually or automatically) at import and query time. Although you can use vector databases standalone (and even use your own vectors), vector databases that enable rapid prototyping handle vectorization automatically, so you can easily convert your data and queries to vectors. There is no need to write boilerplate code to do this.
This example defines a data collection called MyCollection, which provides structure for the data in the vector database after the initial setup. In this step, you can also configure a vectorizer module (in this case, text2vec-openai) that automatically vectorizes all data objects at import and query time. If you use the vector database standalone and provide your own vectors, you can omit that line of code.
```python
class_obj = {
    "class": "MyCollection",
    "vectorizer": "text2vec-openai",
}
client.schema.create_class(class_obj)
```
To populate the data collection MyCollection, import data objects in batches as shown below. Data objects are automatically vectorized using the defined vectorizer.
```python
client.batch.configure(batch_size=100)  # Configure batch

# Initialize batch process
with client.batch as batch:
    batch.add_data_object(
        class_name="MyCollection",
        data_object={
            "some_text_property": "foo",
            "some_number_property": 1,
        },
    )
```
As shown in the next section, the defined vectorizer also vectorizes queries at search time.
Enable better search
The primary use of vector databases is to enable semantic similarity searches. In this example, once a vector database is configured and populated with data, data can be retrieved from the database based on similarity to a search query (“My query here”). If you defined vectorization in the previous step, the query is also vectorized to retrieve the data closest to the query in vector space.
```python
response = (
    client.query
    .get("MyCollection", ["some_text"])
    .with_near_text({"concepts": ["My query here"]})
    .do()
)
```
However, lexical search and semantic search are not mutually exclusive. Vector databases also store the original data objects alongside their vector embeddings. This not only eliminates the need for a secondary database to host the original data objects, but also enables keyword-based (BM25) search. Combining keyword-based and vector search into a hybrid search improves search results; Stack Overflow, for example, implemented hybrid search with Weaviate to achieve better results.
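Conceptually, hybrid search fuses the two rankings. Here is a toy sketch of that idea with made-up scores and a simple weighted sum controlled by a weight `alpha` (this illustrates the concept only, not Weaviate's exact fusion algorithm):

```python
def hybrid_score(bm25_score, vector_score, alpha=0.5):
    """Weighted fusion: alpha=1.0 -> pure vector, alpha=0.0 -> pure keyword."""
    return alpha * vector_score + (1 - alpha) * bm25_score

# Made-up normalized scores for three hypothetical documents
docs = {
    "doc_keyword_match": {"bm25": 0.9, "vector": 0.2},
    "doc_semantic_match": {"bm25": 0.1, "vector": 0.95},
    "doc_both": {"bm25": 0.6, "vector": 0.7},
}

ranked = sorted(
    docs,
    key=lambda d: hybrid_score(docs[d]["bm25"], docs[d]["vector"], alpha=0.5),
    reverse=True,
)
print(ranked[0])  # doc_both ranks first: strong on both signals
```

In Weaviate, the query-side change is small: recent versions let you swap the `.with_near_text(...)` call for `.with_hybrid(query="My query here")` to run a hybrid query.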
Integration with the technology stack
Vector databases have become an integral part of the GenAI technology stack and must be tightly integrated with other components. For example, by integrating a vector database with LLM, developers no longer have to write separate boilerplate code to retrieve information from the vector database and feed it to LLM. Instead, developers will be able to do this with just a few lines of code.
For example, Weaviate’s modular ecosystem allows you to integrate state-of-the-art generative models from providers such as OpenAI, Cohere, and Hugging Face by defining a generative module, in this case generative-openai. This lets you extend a semantic search query (using the .with_generate() method) into a retrieval-augmented generation query. The .with_near_text() method first retrieves the context associated with the property some_text. That context is then used in the prompt “Summarize {some_text} in a tweet.”
```python
class_obj = {
    "class": "MyCollection",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {},
        "generative-openai": {},
    },
}
# ...

response = (
    client.query
    .get("MyCollection", ["some_text"])
    .with_near_text({"concepts": ["My query here"]})
    .with_generate(
        single_prompt="Summarize {some_text} in a tweet."
    )
    .do()
)
```
Considerations for vector databases in production
Building a great prototype of a GenAI application is easy, but moving it into production comes with its own deployment and access management challenges. This section describes the concepts you need to consider when successfully moving your GenAI solution from prototype to production.
Horizontal scalability
While the amount of data in a prototype may not stress a full-fledged vector database, the amount of data processed in production can be vastly larger. To cope with production data volumes, vector databases must be able to scale to billions of data objects and meet varying requirements, such as maximum ingestion rate, maximum dataset size, and maximum number of queries per second.
To enable very fast vector searches at scale, vector databases use vector indexes. Vector indexing distinguishes vector databases from other vector-enabled databases that support vector search but are not optimized for it. For example, Weaviate uses the Hierarchical Navigable Small World (HNSW) algorithm for vector indexing, combined with product quantization for vector compression, to reduce memory usage and achieve extremely fast vector search even when filters are applied. It typically performs nearest-neighbor searches across millions of objects in less than 100 milliseconds.
Deployment
Vector databases must be able to accommodate different deployment requirements for different production environments. For example, Stack Overflow needed an open source, non-hosted vector database to run on their existing Azure infrastructure.
To address these requirements, different vector databases come with different deployment options.
- Managed services: Most vector databases offer a fully managed data infrastructure. Additionally, some vector databases offer hybrid SaaS options for managing the data plane within your own cloud environment.
- Deploy in your own cloud: Some vector databases are also available in various cloud marketplaces, allowing you to deploy a vector database cluster within your own cloud environment.
- Self-hosted: Many vector databases are also available as open source downloads and can be run and scaled on Kubernetes or Docker Compose.
Data protection
While choosing the right deployment infrastructure is an essential part of ensuring data protection, access management and resource isolation are equally important for meeting compliance regulations. For example, if User A uploads a document, only User A should be able to interact with that document. Weaviate uses the concept of multi-tenancy to isolate data per tenant, which helps it comply with regulatory data-handling requirements such as GDPR.
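As a sketch of what tenant isolation looks like at the schema level, the class definition from earlier can opt in to multi-tenancy. This is based on Weaviate's multi-tenancy feature, but the exact keys and client calls vary by version, so treat it as illustrative rather than a definitive configuration:

```python
# Illustrative class definition with multi-tenancy enabled (assumed key
# names follow Weaviate's schema; verify against your server version)
class_obj = {
    "class": "MyCollection",
    "vectorizer": "text2vec-openai",
    "multiTenancyConfig": {"enabled": True},  # isolate data per tenant
}
# Each tenant (e.g., one per end user) then gets its own isolated shard,
# and every import and query must name the tenant it runs against.
```

The practical effect is that User A's documents live in User A's tenant, and a query scoped to User B's tenant can never see them.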
Summary
This article provided an overview of vector databases and examples of their use cases. It highlighted the importance of vector databases in improving search and in enhancing LLM capabilities by providing access to external knowledge, enabling factually accurate results.
This article also showed how vector databases enable rapid prototyping of GenAI applications: they are easy to set up, they automatically handle the vectorization of data at import and query time, they enable better search by combining vector search with keyword-based search, and they integrate seamlessly with the other components of the technology stack. Additionally, it described how vector databases support companies moving these prototypes into production by addressing scalability, deployment, and data protection concerns.
If you’re interested in using open source vector databases for your GenAI applications, visit Weaviate’s quickstart guide and try it out for yourself.