Milvus
Milvus is a database that stores, indexes, and manages massive embedding vectors generated by deep neural networks and other machine learning (ML) models.
This notebook shows how to use functionality related to the Milvus vector database.
Setup
You'll need to install langchain-milvus
with pip install -qU langchain-milvus
to use this integration.
pip install -qU langchain_milvus
Note: you may need to restart the kernel to use updated packages.
Credentials
No credentials are needed to use the Milvus
vector store.
Initialization
pip install -qU langchain-openai
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
Milvus Lite
The easiest way to prototype is to use Milvus Lite, where everything is stored in a local vector database file. Only the Flat index can be used.
from langchain_milvus import Milvus
URI = "./milvus_example.db"
vector_store = Milvus(
embedding_function=embeddings,
connection_args={"uri": URI},
index_params={"index_type": "FLAT", "metric_type": "L2"},
)
Milvus Server
If you have a large amount of data (e.g., more than a million vectors), we recommend setting up a more performant Milvus server on Docker or Kubernetes.
The Milvus server offers support for a variety of indexes. Leveraging these different indexes can significantly enhance the retrieval capabilities and expedite the retrieval process, tailored to your specific requirements.
As an illustration, consider the case of Milvus Standalone. To initiate the Docker container, you can run the following command:
!curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
!bash standalone_embed.sh start
Password:
Here we create a Milvus database:
from pymilvus import Collection, MilvusException, connections, db, utility
conn = connections.connect(host="127.0.0.1", port=19530)
# Check if the database exists
db_name = "milvus_demo"
try:
existing_databases = db.list_database()
if db_name in existing_databases:
print(f"Database '{db_name}' already exists.")
# Use the database context
db.using_database(db_name)
# Drop all collections in the database
collections = utility.list_collections()
for collection_name in collections:
collection = Collection(name=collection_name)
collection.drop()
print(f"Collection '{collection_name}' has been dropped.")
db.drop_database(db_name)
print(f"Database '{db_name}' has been deleted.")
else:
print(f"Database '{db_name}' does not exist.")
database = db.create_database(db_name)
print(f"Database '{db_name}' created successfully.")
except MilvusException as e:
print(f"An error occurred: {e}")
Database 'milvus_demo' does not exist.
Database 'milvus_demo' created successfully.
Note the change in the URI below. Once the instance is initialized, navigate to http://127.0.0.1:9091/webui to view the local web UI.
Here is an example of how you create your vector store instance with the Milvus database serivce:
from langchain_milvus import BM25BuiltInFunction, Milvus
URI = "http://localhost:19530"
vectorstore = Milvus(
embedding_function=embeddings,
connection_args={"uri": URI, "token": "root:Milvus", "db_name": "milvus_demo"},
index_params={"index_type": "FLAT", "metric_type": "L2"},
consistency_level="Strong",
drop_old=False, # set to True if seeking to drop the collection with that name if it exists
)
If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, please adjust the uri and token, which correspond to the Public Endpoint and Api key in Zilliz Cloud.
Compartmentalize the data with Milvus Collections
You can store unrelated documents in different collections within the same Milvus instance.
Here's how you can create a new collection:
from langchain_core.documents import Document
vector_store_saved = Milvus.from_documents(
[Document(page_content="foo!")],
embeddings,
collection_name="langchain_example",
connection_args={"uri": URI},
)
And here is how you retrieve that stored collection:
vector_store_loaded = Milvus(
embeddings,
connection_args={"uri": URI},
collection_name="langchain_example",
)
Manage vector store
Once you have created your vector store, we can interact with it by adding and deleting different items.
Add items to vector store
We can add items to our vector store by using the add_documents
function.
from uuid import uuid4
from langchain_core.documents import Document
document_1 = Document(
page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
metadata={"source": "tweet"},
)
document_2 = Document(
page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
metadata={"source": "news"},
)
document_3 = Document(
page_content="Building an exciting new project with LangChain - come check it out!",
metadata={"source": "tweet"},
)
document_4 = Document(
page_content="Robbers broke into the city bank and stole $1 million in cash.",
metadata={"source": "news"},
)
document_5 = Document(
page_content="Wow! That was an amazing movie. I can't wait to see it again.",
metadata={"source": "tweet"},
)
document_6 = Document(
page_content="Is the new iPhone worth the price? Read this review to find out.",
metadata={"source": "website"},
)
document_7 = Document(
page_content="The top 10 soccer players in the world right now.",
metadata={"source": "website"},
)
document_8 = Document(
page_content="LangGraph is the best framework for building stateful, agentic applications!",
metadata={"source": "tweet"},
)
document_9 = Document(
page_content="The stock market is down 500 points today due to fears of a recession.",
metadata={"source": "news"},
)
document_10 = Document(
page_content="I have a bad feeling I am going to get deleted :(",
metadata={"source": "tweet"},
)
documents = [
document_1,
document_2,
document_3,
document_4,
document_5,
document_6,
document_7,
document_8,
document_9,
document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store.add_documents(documents=documents, ids=uuids)
Delete items from vector store
vector_store.delete(ids=[uuids[-1]])
(insert count: 0, delete count: 1, upsert count: 0, timestamp: 0, success count: 0, err count: 0, cost: 0)