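LangChain Vector Stores

Vector Stores

Vector stores in LangChain are the core component for storing text embeddings and running semantic similarity search over them. They play a key role in building retrieval-augmented generation (RAG) systems.

Main Roles

Store embedding vectors: documents are converted into high-dimensional vectors by an embedding model and then stored.
Run similarity search: given the embedding of a query text, find the documents that are semantically closest to it.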
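
To make the two roles concrete, here is a minimal sketch in plain Python. It is not LangChain code: the three-dimensional vectors are hand-written stand-ins for real embeddings, and `cosine_similarity`, `stored`, and `search` are hypothetical names chosen for illustration. Cosine similarity is used here as one common way to measure closeness.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Role 1: store vectors. These values are illustrative only; a real
# embedding model would produce hundreds or thousands of dimensions.
stored = {
    "pancakes for breakfast": [0.9, 0.1, 0.0],
    "cloudy weather tomorrow": [0.1, 0.9, 0.1],
    "new LangChain project": [0.0, 0.2, 0.9],
}


def search(query_vector, k=1):
    """Role 2: return the k stored texts most similar to the query vector."""
    ranked = sorted(
        stored,
        key=lambda text: cosine_similarity(stored[text], query_vector),
        reverse=True,
    )
    return ranked[:k]


# Pretend this vector is the embedding of "what did I eat this morning?"
top_match = search([0.8, 0.2, 0.1], k=1)
print(top_match)
```

A production vector store does the same ranking but relies on an index instead of scanning every stored vector.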

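Unified Interface

LangChain provides a common abstract interface for all vector databases, so the underlying implementation can usually be swapped with little or no change to business logic. The core methods are:

Method	Function
`add_documents(documents, ids)`	Add documents, with their metadata, to the vector store
`delete(ids)`	Delete documents by ID
`similarity_search(query, k=4, filter=None)`	Run a semantic similarity search; `k` sets the number of results and `filter` sets a metadata filter

Note: each of these methods has an asynchronous counterpart (`aadd_documents`, `adelete`, `asimilarity_search`). The filter syntax is backend-specific; the Milvus example later in this document passes its filter as an `expr` expression.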
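
The value of a shared interface is easier to see with a toy implementation. The sketch below is not LangChain's code: `Doc`, `toy_embed`, and `ToyVectorStore` are hypothetical names, and the word-count "embedding" is a crude stand-in for a real model. It only mirrors the three method names and parameters from the table above, so that calling code would look the same whichever backend sits behind it.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def toy_embed(text):
    """Hypothetical embedding: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory store exposing the three methods from the table."""

    def __init__(self):
        self._rows = {}  # id -> (vector, Doc)

    def add_documents(self, documents, ids):
        for doc_id, doc in zip(ids, documents):
            self._rows[doc_id] = (toy_embed(doc.page_content), doc)
        return list(ids)

    def delete(self, ids):
        for doc_id in ids:
            self._rows.pop(doc_id, None)

    def similarity_search(self, query, k=4, filter=None):
        # `filter` is an exact-match metadata dict in this toy version.
        query_vec = toy_embed(query)
        scored = [
            (cosine(vec, query_vec), doc)
            for vec, doc in self._rows.values()
            if filter is None
            or all(doc.metadata.get(key) == val for key, val in filter.items())
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:k]]


store = ToyVectorStore()
store.add_documents(
    [
        Doc("pancakes and eggs for breakfast", {"source": "tweet"}),
        Doc("cloudy weather forecast for tomorrow", {"source": "news"}),
        Doc("eggs for sale tomorrow", {"source": "news"}),
    ],
    ids=["a", "b", "c"],
)
hits = store.similarity_search("eggs for breakfast", k=1)
news_hits = store.similarity_search("eggs for breakfast", k=1, filter={"source": "news"})
store.delete(["a"])
after_delete = store.similarity_search("eggs for breakfast", k=1)
```

Because callers touch only `add_documents`, `delete`, and `similarity_search`, replacing `ToyVectorStore` with another class that offers the same methods would leave the calling code unchanged.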

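Choosing a Vector Database

Comparison of Mainstream Vector Databases

Tool	Model	Ease of use	Performance	Cost	Best suited for
Milvus	Open source / managed	Medium (requires deployment)	High	Controllable	Large-scale, custom deployments
Chroma	Open source / lightweight	High (local / managed)	Medium	Low	Prototypes and small-scale applications
Pinecone	Managed SaaS	High (no operations work)	High (low latency)	Medium to high	Fast launch, production-grade RAG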

Using Milvus

To install Milvus standalone with Docker, see "Milvus Installation and Usage".

Install dependencies

shell

pip install -qU langchain-milvus -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com

Create a Milvus database

python

from pymilvus import Collection, MilvusException, connections, db, utility

# Connect to the local Milvus standalone instance
connections.connect(host="127.0.0.1", port=19530)

db_name = "milvus_demo"
try:
    if db_name in db.list_database():
        print(f"Database '{db_name}' already exists; resetting it.")

        # Switch to the database so its collections can be listed
        db.using_database(db_name)

        # A database must be empty before it can be dropped
        for collection_name in utility.list_collections():
            Collection(name=collection_name).drop()
            print(f"Collection '{collection_name}' has been dropped.")

        # Switch back to the default database before dropping this one
        db.using_database("default")
        db.drop_database(db_name)
        print(f"Database '{db_name}' has been deleted.")

    # Create the database in both cases so the later steps can connect to it
    db.create_database(db_name)
    print(f"Database '{db_name}' created successfully.")
except MilvusException as e:
    print(f"An error occurred: {e}")

Initialize the vector store instance

python

from langchain_milvus import Milvus

# `embeddings` is an embedding model instance (any LangChain Embeddings
# implementation) that must be created before this step.
URI = "http://127.0.0.1:19530"
vector_store = Milvus(
    embedding_function=embeddings,
    connection_args={"uri": URI, "token": "root:Milvus", "db_name": "milvus_demo"},
    index_params={"index_type": "FLAT", "metric_type": "L2"},
    consistency_level="Strong",
    drop_old=False,  # set to True to drop an existing collection with the same name
)

Write documents to the vector store

python

from uuid import uuid4
from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)
document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)
document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)
documents = [document_1, document_2, document_3]
uuids = [str(uuid4()) for _ in range(len(documents))]
# Add the documents, with their metadata, to the vector store
vector_store.add_documents(documents=documents, ids=uuids)

Document retrieval example

Given a user query, return a list of Document objects that are semantically similar to it.

python

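# Direct similarity search, restricted to documents whose source is "tweet"
query = "LangChain provides abstractions to make working with LLMs easy"
results = vector_store.similarity_search(
    query=query,
    k=2,
    expr='source == "tweet"',  # Milvus boolean filter expression on metadata
)

for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

# Search with scores (useful for debugging relevance).
# With the L2 metric configured above, a lower score means a closer match.
results = vector_store.similarity_search_with_score("What is RAG?", k=1)
for doc, score in results:
    print(f"[Score: {score:.2f}] {doc.page_content}")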
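
When reading the scores, keep in mind that the store above was configured with `metric_type` set to `"L2"`, so the score is a distance, not a similarity: smaller means closer. The sketch below illustrates this with plain Euclidean distance on made-up two-dimensional vectors. The names `l2_distance`, `query_vec`, and `candidates` are hypothetical, and the exact numbers a real Milvus deployment reports may be scaled differently; only the ordering is the point here.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Illustrative vectors only; real embeddings have many more dimensions.
query_vec = [1.0, 0.0]
candidates = {
    "near": [0.9, 0.1],
    "far": [-1.0, 0.5],
}

# Sort ascending: with a distance metric the best match comes first.
scored = sorted(
    (l2_distance(query_vec, vec), name) for name, vec in candidates.items()
)
best_score, best_name = scored[0]
print(f"[Score: {best_score:.2f}] {best_name}")
```

With a similarity metric such as cosine or inner product the ordering is reversed, so it is worth checking which metric the index uses before setting any score threshold.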
