A Guide to Implementing Multi-Tenancy in Pinecone

Pinecone

Type: Database

Overview: A real-time, high-performance vector database optimized for large-scale vector search.

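Multi-tenancy works like an office building with many tenants: each tenant has its own separate space, and tenants do not interfere with one another. Pinecone, as a vector database, supports this model, so you can serve many customers from a single system while keeping their data securely isolated. This guide from 站长百科 (Zhanzhang Baike) walks through how to implement multi-tenancy in Pinecone.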

I. How multi-tenancy works in Pinecone

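In Pinecone, the top level of data organization is the index. When you create an index, you define the dimension of the vectors it will hold and the similarity metric used to search them.

Within an index, data is partitioned into namespaces. Each tenant's data lives in its own namespace, and every data operation, such as upserting or querying, targets exactly one namespace. This is what keeps tenants' data isolated from one another.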
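To make the index/namespace layout concrete, here is a toy in-memory model in Python. It is not Pinecone's implementation or API: the class `ToyIndex`, its methods, and the helper `cosine` are hypothetical names chosen to mirror the calls used later in this guide. It shows only the two properties described above: the dimension is fixed per index, and every upsert or query names exactly one namespace.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyIndex:
    """Hypothetical in-memory stand-in: one fixed dimension, records grouped by namespace."""

    def __init__(self, dimension: int):
        self.dimension = dimension
        # namespace -> {record id -> vector}; a namespace appears on its first upsert
        self.namespaces: dict[str, dict[str, list[float]]] = {}

    def upsert(self, vectors: list[dict], namespace: str) -> None:
        bucket = self.namespaces.setdefault(namespace, {})
        for record in vectors:
            if len(record["values"]) != self.dimension:
                raise ValueError(f"record {record['id']!r} does not match dimension {self.dimension}")
            bucket[record["id"]] = record["values"]

    def query(self, vector: list[float], top_k: int, namespace: str) -> list[tuple[str, float]]:
        # Only the requested namespace is scanned; other tenants' records are never read
        records = self.namespaces.get(namespace, {})
        scored = [(rid, cosine(vector, values)) for rid, values in records.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = ToyIndex(dimension=2)
index.upsert([{"id": "A", "values": [1.0, 0.0]}], namespace="租户1")
index.upsert([{"id": "E", "values": [1.0, 0.1]}], namespace="租户2")
results = index.query([1.0, 0.0], top_k=5, namespace="租户1")
print(results)  # [('A', 1.0)]: tenant 2's record "E" is never a candidate
```

Querying a namespace that was never written to simply returns no matches in this model, and it does not create that namespace.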

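Advantages of multi-tenancy in Pinecone:

1. Tenant isolation: In the serverless architecture, each namespace is stored separately, so using namespaces gives you physical isolation of data between tenants/customers.

2. No noisy neighbors: Reads and writes always target a single namespace, so one tenant's or customer's activity does not affect the others.

3. No maintenance: Serverless indexes scale automatically with usage; you do not need to provision or manage any compute or storage resources.

4. Cost efficiency: With serverless indexes, you pay only for the amount of data you store and the operations you perform. For queries in particular, cost depends partly on the total number of records that must be scanned, so using namespaces can significantly lower query costs.

5. Simple tenant offboarding: To offboard a tenant/customer, just delete the associated namespace.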
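Point 4 can be illustrated with a deliberately simplified count of how many records a single query has in scope. This is a toy model under stated assumptions, not Pinecone's pricing formula: the function `records_in_scope` is hypothetical, the tenant sizes are made up, and the "shared pool" branch ignores any pruning Pinecone may perform for filtered queries.

```python
def records_in_scope(tenant_sizes: dict[str, int], tenant: str, use_namespaces: bool) -> int:
    """Toy count of records one query must consider (not Pinecone's billing formula)."""
    if use_namespaces:
        # One namespace per tenant: the query only sees this tenant's records
        return tenant_sizes[tenant]
    # Hypothetical alternative: all tenants share one pool, and the query is
    # assumed to consider every record before a tenant filter narrows the results
    return sum(tenant_sizes.values())


# Hypothetical tenant sizes, in records
sizes = {"租户1": 1_000, "租户2": 50_000, "租户3": 200_000}
print(records_in_scope(sizes, "租户1", use_namespaces=True))   # 1000
print(records_in_scope(sizes, "租户1", use_namespaces=False))  # 251000
```

Under these assumptions, the small tenant's query goes from 251,000 records in scope to 1,000, which is the intuition behind "namespaces can significantly lower query costs".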

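II. Implementing multi-tenancy in Pinecone

1. Create a serverless index in Pinecone

A serverless index is like a smart bookshelf: it scales automatically with your usage, and you pay only for the storage and operations you actually use. When you create one, you specify the cloud provider and region to deploy it in.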

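# The gRPC client requires the extra: pip install "pinecone[grpc]"
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# Create an index for 8-dimensional vectors compared by cosine similarity
pc.create_index(
    name="multitenant-app",
    dimension=8,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)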
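Because `dimension=8` is fixed when the index is created, every vector later upserted into any tenant's namespace must have exactly 8 values, and Pinecone rejects vectors of a different length. The helper `check_dimension` below is a hypothetical sketch for catching that mistake on the client side before a request is sent; it is not part of the Pinecone SDK.

```python
def check_dimension(vectors: list[dict], dimension: int = 8) -> None:
    """Raise ValueError if any record's vector length differs from the index dimension."""
    for record in vectors:
        length = len(record["values"])
        if length != dimension:
            raise ValueError(
                f"record {record['id']!r} has {length} values; the index expects {dimension}"
            )


check_dimension([{"id": "A", "values": [0.1] * 8}])  # passes silently
try:
    check_dimension([{"id": "X", "values": [0.1, 0.2, 0.3]}])
except ValueError as err:
    print(err)  # record 'X' has 3 values; the index expects 8
```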

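2. Isolate tenant data in Pinecone

Give each tenant its own namespace, like assigning each tenant a dedicated compartment. A namespace is created automatically the first time you upsert data into it.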

from pinecone.grpc import PineconeGRPC as Pinecone

# Connect to Pinecone and get the index created in step 1
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("multitenant-app")

# Upsert data into tenant 1's namespace
index.upsert(
    vectors=[
        {"id": "A", "values": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
        {"id": "B", "values": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]},
        {"id": "C", "values": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]},
        {"id": "D", "values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]}
    ],
    namespace="租户1"  # tenant 1's namespace
)

# Upsert data into tenant 2's namespace
index.upsert(
    vectors=[
        {"id": "E", "values": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]},
        {"id": "F", "values": [0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]},
        {"id": "G", "values": [0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]},
        {"id": "H", "values": [0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]}
    ],
    namespace="租户2"  # tenant 2's namespace
)
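The example upserts four records per tenant in a single call. For a tenant with many records, one common approach is to split the upload into several `upsert` calls against the same namespace. Below is a sketch of a hypothetical helper, `batched_records`; the batch size of 100 is an assumption, so check Pinecone's documented per-request limits before choosing one. On Python 3.12+, `itertools.batched` does the same job but yields tuples.

```python
from typing import Iterator


def batched_records(vectors: list[dict], batch_size: int = 100) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size records, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]


# Hypothetical tenant data: 250 eight-dimensional records
tenant1_vectors = [{"id": f"doc-{i}", "values": [0.1] * 8} for i in range(250)]
print([len(batch) for batch in batched_records(tenant1_vectors)])  # [100, 100, 50]

# With a live connection, every batch goes to the same tenant namespace:
# for batch in batched_records(tenant1_vectors):
#     index.upsert(vectors=batch, namespace="租户1")
```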

To update a tenant's data, you likewise specify that tenant's namespace:

# Update record "A" in tenant 1's namespace
index.update(id="A", values=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], namespace="租户1")
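Since every upsert, update, and query must carry the right namespace, one design option is to bind the namespace once per tenant so application code cannot forget it. The class `TenantIndex` below is a hypothetical wrapper, not part of the Pinecone SDK; it only forwards to the `upsert`, `update`, and `query` calls shown in this guide, adding `namespace=` each time.

```python
class TenantIndex:
    """Hypothetical wrapper that pins every upsert, update and query to one namespace."""

    def __init__(self, index, namespace: str):
        self._index = index          # e.g. pc.Index("multitenant-app")
        self._namespace = namespace  # e.g. "租户1"

    def upsert(self, vectors: list[dict]):
        return self._index.upsert(vectors=vectors, namespace=self._namespace)

    def update(self, record_id: str, values: list[float]):
        return self._index.update(id=record_id, values=values, namespace=self._namespace)

    def query(self, vector: list[float], top_k: int, **kwargs):
        # Passing namespace= in kwargs raises TypeError (duplicate keyword argument),
        # so a caller cannot redirect the query to another tenant by accident
        return self._index.query(vector=vector, top_k=top_k, namespace=self._namespace, **kwargs)


# Usage against a live index (needs a Pinecone connection):
# tenant1 = TenantIndex(pc.Index("multitenant-app"), "租户1")
# tenant1.update("A", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
```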

3. Query tenant data

Queries must also specify which namespace to search, which ensures that one tenant's queries do not affect other tenants.

from pinecone.grpc import PineconeGRPC as Pinecone

# Connect to Pinecone and get the index
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("multitenant-app")

# Find the 3 vectors in tenant 2's namespace most similar to the query vector
query_results = index.query(
    namespace="租户2",  # tenant 2's namespace
    vector=[0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7],  # query vector
    top_k=3,  # return the 3 most similar matches
    include_values=True  # include each match's vector values
)

print(query_results)

Example output:

{'matches': [{'id': 'F',
              'score': 1.00000012,
              'values': [0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]},
             {'id': 'G',
              'score': 1.0,
              'values': [0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]},
             {'id': 'E',
              'score': 1.0,
              'values': [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]}],
 'namespace': '租户2',
 'usage': {'read_units': 6}}
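All three scores are (essentially) 1.0 because every vector in tenant 2's namespace points in the same direction as the query vector, and cosine similarity depends only on direction: cos(a, b) = a·b / (‖a‖‖b‖). The plain-Python sketch below (no Pinecone calls; `cosine_similarity` is a hypothetical helper) checks this for records E through H and prints 1.0 for all four; the small excess in F's 1.00000012 is floating-point rounding. Because H ties at 1.0 as well, which three records fill `top_k=3` is not decided by the scores, so this toy data says little about ranking.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


query = [0.7] * 8
for record_id, level in [("E", 0.5), ("F", 0.6), ("G", 0.7), ("H", 0.8)]:
    # Each stored vector is a positive multiple of the query vector
    score = cosine_similarity(query, [level] * 8)
    print(record_id, round(score, 6))
```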

4. Offboard a Pinecone tenant

When a tenant stops using your service, you can delete all of their data simply by specifying their namespace.
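With the Python client, every record in a namespace can be deleted with `index.delete(delete_all=True, namespace=...)`. The helper `offboard_tenant` below is a hypothetical sketch that wraps that call so offboarding code always names the namespace explicitly; the empty-string guard is an assumption added here, not SDK behavior. The live-connection usage is left in comments.

```python
def offboard_tenant(index, namespace: str) -> None:
    """Delete every record in one tenant's namespace (hypothetical helper)."""
    # An empty string does not name any tenant, so treat it as a bug
    if not namespace:
        raise ValueError("namespace must be a non-empty tenant name")
    # delete_all=True removes all records, but only within the given namespace
    index.delete(delete_all=True, namespace=namespace)


# Usage against a live index (needs a Pinecone connection):
# from pinecone.grpc import PineconeGRPC as Pinecone
# pc = Pinecone(api_key="YOUR_API_KEY")
# offboard_tenant(pc.Index("multitenant-app"), "租户1")
```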