Commit bcee6de

documentation improvements
1 parent 2a4b025 commit bcee6de

2 files changed

Lines changed: 128 additions & 22 deletions

README.md

Lines changed: 51 additions & 10 deletions
@@ -11,7 +11,7 @@ documentation.
 pip install timescale_vector
 ```
 
-## How to use
+## Basic Usage
 
 Load up your postgres credentials. Safest way is with a .env file:
 
@@ -27,8 +27,16 @@ connection_string = os.environ['PG_CONNECTION_STRING']
 
 Next, create the client.
 
-This takes three arguments: - A connection string - The name of the
-collection - Number of dimensions
+This takes three arguments:
+
+- A connection string
+
+- The name of the collection
+
+- Number of dimensions
+
+In this tutorial, we will use the async client. But we have a sync
+client as well (with an almost identical interface)
 
 ``` python
 vec = client.Async(connection_string, "my_data", 2)
@@ -40,9 +48,12 @@ Next, create the tables for the collection:
 await vec.create_tables()
 ```
 
-Next, insert some data. The data record contains: - A uuid to uniquely
-identify the emedding - A json blob of metadata about the embedding -
-The text the embedding represents - The embedding itself
+Next, insert some data. The data record contains:
+
+- A uuid to uniquely identify the emedding
+- A json blob of metadata about the embedding
+- The text the embedding represents
+- The embedding itself
 
 Because this data already includes uuids we only allow upserts
 
@@ -63,21 +74,51 @@ Now you can query for similar items:
 await vec.search([1.0, 9.0])
 ```
 
-[<Record id=UUID('25df4eea-a17f-42c2-9426-d09b8ca40e32') metadata='{"action": "jump", "animal": "fox"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>,
-<Record id=UUID('605e3ff5-3503-4006-826f-c84ecbb535d4') metadata='{"animal": "fox"}' contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) ?column?=0.14489260377438218>]
+[<Record id=UUID('ae68bcbf-52e7-4977-b4b9-d2c954c3b8b4') metadata='{"action": "jump", "animal": "fox"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>,
+<Record id=UUID('a76f2c30-f001-4e1a-abed-a2a0ce6aa8fe') metadata='{"animal": "fox"}' contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) ?column?=0.14489260377438218>]
 
 You can specify the number of records to return.
 
 ``` python
 await vec.search([1.0, 9.0], k=1)
 ```
 
-[<Record id=UUID('25df4eea-a17f-42c2-9426-d09b8ca40e32') metadata='{"action": "jump", "animal": "fox"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]
+[<Record id=UUID('ae68bcbf-52e7-4977-b4b9-d2c954c3b8b4') metadata='{"action": "jump", "animal": "fox"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]
 
 You can also specify a filter on the metadata as a simple dictionary
 
 ``` python
 await vec.search([1.0, 9.0], k=1, filter={"action": "jump"})
 ```
 
-[<Record id=UUID('25df4eea-a17f-42c2-9426-d09b8ca40e32') metadata='{"action": "jump", "animal": "fox"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]
+[<Record id=UUID('ae68bcbf-52e7-4977-b4b9-d2c954c3b8b4') metadata='{"action": "jump", "animal": "fox"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]
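
Reviewer note on the search output in this hunk: the `?column?` field in each record is the distance used to rank results. The README does not say which metric it is, but the printed values are consistent with cosine distance (1 minus cosine similarity) between the query `[1.0, 9.0]` and each stored embedding. A minimal pure-Python check of that reading, where `cosine_distance` is a hypothetical helper and not part of the client:

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 9.0]

# Embeddings copied from the search output shown in the README.
stored = {
    "jumped over the": [1.0, 10.8],
    "the brown fox": [1.0, 1.3],
}

# Distance from the query to every stored embedding.
distances = {text: cosine_distance(query, emb) for text, emb in stored.items()}

# Smallest distance first, which is the order the search results appear in.
ranked = sorted(distances, key=distances.get)

for text in ranked:
    print(f"{text!r}: {distances[text]:.6f}")
```

The computed values agree with the README output to about seven decimal places; the small remaining difference is presumably because the stored embeddings are float32.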
+
+## Advanced Usage
+
+### Indexing
+
+Indexing speeds up queries over your data.
+
+By default, we setup indexes to query your data by the uuid and the
+metadata.
+
+If you have many rows, you also need to setup an index on the embedding.
+You can create an ivfflat index with the following command after the
+table has been populated.
+
+``` python
+await vec.create_ivfflat_index()
+```
+
+Please note it is very important to do this only after you have data in
+the table.
+
+You can drop the index with the following command.
+
+``` python
+await vec.drop_embedding_index()
+```
+
+Please note the community is actively working on new indexing methods
+for embeddings. As they become available, we will add them to our client
+as well.
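
Reviewer note on the new Indexing section: the ordering constraint (build the ivfflat index only after the table holds data) can be sketched as a small helper. This is an illustration under assumptions, not the library's own code. `populate_then_index` and `RecordingClient` are hypothetical names, `upsert` is assumed to be the client's insert method because its call does not appear in this diff, and the record tuple layout follows the README's description of a data record. `create_tables` and `create_ivfflat_index` are the calls shown in the README. The stub stands in for the real async client so the sketch runs without a database.

```python
import asyncio
import json
import uuid


async def populate_then_index(vec, records):
    """Create tables, load data, and only then build the embedding index."""
    await vec.create_tables()
    await vec.upsert(records)         # assumed method name, not shown in this diff
    await vec.create_ivfflat_index()  # runs last, once the table has rows


class RecordingClient:
    """Hypothetical stand-in for the async client; records the call order."""

    def __init__(self):
        self.calls = []

    async def create_tables(self):
        self.calls.append("create_tables")

    async def upsert(self, records):
        self.calls.append(f"upsert({len(records)})")

    async def create_ivfflat_index(self):
        self.calls.append("create_ivfflat_index")


# Each record: (uuid, JSON metadata, text, embedding). The exact shape the real
# client expects is an assumption based on the README's bullet list.
records = [
    (uuid.uuid4(), json.dumps({"animal": "fox"}), "the brown fox", [1.0, 1.3]),
    (
        uuid.uuid4(),
        json.dumps({"animal": "fox", "action": "jump"}),
        "jumped over the",
        [1.0, 10.8],
    ),
]

client = RecordingClient()
asyncio.run(populate_then_index(client, records))
print(client.calls)
```

With a real client, the same helper would be awaited with the `vec` object created earlier in the README. The ordering matters because, in pgvector, an ivfflat index derives its list centroids from the rows present at build time, so an index built on an empty table gives poor recall.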

nbs/index.ipynb

Lines changed: 77 additions & 12 deletions
@@ -46,7 +46,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## How to use"
+"## Basic Usage"
 ]
 },
 {
@@ -83,9 +83,12 @@
 "Next, create the client. \n",
 "\n",
 "This takes three arguments: \n",
-"- A connection string\n",
-"- The name of the collection\n",
-"- Number of dimensions"
+"\n",
+"* A connection string\n",
+"* The name of the collection\n",
+"* Number of dimensions\n",
+"\n",
+" In this tutorial, we will use the async client. But we have a sync client as well (with an almost identical interface)"
 ]
 },
 {
@@ -140,10 +143,11 @@
 "metadata": {},
 "source": [
 "Next, insert some data. The data record contains:\n",
-"- A uuid to uniquely identify the emedding\n",
-"- A json blob of metadata about the embedding\n",
-"- The text the embedding represents\n",
-"- The embedding itself\n",
+"\n",
+"* A uuid to uniquely identify the emedding\n",
+"* A json blob of metadata about the embedding\n",
+"* The text the embedding represents\n",
+"* The embedding itself\n",
 "\n",
 "Because this data already includes uuids we only allow upserts"
 ]
@@ -184,8 +188,8 @@
 {
 "data": {
 "text/plain": [
-"[<Record id=UUID('25df4eea-a17f-42c2-9426-d09b8ca40e32') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>,\n",
-" <Record id=UUID('605e3ff5-3503-4006-826f-c84ecbb535d4') metadata='{\"animal\": \"fox\"}' contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) ?column?=0.14489260377438218>]"
+"[<Record id=UUID('ae68bcbf-52e7-4977-b4b9-d2c954c3b8b4') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>,\n",
+" <Record id=UUID('a76f2c30-f001-4e1a-abed-a2a0ce6aa8fe') metadata='{\"animal\": \"fox\"}' contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) ?column?=0.14489260377438218>]"
 ]
 },
 "execution_count": null,
@@ -212,7 +216,7 @@
 {
 "data": {
 "text/plain": [
-"[<Record id=UUID('25df4eea-a17f-42c2-9426-d09b8ca40e32') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]"
+"[<Record id=UUID('ae68bcbf-52e7-4977-b4b9-d2c954c3b8b4') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]"
 ]
 },
 "execution_count": null,
@@ -239,7 +243,7 @@
 {
 "data": {
 "text/plain": [
-"[<Record id=UUID('25df4eea-a17f-42c2-9426-d09b8ca40e32') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]"
+"[<Record id=UUID('ae68bcbf-52e7-4977-b4b9-d2c954c3b8b4') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]"
 ]
 },
 "execution_count": null,
@@ -250,6 +254,67 @@
 "source": [
 "await vec.search([1.0, 9.0], k=1, filter={\"action\": \"jump\"})"
 ]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Advanced Usage"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Indexing\n",
+"\n",
+"Indexing speeds up queries over your data. \n",
+"\n",
+"By default, we setup indexes to query your data by the uuid and the metadata.\n",
+"\n",
+"If you have many rows, you also need to setup an index on the embedding. You can create an ivfflat index with the following command after the table has been populated."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"await vec.create_ivfflat_index()"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Please note it is very important to do this only after you have data in the table. \n",
+"\n",
+"You can drop the index with the following command."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"await vec.drop_embedding_index()"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Please note the community is actively working on new indexing methods for embeddings. As they become available, we will add them to our client as well."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
 }
 ],
 "metadata": {
