|
46 | 46 | "cell_type": "markdown", |
47 | 47 | "metadata": {}, |
48 | 48 | "source": [ |
49 | | - "## How to use" |
| 49 | + "## Basic Usage" |
50 | 50 | ] |
51 | 51 | }, |
52 | 52 | { |
|
83 | 83 | "Next, create the client. \n", |
84 | 84 | "\n", |
85 | 85 | "This takes three arguments: \n", |
86 | | - "- A connection string\n", |
87 | | - "- The name of the collection\n", |
88 | | - "- Number of dimensions" |
| 86 | + "\n", |
| 87 | + "* A connection string\n", |
| 88 | + "* The name of the collection\n", |
| 89 | + "* Number of dimensions\n", |
| 90 | + "\n", |
| 91 | + " In this tutorial, we use the async client. A sync client with an almost identical interface is also available." |
89 | 92 | ] |
90 | 93 | }, |
91 | 94 | { |
|
140 | 143 | "metadata": {}, |
141 | 144 | "source": [ |
142 | 145 | "Next, insert some data. The data record contains:\n", |
143 | | - "- A uuid to uniquely identify the emedding\n", |
144 | | - "- A json blob of metadata about the embedding\n", |
145 | | - "- The text the embedding represents\n", |
146 | | - "- The embedding itself\n", |
| 146 | + "\n", |
| 147 | + "* A UUID to uniquely identify the embedding\n", |
| 148 | + "* A JSON blob of metadata about the embedding\n", |
| 149 | + "* The text the embedding represents\n", |
| 150 | + "* The embedding itself\n", |
147 | 151 | "\n", |
148 | 152 | "Because this data already includes uuids we only allow upserts" |
149 | 153 | ] |
|
184 | 188 | { |
185 | 189 | "data": { |
186 | 190 | "text/plain": [ |
187 | | - "[<Record id=UUID('25df4eea-a17f-42c2-9426-d09b8ca40e32') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>,\n", |
188 | | - " <Record id=UUID('605e3ff5-3503-4006-826f-c84ecbb535d4') metadata='{\"animal\": \"fox\"}' contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) ?column?=0.14489260377438218>]" |
| 191 | + "[<Record id=UUID('ae68bcbf-52e7-4977-b4b9-d2c954c3b8b4') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>,\n", |
| 192 | + " <Record id=UUID('a76f2c30-f001-4e1a-abed-a2a0ce6aa8fe') metadata='{\"animal\": \"fox\"}' contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) ?column?=0.14489260377438218>]" |
189 | 193 | ] |
190 | 194 | }, |
191 | 195 | "execution_count": null, |
|
212 | 216 | { |
213 | 217 | "data": { |
214 | 218 | "text/plain": [ |
215 | | - "[<Record id=UUID('25df4eea-a17f-42c2-9426-d09b8ca40e32') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]" |
| 219 | + "[<Record id=UUID('ae68bcbf-52e7-4977-b4b9-d2c954c3b8b4') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]" |
216 | 220 | ] |
217 | 221 | }, |
218 | 222 | "execution_count": null, |
|
239 | 243 | { |
240 | 244 | "data": { |
241 | 245 | "text/plain": [ |
242 | | - "[<Record id=UUID('25df4eea-a17f-42c2-9426-d09b8ca40e32') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]" |
| 246 | + "[<Record id=UUID('ae68bcbf-52e7-4977-b4b9-d2c954c3b8b4') metadata='{\"action\": \"jump\", \"animal\": \"fox\"}' contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) ?column?=0.00016793422934946456>]" |
243 | 247 | ] |
244 | 248 | }, |
245 | 249 | "execution_count": null, |
|
250 | 254 | "source": [ |
251 | 255 | "await vec.search([1.0, 9.0], k=1, filter={\"action\": \"jump\"})" |
252 | 256 | ] |
| 257 | + }, |
| 258 | + { |
| 259 | + "cell_type": "markdown", |
| 260 | + "metadata": {}, |
| 261 | + "source": [ |
| 262 | + "## Advanced Usage" |
| 263 | + ] |
| 264 | + }, |
| 265 | + { |
| 266 | + "cell_type": "markdown", |
| 267 | + "metadata": {}, |
| 268 | + "source": [ |
| 269 | + "### Indexing\n", |
| 270 | + "\n", |
| 271 | + "Indexing speeds up queries over your data. \n", |
| 272 | + "\n", |
| 273 | + "By default, we set up indexes to query your data by the uuid and the metadata.\n", |
| 274 | + "\n", |
| 275 | + "If you have many rows, you also need to set up an index on the embedding. You can create an ivfflat index with the following command after the table has been populated." |
| 276 | + ] |
| 277 | + }, |
| 278 | + { |
| 279 | + "cell_type": "code", |
| 280 | + "execution_count": null, |
| 281 | + "metadata": {}, |
| 282 | + "outputs": [], |
| 283 | + "source": [ |
| 284 | + "await vec.create_ivfflat_index()" |
| 285 | + ] |
| 286 | + }, |
| 287 | + { |
| 288 | + "cell_type": "markdown", |
| 289 | + "metadata": {}, |
| 290 | + "source": [ |
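| 291 | + "Create the index only after the table contains data: ivfflat derives its list centroids from the rows present at build time, so an index built on an empty table is likely to give poor recall. \n", |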
| 292 | + "\n", |
| 293 | + "You can drop the index with the following command." |
| 294 | + ] |
| 295 | + }, |
| 296 | + { |
| 297 | + "cell_type": "code", |
| 298 | + "execution_count": null, |
| 299 | + "metadata": {}, |
| 300 | + "outputs": [], |
| 301 | + "source": [ |
| 302 | + "await vec.drop_embedding_index()" |
| 303 | + ] |
| 304 | + }, |
| 305 | + { |
| 306 | + "cell_type": "markdown", |
| 307 | + "metadata": {}, |
| 308 | + "source": [ |
| 309 | + "Please note the community is actively working on new indexing methods for embeddings. As they become available, we will add them to our client as well." |
| 310 | + ] |
| 311 | + }, |
| 312 | + { |
| 313 | + "cell_type": "code", |
| 314 | + "execution_count": null, |
| 315 | + "metadata": {}, |
| 316 | + "outputs": [], |
| 317 | + "source": [] |
253 | 318 | } |
254 | 319 | ], |
255 | 320 | "metadata": { |
|