README.md (+32 -4)
@@ -1,9 +1,13 @@
 # Hnswlib - fast approximate nearest neighbor search
-Header-only C++ HNSW implementation with python bindings. Paper code for the HNSW 200M SIFT experiment
+Header-only C++ HNSW implementation with python bindings. Paper's code for the HNSW 200M SIFT experiment
 
 **NEWS:**
 
-**Thanks to Louis Abraham ([@louisabraham](https://github.com/louisabraham)) hnswlib is now can be installed via pip!**
+* **Thanks to Apoorv Sharma [@apoorv-sharma](https://github.com/apoorv-sharma), hnswlib now supports true element updates (the interface remained the same, but the performance/memory should not degrade as you update the element embeddings).**
+
+* **Thanks to Dmitry [@2ooom](https://github.com/2ooom), hnswlib got a boost in performance for vector dimensions that are not a multiple of 4**
+
+* **Thanks to Louis Abraham ([@louisabraham](https://github.com/louisabraham)) hnswlib can now be installed via pip!**
 
 Highlights:
 1) Lightweight, header-only, no dependencies other than C++ 11.
@@ -23,10 +27,10 @@ Description of the algorithm parameters can be found in [ALGO_PARAMS.md](ALGO_PA
-Note that inner product is not an actual metric. An element can be closer to some other element than to itself.
+Note that inner product is not an actual metric. An element can be closer to some other element than to itself. This allows some speedup if you remove from the index all elements that are not the closest to themselves.
 
 For other spaces use the nmslib library https://github.com/nmslib/nmslib.
 
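The note on inner product above can be illustrated with a minimal pure-Python sketch. It assumes the inner-product "distance" is `1 - <a, b>`. The helpers `ip_distance` and `self_nearest` are hypothetical names introduced here and are not part of hnswlib.

```python
# Pure-Python illustration; no hnswlib calls are made here.

def ip_distance(a, b):
    """Inner-product 'distance', assumed here to be 1 - <a, b>."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def self_nearest(vectors):
    """Return indices of vectors that are their own nearest neighbor."""
    keep = []
    for i, v in enumerate(vectors):
        d_self = ip_distance(v, v)
        if all(d_self <= ip_distance(v, w) for w in vectors):
            keep.append(i)
    return keep


vectors = [
    [1.0, 0.0],  # short vector along x
    [3.0, 0.0],  # long vector along x, dominates the short one
    [0.0, 2.0],  # vector along y
]

# The short vector is "closer" to the long one than to itself.
print(ip_distance(vectors[0], vectors[0]))  # 0.0
print(ip_distance(vectors[0], vectors[1]))  # -2.0

# Only vectors that are closest to themselves survive the filter.
print(self_nearest(vectors))  # [1, 2]
```

In this toy data set the first vector is not its own nearest neighbor, so a filter of this kind would drop it before the index is built. Whether dropping such elements is acceptable depends on the application.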
@@ -42,6 +46,7 @@ Index methods:
 * `add_items(data, data_labels, num_threads = -1)` - inserts the `data` (numpy array of vectors, shape: `N*dim`) into the structure.
     * `labels` is an optional N-size numpy array of integer labels for all elements in `data`.
     * `num_threads` sets the number of cpu threads to use (-1 means use default).
+    * `data_labels` specifies the labels for the data. If the index already has elements with the same labels, their features will be updated. Note that the update procedure is slower than insertion of a new element, but more memory- and query-efficient.
     * Thread-safe with other `add_items` calls, but not with `knn_query`.
 
 * `mark_deleted(data_label)` - marks the element as deleted, so it will be omitted from search results.
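The label semantics described for `add_items` and `mark_deleted` can be sketched with a small brute-force stand-in. `ToyIndex` is a hypothetical class written for illustration only; it is not hnswlib. It borrows the method names from the list above, and its `knn_query` is simplified to return only a list of labels.

```python
class ToyIndex:
    """Brute-force stand-in (NOT hnswlib) that mimics the label semantics."""

    def __init__(self):
        self.vectors = {}     # label -> vector
        self.deleted = set()  # labels hidden from search results

    def add_items(self, data, data_labels):
        # Re-using an existing label overwrites its vector instead of
        # adding a second element.
        for vec, label in zip(data, data_labels):
            self.vectors[label] = list(vec)

    def mark_deleted(self, data_label):
        # The element stays stored but is omitted from search results.
        self.deleted.add(data_label)

    def knn_query(self, query, k=1):
        # Exhaustive squared-L2 search over all elements not marked deleted.
        def l2(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        live = [(l2(query, vec), label)
                for label, vec in self.vectors.items()
                if label not in self.deleted]
        return [label for _, label in sorted(live)[:k]]


index = ToyIndex()
index.add_items([[0.0, 0.0], [5.0, 5.0]], [10, 20])
print(index.knn_query([4.0, 4.0]))   # [20]

index.add_items([[4.0, 4.0]], [10])  # label 10 already exists: update
print(len(index.vectors))            # 2
print(index.knn_query([4.0, 4.0]))   # [10]

index.mark_deleted(10)
print(index.knn_query([4.0, 4.0]))   # [20]
```

The element count stays at 2 after the second `add_items` call because the existing label is updated rather than duplicated. The sketch says nothing about the cost of an update in the real graph index.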
@@ -223,6 +228,29 @@ To run the test on 200M SIFT subset:
 
 The size of the bigann subset (in millions) is controlled by the variable **subset_size_milllions** hardcoded in **sift_1b.cpp**.
 
+### Updates test
+To generate testing data (from root directory):
+```bash
+cd examples
+python update_gen_data.py
+```
+To compile (from root directory):
+```bash
+mkdir build
+cd build
+cmake ..
+make
+```
+To run test **without** updates (from `build` directory)
+```bash
+./test_updates
+```
+
+To run test **with** updates (from `build` directory)
+```bash
+./test_updates update
+```
+
 ### HNSW example demos
 
 - Visual search engine for 1M amazon products (MXNet + HNSW): [website](https://thomasdelteil.github.io/VisualSearch_MXNet/), [code](https://github.com/ThomasDelteil/VisualSearch_MXNet), demo by [@ThomasDelteil](https://github.com/ThomasDelteil)