Python Refactor (#385)
* Refactor of diskannpy module code.

* Bump the Python package to 0.5.0.rc1 and enable the build-python portion of the pr-test process.

* clang-format changes

* In theory, this should speed up the Python build drastically: each job in the CI/CD fan-out now builds only the wheel for its own Python version and OS.

* Missed a dollar sign

* Copy/pasting left behind a CI/CD step name implying we were running a code formatting check when we were actually building a wheel. This is now fixed.

* Ready the release action too, in theory. We won't know whether it works until this merges and we cut a release, but at least the paths have been fixed.

* Designated initializers just happened to compile on Linux but shouldn't have, as they weren't added to the language until C++20.

* Formatting
daxpryce authored Jul 7, 2023
1 parent 051df41 commit 720a809
Showing 19 changed files with 831 additions and 548 deletions.
22 changes: 22 additions & 0 deletions .github/actions/python-wheel/action.yml
@@ -0,0 +1,22 @@
name: Build Python Wheel
description: Builds a python wheel with cibuildwheel
inputs:
  cibw-identifier:
    description: "CI build wheel identifier to build"
    required: true
runs:
  using: "composite"
  steps:
    - uses: actions/setup-python@v3
    - name: Install cibuildwheel
      run: python -m pip install cibuildwheel==2.11.3
      shell: bash
    - name: Building Python ${{inputs.cibw-identifier}} Wheel
      run: python -m cibuildwheel --output-dir dist
      env:
        CIBW_BUILD: ${{inputs.cibw-identifier}}
      shell: bash
    - uses: actions/upload-artifact@v3
      with:
        name: wheels
        path: ./dist/*.whl
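To reproduce this composite action's build step outside of CI, a minimal sketch follows. It only assembles the command line and environment that the step above uses (`python -m cibuildwheel --output-dir dist` with `CIBW_BUILD` set) and runs nothing by itself. The helper name `cibuildwheel_invocation` and its parameters are hypothetical and not part of this commit.

```python
import os
import sys


def cibuildwheel_invocation(identifier, output_dir="dist", base_env=None):
    """Return (argv, env) mirroring the action's wheel-building step.

    `identifier` is a cibuildwheel build selector such as
    "cp311-manylinux_x86_64". Hypothetical helper: nothing is executed here.
    """
    if not identifier:
        raise ValueError("a cibuildwheel build identifier is required")
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    # CIBW_BUILD restricts cibuildwheel to the matching build(s).
    env["CIBW_BUILD"] = identifier
    argv = [sys.executable, "-m", "cibuildwheel", "--output-dir", output_dir]
    return argv, env
```

Passing the result to `subprocess.run(argv, env=env, check=True)` from the repository root should approximate the CI step, assuming cibuildwheel is installed and, for Linux identifiers, that a container runtime is available.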
42 changes: 27 additions & 15 deletions .github/workflows/build-python.yml
@@ -1,14 +1,32 @@
 name: DiskANN Build Python Wheel
 on: [workflow_call]
 jobs:
-  build:
-    name: Build for ${{matrix.python-version}} on ${{matrix.os}}
+  linux-build:
+    name: Python - Ubuntu - ${{matrix.cibw-identifier}}
     strategy:
       fail-fast: false
       matrix:
-        os: [ubuntu-latest]
-        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
-    runs-on: ${{matrix.os}}
+        cibw-identifier: ["cp38-manylinux_x86_64", "cp39-manylinux_x86_64", "cp310-manylinux_x86_64", "cp311-manylinux_x86_64"]
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        shell: bash
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v2
+        with:
+          fetch-depth: 1
+      - name: Building python wheel ${{matrix.cibw-identifier}}
+        uses: ./.github/actions/python-wheel
+        with:
+          cibw-identifier: ${{matrix.cibw-identifier}}
+  windows-build:
+    name: Python - Windows - ${{matrix.cibw-identifier}}
+    strategy:
+      fail-fast: false
+      matrix:
+        cibw-identifier: ["cp38-win_amd64", "cp39-win_amd64", "cp310-win_amd64", "cp311-win_amd64"]
+    runs-on: windows-latest
     defaults:
       run:
         shell: bash
@@ -17,14 +35,8 @@ jobs:
         uses: actions/checkout@v2
         with:
           submodules: true
-      - uses: actions/setup-python@v3
-      - name: Install cibuildwheel
-        run: python -m pip install cibuildwheel==2.11.3
-      - name: Building Wheel for Python ${{inputs.python-version}}
-        run: python -m cibuildwheel --output-dir dist
-        env:
-          CIBW_BUILD: ${{ inputs.python-version }}
-      - uses: actions/upload-artifact@v3
+          fetch-depth: 1
+      - name: Building python wheel ${{matrix.cibw-identifier}}
+        uses: ./.github/actions/python-wheel
         with:
-          name: wheels
-          path: ./dist/*.whl
+          cibw-identifier: ${{matrix.cibw-identifier}}
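The two matrices above spell out one cibuildwheel identifier per CPython version and platform. As an illustration of how those strings are composed, here is a small sketch that derives them from a list of versions. The function name `cibw_identifiers` and the constants are assumptions made for this example; the workflow itself hard-codes the lists.

```python
def cibw_identifiers(python_versions, platform_tag):
    """Compose CPython build identifiers such as "cp310-manylinux_x86_64".

    Hypothetical helper: `python_versions` holds "major.minor" strings and
    `platform_tag` is the platform half of a cibuildwheel identifier.
    """
    identifiers = []
    for version in python_versions:
        major, minor = version.split(".")
        identifiers.append(f"cp{major}{minor}-{platform_tag}")
    return identifiers


# Versions matching the matrices in this workflow (3.7 is no longer built).
SUPPORTED_VERSIONS = ["3.8", "3.9", "3.10", "3.11"]
LINUX_IDENTIFIERS = cibw_identifiers(SUPPORTED_VERSIONS, "manylinux_x86_64")
WINDOWS_IDENTIFIERS = cibw_identifiers(SUPPORTED_VERSIONS, "win_amd64")
```

Keeping one identifier per job is what lets each runner build a single wheel instead of the whole set.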
3 changes: 3 additions & 0 deletions .github/workflows/pr-test.yml
@@ -24,3 +24,6 @@ jobs:
   dynamic:
     name: Dynamic
     uses: ./.github/workflows/dynamic.yml
+  python:
+    name: Python
+    uses: ./.github/workflows/build-python.yml
36 changes: 9 additions & 27 deletions .github/workflows/python-release.yml
@@ -3,38 +3,20 @@ on:
   release:
     types: [published]
 jobs:
-  build_wheels:
-    name: Build wheels on ${{ matrix.os }}
-    runs-on: ${{ matrix.os }}
-    strategy:
-      matrix:
-        os: [ubuntu-latest]
-    steps:
-      - uses: actions/checkout@v3
-        with:
-          submodules: true
-      - uses: actions/setup-python@v3
-      - name: Install cibuildwheel
-        run: python -m pip install cibuildwheel==2.11.3
-      - name: build wheels
-        run: python -m cibuildwheel --output-dir wheelhouse
-        env:
-          CIBW_ARCHS_LINUX: x86_64
-      - uses: actions/upload-artifact@v3
-        with:
-          name: wheelhouse
-          path: ./wheelhouse/*.whl
+  python-release-wheels:
+    name: Python
+    uses: ./.github/workflows/build-python.yml
   release:
     runs-on: ubuntu-latest
-    needs: build_wheels
+    needs: python-release-wheels
     steps:
       - uses: actions/download-artifact@v3
         with:
-          name: wheelhouse
-          path: wheelhouse/
+          name: dist
+          path: dist/
       - name: Generate SHA256 files for each wheel
         run: |
-          sha256sum wheelhouse/*.whl > checksums.txt
+          sha256sum dist/*.whl > checksums.txt
           cat checksums.txt
       - uses: actions/setup-python@v3
       - name: Install twine
@@ -44,11 +26,11 @@ jobs:
           TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
           TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
         run: |
-          twine upload wheelhouse/*.whl
+          twine upload dist/*.whl
       - name: Update release with SHA256 and Artifacts
         uses: softprops/action-gh-release@v1
         with:
           token: ${{ secrets.GITHUB_TOKEN }}
           files: |
-            wheelhouse/*.whl
+            dist/*.whl
             checksums.txt
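The release job records a SHA256 digest for every wheel with `sha256sum dist/*.whl > checksums.txt`. For anyone who wants to check a downloaded wheel against that file without the `sha256sum` tool, a rough Python equivalent is sketched below. Both function names are hypothetical, and the two-space separator follows the usual `sha256sum` output layout.

```python
import hashlib
from pathlib import Path


def checksum_line(label, data):
    """Format one checksum entry: hex digest, two spaces, then the label."""
    return f"{hashlib.sha256(data).hexdigest()}  {label}"


def write_checksums(wheel_dir="dist", output="checksums.txt"):
    """Write one checksum line per wheel found in `wheel_dir`.

    Hypothetical helper approximating `sha256sum dist/*.whl > checksums.txt`.
    """
    lines = [
        checksum_line(wheel.as_posix(), wheel.read_bytes())
        for wheel in sorted(Path(wheel_dir).glob("*.whl"))
    ]
    # End with a newline only when there is at least one entry.
    Path(output).write_text("\n".join(lines) + ("\n" if lines else ""))
    return lines
```

Comparing the line produced for a local wheel with the matching line in the published `checksums.txt` is enough to confirm the download is intact.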
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -11,11 +11,11 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "diskannpy"
-version = "0.5.0"
+version = "0.5.0.rc1"
 
 description = "DiskANN Python extension module"
 # readme = "../README.md"
-requires-python = ">=3.7"
+requires-python = ">=3.8"
 license = {text = "MIT License"}
 dependencies = [
     "numpy"
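A note on the new version string: under PEP 440, `0.5.0.rc1` is an accepted spelling of a release candidate, and its normalized form is `0.5.0rc1`, so the built wheel's filename may show the latter. The sketch below illustrates that normalization for the simple alpha/beta/release-candidate case only. It is a hand-written approximation with a hypothetical function name, not the full PEP 440 grammar; the `packaging` library is the usual tool for real parsing.

```python
import re

# Release segment, optional separator, pre-release tag, optional separator, number.
_PRE_RELEASE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)[._-]?(?P<tag>a|b|rc)[._-]?(?P<num>\d+)$"
)


def normalize_prerelease(version):
    """Drop the optional separators around an a/b/rc pre-release segment.

    Hypothetical helper: strings that do not match the simple pattern are
    returned unchanged rather than validated.
    """
    match = _PRE_RELEASE.match(version.strip().lower())
    if match is None:
        return version
    return f"{match['release']}{match['tag']}{match['num']}"
```

For example, `normalize_prerelease("0.5.0.rc1")` gives `"0.5.0rc1"`, while a final release such as `"0.5.0"` passes through untouched.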
10 changes: 9 additions & 1 deletion python/CMakeLists.txt
@@ -26,7 +26,15 @@ execute_process(COMMAND ${Python3_EXECUTABLE} -c "import numpy; print(numpy.get_
 # pybind11_add_module(diskannpy MODULE src/diskann_bindings.cpp)
 # the following is fairly synonymous with pybind11_add_module, but we need more target_link_libraries
 # see https://pybind11.readthedocs.io/en/latest/compiling.html#advanced-interface-library-targets for more details
-add_library(_diskannpy MODULE src/diskann_bindings.cpp)
+add_library(_diskannpy MODULE
+        src/module.cpp
+        src/builder.cpp
+        src/dynamic_memory_index.cpp
+        src/static_memory_index.cpp
+        src/static_disk_index.cpp
+)
+
+target_include_directories(_diskannpy AFTER PRIVATE include)
 
 if (MSVC)
     target_compile_options(_diskannpy PRIVATE /U_WINDLL)
26 changes: 26 additions & 0 deletions python/include/builder.h
@@ -0,0 +1,26 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.

#pragma once

#include <cstdint>
#include <string>

#include "common.h"
#include "distance.h"

namespace diskannpy
{
template <typename DT>
void build_disk_index(diskann::Metric metric, const std::string &data_file_path, const std::string &index_prefix_path,
uint32_t complexity, uint32_t graph_degree, double final_index_ram_limit,
double indexing_ram_budget, uint32_t num_threads, uint32_t pq_disk_bytes);

template <typename DT, typename TagT = DynamicIdType, typename LabelT = filterT>
void build_memory_index(diskann::Metric metric, const std::string &vector_bin_path,
const std::string &index_output_path, uint32_t graph_degree, uint32_t complexity,
float alpha, uint32_t num_threads, bool use_pq_build,
size_t num_pq_bytes, bool use_opq, uint32_t filter_complexity,
bool use_tags = false);

}
24 changes: 24 additions & 0 deletions python/include/common.h
@@ -0,0 +1,24 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.

#pragma once

#include <stdint.h>
#include <utility>

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

namespace diskannpy
{

typedef uint32_t filterT;

typedef uint32_t StaticIdType;
typedef uint32_t DynamicIdType;

template <class IdType> using NeighborsAndDistances = std::pair<py::array_t<IdType>, py::array_t<float>>;

}; // namespace diskannpy
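The `NeighborsAndDistances` alias above pairs an array of neighbor ids with an array of float distances, which is the shape every `search` and `batch_search` declared in the following headers returns. To make that return shape concrete, here is a pure-Python sketch that produces the same kind of pair. It is an exhaustive brute-force search over squared Euclidean distance, swapped in purely for illustration; it is not the DiskANN graph search, and the function name is hypothetical.

```python
def brute_force_batch_search(points, queries, knn):
    """Return (ids, distances), one row per query, nearest neighbors first.

    Illustrative stand-in only: every point is scanned for every query,
    using squared Euclidean distance as the (assumed) metric.
    """
    all_ids, all_distances = [], []
    for query in queries:
        # Score every point, then keep the knn smallest distances.
        scored = sorted(
            (sum((p - q) ** 2 for p, q in zip(point, query)), idx)
            for idx, point in enumerate(points)
        )[:knn]
        all_ids.append([idx for _, idx in scored])
        all_distances.append([dist for dist, _ in scored])
    return all_ids, all_distances
```

Row `i` of each returned list corresponds to query `i`, mirroring how the two arrays in the pair line up element by element.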
51 changes: 51 additions & 0 deletions python/include/dynamic_memory_index.h
@@ -0,0 +1,51 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.

#pragma once

#include <cstdint>
#include <string>

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

#include "common.h"
#include "index.h"
#include "parameters.h"

namespace py = pybind11;

namespace diskannpy
{

template <typename DT>
class DynamicMemoryIndex
{
public:
DynamicMemoryIndex(diskann::Metric m, size_t dimensions, size_t max_vectors, uint32_t complexity,
uint32_t graph_degree, bool saturate_graph, uint32_t max_occlusion_size, float alpha,
uint32_t num_threads, uint32_t filter_complexity, uint32_t num_frozen_points,
uint32_t initial_search_complexity, uint32_t initial_search_threads,
bool concurrent_consolidation);

void load(const std::string &index_path);
int insert(const py::array_t<DT, py::array::c_style | py::array::forcecast> &vector, DynamicIdType id);
py::array_t<int> batch_insert(py::array_t<DT, py::array::c_style | py::array::forcecast> &vectors,
py::array_t<DynamicIdType, py::array::c_style | py::array::forcecast> &ids, int32_t num_inserts,
int num_threads = 0);
int mark_deleted(DynamicIdType id);
void save(const std::string &save_path, bool compact_before_save = false);
NeighborsAndDistances<DynamicIdType> search(py::array_t<DT, py::array::c_style | py::array::forcecast> &query, uint64_t knn,
uint64_t complexity);
NeighborsAndDistances<DynamicIdType> batch_search(py::array_t<DT, py::array::c_style | py::array::forcecast> &queries,
uint64_t num_queries, uint64_t knn, uint64_t complexity,
uint32_t num_threads);
void consolidate_delete();

private:
const uint32_t _initial_search_complexity;
const diskann::IndexWriteParameters _write_parameters;
diskann::Index<DT, DynamicIdType, filterT> _index;
};

}; // namespace diskannpy
52 changes: 52 additions & 0 deletions python/include/static_disk_index.h
@@ -0,0 +1,52 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.

#pragma once

#include <cstdint>
#include <string>


#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

#ifdef _WINDOWS
#include "windows_aligned_file_reader.h"
#else
#include "linux_aligned_file_reader.h"
#endif

#include "common.h"
#include "pq_flash_index.h"

namespace py = pybind11;

namespace diskannpy {

#ifdef _WINDOWS
typedef WindowsAlignedFileReader PlatformSpecificAlignedFileReader;
#else
typedef LinuxAlignedFileReader PlatformSpecificAlignedFileReader;
#endif

template <typename DT>
class StaticDiskIndex
{
public:
StaticDiskIndex(diskann::Metric metric, const std::string &index_path_prefix, uint32_t num_threads,
size_t num_nodes_to_cache, uint32_t cache_mechanism);

void cache_bfs_levels(size_t num_nodes_to_cache);

void cache_sample_paths(size_t num_nodes_to_cache, const std::string &warmup_query_file, uint32_t num_threads);

NeighborsAndDistances<StaticIdType> search(py::array_t<DT, py::array::c_style | py::array::forcecast> &query, uint64_t knn,
uint64_t complexity, uint64_t beam_width);

NeighborsAndDistances<StaticIdType> batch_search(py::array_t<DT, py::array::c_style | py::array::forcecast> &queries, uint64_t num_queries,
uint64_t knn, uint64_t complexity, uint64_t beam_width, uint32_t num_threads);
private:
std::shared_ptr<AlignedFileReader> _reader;
diskann::PQFlashIndex<DT> _index;
};
}
34 changes: 34 additions & 0 deletions python/include/static_memory_index.h
@@ -0,0 +1,34 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.

#pragma once

#include <cstdint>
#include <string>

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

#include "common.h"
#include "index.h"

namespace py = pybind11;

namespace diskannpy {

template <typename DT>
class StaticMemoryIndex
{
public:
StaticMemoryIndex(diskann::Metric m, const std::string &index_prefix, size_t num_points,
size_t dimensions, uint32_t num_threads, uint32_t initial_search_complexity);

NeighborsAndDistances<StaticIdType> search(py::array_t<DT, py::array::c_style | py::array::forcecast> &query, uint64_t knn,
uint64_t complexity);

NeighborsAndDistances<StaticIdType> batch_search(py::array_t<DT, py::array::c_style | py::array::forcecast> &queries,
uint64_t num_queries, uint64_t knn, uint64_t complexity, uint32_t num_threads);
private:
diskann::Index<DT, StaticIdType, filterT> _index;
};
}
6 changes: 3 additions & 3 deletions python/src/_builder.py
@@ -266,11 +266,11 @@ def build_memory_index(
     num_points, dimensions = vector_file_metadata(vector_bin_path)
 
     if vector_dtype_actual == np.single:
-        _builder = _native_dap.build_in_memory_float_index
+        _builder = _native_dap.build_memory_float_index
     elif vector_dtype_actual == np.ubyte:
-        _builder = _native_dap.build_in_memory_uint8_index
+        _builder = _native_dap.build_memory_uint8_index
     else:
-        _builder = _native_dap.build_in_memory_int8_index
+        _builder = _native_dap.build_memory_int8_index
 
     index_prefix_path = os.path.join(index_directory, index_prefix)
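The hunk above selects a native builder by comparing the vector dtype in an if/elif/else chain. One alternative way to express that selection is a lookup table, sketched below. The three function names come from the renamed bindings in this diff; the table, the helper `select_memory_builder`, and the string dtype keys (what `np.dtype(...).name` would report for `np.single`, `np.ubyte`, and `np.byte`) are assumptions for illustration. Unlike the original `else` branch, this sketch raises on an unknown dtype instead of falling back to the int8 builder.

```python
# Maps a NumPy dtype name to the native builder attribute from this diff.
_BUILDER_BY_DTYPE = {
    "float32": "build_memory_float_index",
    "uint8": "build_memory_uint8_index",
    "int8": "build_memory_int8_index",
}


def select_memory_builder(native_module, dtype_name):
    """Look up the builder for `dtype_name` on `native_module`.

    Hypothetical helper: `native_module` stands in for `_native_dap`.
    """
    try:
        attribute = _BUILDER_BY_DTYPE[dtype_name]
    except KeyError:
        supported = ", ".join(sorted(_BUILDER_BY_DTYPE))
        raise ValueError(
            f"unsupported dtype {dtype_name!r}; expected one of: {supported}"
        ) from None
    return getattr(native_module, attribute)
```

A table like this keeps the Python-side names in one place, which makes a rename such as `build_in_memory_*` to `build_memory_*` a one-spot change.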

