Python Bindings (microsoft#156)
* initial commit

* The python bindings were expecting a get_metric() method to exist.

* WIP: Partially driven by CMake, but the compiler and linker flags aren't being set correctly, so building via CMake does not yet produce a working extension module. It still builds via setup.py on Linux, though.

* This commit includes a working (on linux, and possibly windows?) form of the diskannpy extension module, built via CMake.

setup.py still needs to be updated to delegate all work to CMake similar to the pybind cmake example at: https://github.com/pybind/cmake_example/blob/master/setup.py

In-depth testing has not yet occurred, but `python -c 'import diskannpy; print(diskannpy.__version__)'` completes successfully.

* This commit includes the changes necessary for our build to work with the python packaging authority's `build` packaging tool.

* The windows directory was not included in the manifest file, which meant that we were unable to use nuget to acquire our dependencies.

* Relaxed numpy requirements to the last version that worked with Python 3.7

* Trying to enable Windows support

* Trying the same sort of approach that Bryan had, but it still doesn't work

* Trying to test our build with cibuildwheel.

* Fixing indentation

* We needed a trigger

* Different target_link_libraries for debug or optimized builds does not seem to help in any way, and instead only makes things worse.

* Removing quotes from runner image names

* I typed `on` instead of `os` in the matrix

* Installing system library prerequisites

* cibuildwheel builds in a container

* apt isn't available in this container's image, so falling back to apt-get

* Fix undefined symbol name errors during python setup.py install by renaming variables to correct names

* I have cibuildwheel working locally on linux

* It helps if you don't override your expected behaviors from your pyproject.toml in your build file

* Disabling windows building in the matrix for right now

* Publishing the SHA256 checksums as a release on our GitHub so they can (eventually) be validated against the wheels published to PyPI

* Giving it the name wheelhouse

* Does this actually work?

* Testing the create a release workflow

* Trying this again but commenting out the very long running cibuildwheel process.  Emulating action instead.

* The placeholder test worked, now we'll test the actual build and faux publish

* In theory, we will have a working windows build once this is incorporated.  Fingers crossed!

* Enabling a simple python build test in all pushes and a more comprehensive build test in push-test.yml

* Windows arch is AMD64 but Linux is x86_64 and I'm not sure why

* Remove function _build_disk_index_float as discussed in PR microsoft#170

* Need submodule init for windows builds

* Need to pick the right architectures

* Making changes as per Harsha's comments in PR microsoft#156

* Rather than undoing the removal, I'm actually taking the appropriate lines currently in the main branch and using them here.

* Trying to add -fPIC on Linux but only for the Python build

* Conditionally compiling diskann with -fPIC but only for python

* test-command is the shell command to run, not the module we're using in python

* my fault for not setting the start directory and implying we had a module named /projects/tests

* Fix indentation in CMakeLists.txt

change tabs to spaces

Co-authored-by: Harsha Vardhan Simhadri <[email protected]>
Co-authored-by: Lakshya A Agrawal <[email protected]>
4 people authored Jan 2, 2023
1 parent da11c5e commit 9948916
Showing 15 changed files with 1,144 additions and 16 deletions.
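The import check quoted in the commit message can be wrapped in a small helper for use in scripts where the module may not be installed yet. This is a minimal sketch: `module_version` is a hypothetical helper and not part of diskannpy; the only thing assumed about the extension is the `__version__` attribute that the commit message prints.

```python
import importlib


def module_version(name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Modules without a __version__ attribute also yield None.
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    # Mirrors: python -c 'import diskannpy; print(diskannpy.__version__)'
    print(module_version("diskannpy"))
```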
11 changes: 10 additions & 1 deletion .github/workflows/pr-test.yml
@@ -1,4 +1,4 @@
-name: DiskANN Build and run
+name: DiskANN Pull Request Build and Test
on: [pull_request]
jobs:
build-and-run:
@@ -187,3 +187,12 @@ jobs:
${{ env.diskann_built_tests }}/test_streaming_scenario --data_type uint8 --dist_fn l2 --data_path rand_uint8_10D_10K_norm50.0.bin --index_path_prefix index_stream -R 64 -L 600 --alpha 1.2 --insert_threads 4 --consolidate_threads 4 --max_points_to_insert 10000 --active_window 4000 --consolidate_interval 2000 --start_point_norm 200
${{ env.diskann_built_utils }}/compute_groundtruth --data_type uint8 --dist_fn l2 --base_file index_stream.after-streaming-act4000-cons2000-max10000.data --query_file rand_uint8_10D_10K_norm50.0.bin --K 100 --gt_file gt100_base-act4000-cons2000-max10000 --tags_file index_stream.after-streaming-act4000-cons2000-max10000.tags
${{ env.diskann_built_tests }}/search_memory_index --data_type uint8 --dist_fn l2 --index_path_prefix index_stream.after-streaming-act4000-cons2000-max10000 --result_path res_stream --query_file rand_uint8_10D_10K_norm50.0.bin --gt_file gt100_base-act4000-cons2000-max10000 -K 10 -L 20 40 60 80 100 -T 64 --dynamic true --tags 1
- uses: actions/setup-python@v3
- name: Install cibuildwheel
run: python -m pip install cibuildwheel==2.11.3
- name: build wheels
run: python -m cibuildwheel --output-dir wheelhouse
env:
CIBW_ARCHS_WINDOWS: AMD64
CIBW_ARCHS_LINUX: x86_64
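On the two `CIBW_ARCHS_*` spellings: as far as I can tell, they follow what each operating system reports for the same 64-bit x86 hardware, which Python exposes through `platform.machine()` (`AMD64` on Windows, `x86_64` on Linux). The sketch below normalizes the two names. `canonical_arch` and its alias table are illustrative assumptions, not something cibuildwheel provides.

```python
import platform

# Alias table is an assumption for illustration; it only covers 64-bit x86.
_X86_64_ALIASES = {"amd64", "x86_64", "x64"}


def canonical_arch(machine=None):
    """Map an OS-reported machine string to a single name for 64-bit x86."""
    machine = platform.machine() if machine is None else machine
    # Unknown architectures are passed through unchanged.
    return "x86_64" if machine.lower() in _X86_64_ALIASES else machine
```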
17 changes: 16 additions & 1 deletion .github/workflows/push-test.yml
@@ -1,4 +1,4 @@
-name: DiskANN Build
+name: DiskANN Push Build
on: [push]
jobs:
ubuntu-latest-build:
@@ -16,6 +16,13 @@ jobs:
- name: build
run: |
cd build && make -j
- uses: actions/setup-python@v3
with:
python-version: "3.10"
- name: Install Python Build Module
run: python -m pip install build
- name: Python Build
run: python -m build

windows-build:
name: Build for ${{ matrix.os }}
@@ -38,3 +45,11 @@ jobs:
run: |
mkdir build && cd build && cmake .. && msbuild diskann.sln /m /nologo /t:Build /p:Configuration="Release" /property:Platform="x64" -consoleloggerparameters:"ErrorsOnly;Summary"
shell: cmd

- uses: actions/setup-python@v3
with:
python-version: "3.10"
- name: Install Python Build Module
run: python -m pip install build
- name: Python Build
run: python -m build
47 changes: 47 additions & 0 deletions .github/workflows/python-release.yml
@@ -0,0 +1,47 @@
name: Build and Release Python Wheels
on:
  release:
    types: [published]
jobs:
  build_wheels:
    name: Build wheels on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: true
      - uses: actions/setup-python@v3
      - name: Install cibuildwheel
        run: python -m pip install cibuildwheel==2.11.3
      - name: build wheels
        run: python -m cibuildwheel --output-dir wheelhouse
        env:
          CIBW_ARCHS_WINDOWS: AMD64
          CIBW_ARCHS_LINUX: x86_64
      - uses: actions/upload-artifact@v3
        with:
          name: wheelhouse
          path: ./wheelhouse/*.whl
  release:
    runs-on: ubuntu-latest
    needs: build_wheels
    steps:
      - uses: actions/download-artifact@v3
        with:
          name: wheelhouse
          path: wheelhouse/
      - name: Generate SHA256 files for each wheel
        run: |
          sha256sum wheelhouse/*.whl > checksums.txt
          cat checksums.txt
      - name: Update release with SHA256 and Artifacts
        uses: softprops/action-gh-release@v1
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          files: |
            wheelhouse/*.whl
            checksums.txt
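The release job's `sha256sum wheelhouse/*.whl > checksums.txt` step can be reproduced locally to compare a downloaded wheel against the published checksums. A minimal sketch using only the standard library; `sha256_lines` is a hypothetical helper name, and the output format (digest, two spaces, path) is meant to match what `sha256sum` prints.

```python
import hashlib
from pathlib import Path


def sha256_lines(directory, pattern="*.whl"):
    """Return sha256sum-style lines ('<hex digest>  <path>') for matching files."""
    lines = []
    for path in sorted(Path(directory).glob(pattern)):
        # Wheels are small enough to hash in one read for this sketch.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.as_posix()}")
    return lines
```

Calling `"\n".join(sha256_lines("wheelhouse"))` should give text that can be diffed against `checksums.txt`, provided the paths are spelled the same way.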
12 changes: 11 additions & 1 deletion .gitignore
@@ -357,7 +357,17 @@ MigrationBackup/
cscope*

build/

# jetbrains specific stuff
.idea/
cmake-build-debug/

tests/python/venv
#python extension module ignores
python/diskannpy.egg-info/
python/dist/

**/*.egg-info
wheelhouse/*
dist/*
venv*/**
*.swp
34 changes: 21 additions & 13 deletions CMakeLists.txt
@@ -135,19 +135,19 @@ if (MSVC)
"${DISKANN_MKL_LIB_PATH}/mkl_core.lib"
"${DISKANN_MKL_LIB_PATH}/mkl_intel_thread.lib")
else()
-# expected path for manual intel mkl installs
-set(OMP_PATH /opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin/ CACHE PATH "Intel OneAPI OpenMP library implementation path")
-set(MKL_ROOT /opt/intel/oneapi/mkl/latest CACHE PATH "Intel OneAPI MKL library implementation path")
-link_directories(${OMP_PATH} ${MKL_ROOT}/lib/intel64)
-include_directories(${MKL_ROOT}/include)
-
-# expected path for apt packaged intel mkl installs
-link_directories(/usr/lib/x86_64-linux-gnu/mkl)
-include_directories(/usr/include/mkl)
-
-# compile flags and link libraries
-add_compile_options(-m64 -Wl,--no-as-needed)
-link_libraries(mkl_intel_ilp64 mkl_intel_thread mkl_core iomp5 pthread m dl)
+# expected path for manual intel mkl installs
+set(OMP_PATH /opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin/ CACHE PATH "Intel OneAPI OpenMP library implementation path")
+set(MKL_ROOT /opt/intel/oneapi/mkl/latest CACHE PATH "Intel OneAPI MKL library implementation path")
+link_directories(${OMP_PATH} ${MKL_ROOT}/lib/intel64)
+include_directories(${MKL_ROOT}/include)
+
+# expected path for apt packaged intel mkl installs
+link_directories(/usr/lib/x86_64-linux-gnu/mkl)
+include_directories(/usr/include/mkl)
+
+# compile flags and link libraries
+add_compile_options(-m64 -Wl,--no-as-needed)
+link_libraries(mkl_intel_ilp64 mkl_intel_thread mkl_core iomp5 pthread m dl)
endif()

add_definitions(-DMKL_ILP64)
@@ -236,6 +236,9 @@ else()
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -g -DDEBUG -Wall -Wextra")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -Ofast -DNDEBUG -march=native -mtune=native -ftree-vectorize")
add_compile_options(-march=native -Wall -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fopenmp -fopenmp-simd -funroll-loops -Wfatal-errors -DUSE_AVX2)
if (PYBIND)
add_compile_options(-fPIC)
endif()
endif()

add_subdirectory(src)
@@ -259,3 +262,8 @@ if (RESTAPI)
endif()

include(clang-format.cmake)


if(PYBIND)
add_subdirectory(python)
endif()
13 changes: 13 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,13 @@
include MANIFEST.in
include *.txt
include *.md
include setup.py
include pyproject.toml
include *.cmake
recursive-include gperftools *
recursive-include include *
recursive-include python *
recursive-include windows *
prune python/tests
recursive-include src *
recursive-include tests *
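Since a missing `windows` entry in the manifest previously broke dependency acquisition, it may be worth checking a built sdist for the directories the build relies on. The following is a rough sketch, assuming the usual sdist layout with a single `<name>-<version>/` root directory; `missing_from_sdist` is a hypothetical helper, not part of any packaging tool.

```python
import tarfile


def missing_from_sdist(sdist_path, required_dirs):
    """Return the required top-level directories that have no entry in the sdist."""
    with tarfile.open(sdist_path) as archive:
        names = archive.getnames()
    # Drop the leading '<name>-<version>/' component that sdists use as root.
    relative = [name.split("/", 1)[1] for name in names if "/" in name]
    # Keep only the first path component of each remaining entry.
    present = {path.split("/", 1)[0] for path in relative}
    return sorted(set(required_dirs) - present)
```

An empty list for something like `["include", "python", "src", "windows"]` would suggest the `recursive-include` lines are taking effect.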
2 changes: 2 additions & 0 deletions include/pq_flash_index.h
@@ -78,6 +78,8 @@ namespace diskann {

std::shared_ptr<AlignedFileReader> &reader;

DISKANN_DLLEXPORT diskann::Metric get_metric();

protected:
DISKANN_DLLEXPORT void use_medoids_data_as_centroids();
DISKANN_DLLEXPORT void setup_thread_data(_u64 nthreads,
41 changes: 41 additions & 0 deletions pyproject.toml
@@ -0,0 +1,41 @@
[build-system]
requires = [
"setuptools>=59.6",
"pybind11>=2.10.0",
"cmake>=3.22",
"numpy>=1.21",
"wheel",
]
build-backend = "setuptools.build_meta"

[project]
name = "diskannpy"
version = "0.4.0"

description = "DiskANN Python extension module"
# readme = "../README.md"
requires-python = ">=3.7"
license = {text = "MIT License"}
dependencies = [
"numpy"
]
authors = [
{name = "Harsha Vardhan Simhadri", email = "[email protected]"},
{name = "Dax Pryce", email = "[email protected]"}
]

[tool.cibuildwheel]
manylinux-x86_64-image = "manylinux_2_24"
build-frontend = "build"
skip = "pp* *musllinux*"
test-command = "python -m unittest discover -s {package}/tests"


[tool.cibuildwheel.linux]
before-all = """\
apt-get update && \
apt-get -y upgrade && \
apt-get install -y wget make cmake g++ libaio-dev libgoogle-perftools-dev libunwind-dev clang-format libboost-dev libboost-program-options-dev && \
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18487/l_BaseKit_p_2022.1.2.146.sh && \
sh l_BaseKit_p_2022.1.2.146.sh -a --components intel.oneapi.lin.mkl.devel --action install --eula accept -s --ignore-errors \
"""
62 changes: 62 additions & 0 deletions python/CMakeLists.txt
@@ -0,0 +1,62 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

cmake_minimum_required(VERSION 3.18...3.22)

set(CMAKE_CXX_STANDARD 14)

if (PYTHON_EXECUTABLE)
set(Python3_EXECUTABLE ${PYTHON_EXECUTABLE})
endif()

find_package(Python3 COMPONENTS Interpreter Development.Module NumPy REQUIRED)

execute_process(COMMAND ${Python3_EXECUTABLE} -c "import pybind11; print(pybind11.get_cmake_dir())"
OUTPUT_VARIABLE _tmp_dir
OUTPUT_STRIP_TRAILING_WHITESPACE COMMAND_ECHO STDOUT)
list(APPEND CMAKE_PREFIX_PATH "${_tmp_dir}")

# Now we can find pybind11
find_package(pybind11 CONFIG REQUIRED)

execute_process(COMMAND ${Python3_EXECUTABLE} -c "import numpy; print(numpy.get_include())"
OUTPUT_VARIABLE _numpy_include
OUTPUT_STRIP_TRAILING_WHITESPACE COMMAND_ECHO STDOUT)

# pybind11_add_module(diskannpy MODULE src/diskann_bindings.cpp)
# the following is fairly synonymous with pybind11_add_module, but we need more target_link_libraries
# see https://pybind11.readthedocs.io/en/latest/compiling.html#advanced-interface-library-targets for more details
add_library(diskannpy MODULE src/diskann_bindings.cpp)

if (MSVC)
target_compile_options(diskannpy PRIVATE /U_WINDLL)
endif()

target_link_libraries(
diskannpy
PRIVATE
pybind11::module
pybind11::lto
pybind11::windows_extras
${PROJECT_NAME}
${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS}
${DISKANN_ASYNC_LIB}
)

pybind11_extension(diskannpy)
if(NOT MSVC AND NOT ${CMAKE_BUILD_TYPE} MATCHES Debug|RelWithDebInfo)
# Strip unnecessary sections of the binary on Linux/macOS
pybind11_strip(diskannpy)
endif()

set_target_properties(diskannpy PROPERTIES CXX_VISIBILITY_PRESET "hidden"
CUDA_VISIBILITY_PRESET "hidden")

# generally, the VERSION_INFO flag is set by pyproject.toml, by way of setup.py.
# attempts to locate the version within CMake fail because the version has to be available
# to pyproject.toml for the sdist to work after we build it.

if(NOT VERSION_INFO)
set(VERSION_INFO "0.0.0dev")
endif()
target_compile_definitions(diskannpy PRIVATE VERSION_INFO="${VERSION_INFO}")
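The CMake files in this commit read three variables that a driving `setup.py` would be expected to supply: `PYBIND`, `PYTHON_EXECUTABLE`, and `VERSION_INFO`. The `setup.py` itself is not shown in this section, so the following is only a guess at the shape of that call, in the spirit of the pybind `cmake_example` linked in the commit message. `cmake_configure_args` is a hypothetical helper.

```python
import sys


def cmake_configure_args(version, python_executable=sys.executable):
    """Build -D arguments matching the variables read by the CMake files above."""
    return [
        # Turns on add_subdirectory(python) and the -fPIC compile option.
        "-DPYBIND=ON",
        # python/CMakeLists.txt copies this into Python3_EXECUTABLE.
        f"-DPYTHON_EXECUTABLE={python_executable}",
        # Without this, python/CMakeLists.txt falls back to "0.0.0dev".
        f"-DVERSION_INFO={version}",
    ]
```

The resulting list would typically be appended to a `cmake <source dir>` invocation, for example through `subprocess.run`.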