Commit 0468c09

Initial Commit
Author: Gal Ben David
0 parents, commit 0468c09

34 files changed: +11039, -0 lines changed

.github/workflows/pythonpackage.yml

+41
@@ -0,0 +1,41 @@
name: Build
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 4
      matrix:
        python-version: [3.6, 3.7, 3.8, pypy3]
    steps:
      - uses: actions/checkout@v1
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Ubuntu packages
        run: >-
          sudo apt install g++-9 libre2-dev libgit2-dev;
          sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 9;
          python -m pip install --user --upgrade setuptools pybind11;
      - name: Test module
        run: >-
          python setup.py test
  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags')
    steps:
      - uses: actions/checkout@v1
      - name: Set up Python 3.8
        uses: actions/setup-python@v1
        with:
          python-version: 3.8
      - name: Build a source tarball
        run: >-
          python -m pip install --user --upgrade setuptools pybind11;
          python setup.py sdist;
      - name: Publish distribution 📦 to PyPI
        uses: pypa/gh-action-pypi-publish@master
        with:
          password: ${{ secrets.pypi_password }}
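
The `deploy` job in this workflow is gated by an `if:` expression: it runs only on `push` events whose ref starts with `refs/tags`. As a rough illustration of that condition, here is a minimal Python sketch. `should_deploy` is a hypothetical helper written for this note; it is not part of the workflow or of GitHub Actions, and it compares strings case-sensitively, whereas GitHub's expression comparisons ignore case.

```python
def should_deploy(event_name, ref):
    """Approximate the workflow's deploy condition.

    Mirrors: github.event_name == 'push'
             && startsWith(github.event.ref, 'refs/tags')
    Simplification: this check is case-sensitive.
    """
    return event_name == 'push' and ref.startswith('refs/tags')


# A tag push triggers the deploy job; a branch push or a pull request does not.
print(should_deploy('push', 'refs/tags/v0.1.1'))
print(should_deploy('push', 'refs/heads/master'))
print(should_deploy('pull_request', 'refs/tags/v0.1.1'))
```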

.gitignore

+133
@@ -0,0 +1,133 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

*.cppimporthash
.rendered.*
.vscode

LICENSE

+21
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2020 Gal Ben David

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

MANIFEST.in

+4
@@ -0,0 +1,4 @@
include README.md
include images/logo.png
graft tests
recursive-include src *

README.md

+119
@@ -0,0 +1,119 @@
<p align="center">
    <a href="https://github.com/intsights/PyRepScan">
        <img src="https://raw.githubusercontent.com/intsights/PyRepScan/master/images/logo.png" alt="Logo">
    </a>
    <h3 align="center">
        A Git Repository Leaks Scanner Python library written in C++
    </h3>
</p>

![license](https://img.shields.io/badge/MIT-License-blue)
![Python](https://img.shields.io/badge/Python-3.6%20%7C%203.7%20%7C%203.8%20%7C%20pypy3-blue)
![Build](https://github.com/intsights/PyRepScan/workflows/Build/badge.svg)
[![PyPi](https://img.shields.io/pypi/v/PyRepScan.svg)](https://pypi.org/project/PyRepScan/)

## Table of Contents

- [Table of Contents](#table-of-contents)
- [About The Project](#about-the-project)
  - [Built With](#built-with)
  - [Performance](#performance)
    - [CPU](#cpu)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
- [Usage](#usage)
- [License](#license)
- [Contact](#contact)


## About The Project

PyRepScan is a Python library written in C++. It uses [libgit2](https://github.com/libgit2/libgit2) for parsing and traversing repositories, [re2](https://github.com/google/re2) for regex pattern matching, and [cpp-taskflow](https://github.com/cpp-taskflow/cpp-taskflow) for concurrency. The library was written to combine high performance with Python bindings.


### Built With

* [libgit2](https://github.com/libgit2/libgit2)
* [re2](https://github.com/google/re2)
* [cpp-taskflow](https://github.com/cpp-taskflow/cpp-taskflow)


### Performance

#### CPU
| Library | Time | Relative Time |
| ------------- | ------------- | ------------- |
| [PyRepScan](https://github.com/intsights/PyRepScan) | 2.18s | 1.0x |
| [gitleaks](https://github.com/zricethezav/gitleaks) | 63.0s | 28.9x |
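
The last column of the table is each tool's run time divided by PyRepScan's run time. A quick check of that arithmetic, using only the two timings reported in the table (the variable names are illustrative):

```python
# Timings copied from the table, in seconds.
timings = {
    'PyRepScan': 2.18,
    'gitleaks': 63.0,
}

baseline = timings['PyRepScan']

# Run time of each tool relative to PyRepScan, rounded to one decimal place.
factors = {
    name: round(seconds / baseline, 1)
    for name, seconds in timings.items()
}

for name, factor in factors.items():
    print(f'{name}: {factor}x')
```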


### Prerequisites

To compile this package, you need GCC and the Python development package installed, along with the libgit2 and re2 development packages.
* Fedora
```sh
sudo dnf install python3-devel gcc-c++ libgit2-devel re2-devel
```
* Ubuntu 18.04
```sh
sudo apt install python3-dev g++-9 libgit2-dev libre2-dev
```

### Installation

```sh
pip3 install PyRepScan
```


## Usage

```python
import pyrepscan

grs = pyrepscan.GitRepositoryScanner()

# Add a rule. Can be called multiple times, or not at all
grs.add_rule(
    name='First Rule',
    regex_pattern=r'''(-----BEGIN PRIVATE KEY-----)''',
    regex_blacklist_patterns=[],
)
# Compile the rules. Should be called only once, after all the rules have been added
grs.compile_rules()

# Add file extensions to ignore during the search
grs.add_ignored_file_extension('bin')
grs.add_ignored_file_extension('jpg')

# Add file paths to ignore during the search. Free text is allowed
grs.add_ignored_file_path('site-packages')
grs.add_ignored_file_path('node_modules')

# Scan a repository
results = grs.scan('/repository/path')

# results is a list of dicts. Each dict has the following format:
# {
#     'author_email': '[email protected]',
#     'author_name': 'Author Name',
#     'commit_id': '1111111111111111111111111111111111111111',
#     'commit_message': 'The commit message',
#     'content': 'The content of the file that has been matched',
#     'file_path': 'full/file/path',
#     'match': 'The matched group',
#     'rule_name': 'First Rule'
# },
```
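
Since `scan` returns a plain list of dicts, the results can be post-processed with ordinary Python. The sketch below groups matches by rule name. It assumes only the dict keys shown in the example above; `group_matches_by_rule` and the sample data are hypothetical and stand in for real scanner output.

```python
from collections import defaultdict


def group_matches_by_rule(results):
    """Group scan results by their 'rule_name' key.

    `results` is assumed to be a list of dicts shaped like the
    example in the Usage section.
    """
    grouped = defaultdict(list)
    for result in results:
        grouped[result['rule_name']].append(
            (result['commit_id'], result['file_path'], result['match'])
        )
    return dict(grouped)


# Made-up sample data standing in for the output of grs.scan().
sample_results = [
    {
        'commit_id': '1' * 40,
        'file_path': 'keys/server.pem',
        'match': '-----BEGIN PRIVATE KEY-----',
        'rule_name': 'First Rule',
    },
    {
        'commit_id': '2' * 40,
        'file_path': 'deploy/id.pem',
        'match': '-----BEGIN PRIVATE KEY-----',
        'rule_name': 'First Rule',
    },
]

grouped = group_matches_by_rule(sample_results)
for rule_name, matches in grouped.items():
    print(rule_name, len(matches))
```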


## License

Distributed under the MIT License. See `LICENSE` for more information.


## Contact

Gal Ben David - [email protected]

Project Link: [https://github.com/intsights/PyRepScan](https://github.com/intsights/PyRepScan)

images/logo.png

57.7 KB

setup.py

+63
@@ -0,0 +1,63 @@
import setuptools
import pybind11
import os
import glob


setuptools.setup(
    name='PyRepScan',
    version='0.1.1',
    author='Gal Ben David',
    author_email='[email protected]',
    url='https://github.com/intsights/PyRepScan',
    project_urls={
        'Source': 'https://github.com/intsights/PyRepScan',
    },
    license='MIT',
    description='A Git Repository Leaks Scanner Python library written in C++',
    long_description=open('README.md').read(),
    long_description_content_type='text/markdown',
    classifiers=[
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7',
        'Programming Language :: Python :: 3.8',
    ],
    keywords='git repository leaks scanner detector libgit2 re2 c++',
    python_requires='>=3.6',
    zip_safe=False,
    install_requires=[
        'pybind11',
    ],
    tests_require=[
        'gitpython',
    ],
    package_data={},
    include_package_data=True,
    ext_modules=[
        setuptools.Extension(
            name='pyrepscan',
            sources=glob.glob(
                pathname=os.path.join(
                    'src',
                    'git_repository_scanner.cpp',
                ),
            ),
            language='c++',
            extra_compile_args=[
                '-Ofast',
                '-std=c++17',
            ],
            extra_link_args=[
                '-lre2',
                '-lgit2',
                '-lpthread',
            ],
            include_dirs=[
                'src',
                pybind11.get_include(False),
                pybind11.get_include(True),
            ],
        ),
    ],
)
