Open AVM kit is a python library for real estate mass appraisal. It includes modules for data cleaning, data enrichment, modeling, and statistical evaluation of predictive models. It also includes Jupyter notebooks that model typical workflows.
Follow these steps to install and set up OpenAVMKit
on your local environment.
Start by cloning the repository to your local machine:
(This command is the same on Windows, MacOS, and Linux):
git clone https://github.com/larsiusprime/openavmkit.git
cd openavmkit
This command will clone the repository to your local machine, store it under a folder named openavmkit/
, and then navigate to that folder.
If you don't have Python on your machine, you'll need to install it.
The specific version of Python that openavmkit has been tested on is:
3.12.9
You can download that version of Python here.
If you already have Python installed, but you're not sure which version of Python you have installed, you can check by running this command:
python --version
If you have Python installed, you should see the version number printed to the console.
If you have the wrong version of Python installed, you can download the correct version from the link above, and then install it. Be very careful to make sure that the new version of Python is available in your PATH
. (If you don't know what the means, here is a handy tutorial on the subject).
It's a good practice to create a virtual environment* to isolate your Python dependencies. Here's how you can set it up using venv
, which is Python's built-in tool ("venv" for "virtual environment"):
MacOS/Linux:
python -m venv venv
source venv/bin/activate
Windows:
python -m venv venv
venv\Scripts\activate
*On a typical computer, there will be other programs that are using other versions of python and/or have their own conflicting versions of libraries that openavmkit
might also need to use. To keep openavmkit
from conflicting with your existing setup, we set up a 'virtual environment,' which is like a special bubble that is localized just to openavmkit
. In this way openavmkit
gets to use exactly the stuff it needs without messing with whatever else is already on your computer.
Let me explain a little bit what's going on here. The first command, python -m venv venv
, creates the virtual environment. You only have to run that once. The second command, the bit that ends with activate
, is what actually starts the virtual environment. You have to run that every time you open a new terminal window or tab and want to work on openavmkit
.
You can tell that you are in the virtual environment, because your command prompt will change to show the name of the virtual environment, which in this case is venv
. Here's how your command prompt will look inside and outside the virtual environment.
Outside the virtual environment:
MacOS/Linux:
/path/to/openavmkit$
Windows:
C:\path\to\openavmkit>
Inside the virtual environment:
MacOS/Linux:
(venv) /path/to/openavmkit$
Windows:
(venv) C:\path\to\openavmkit>
Take careful note that you are actually inside the virtual environment when running the following commands.
When you are done working on openavmkit
and want to leave the virtual environment, you can run this command:
deactivate
And you will return to your normal command prompt.
openavmkit
uses a bunch of third-party libraries that you need to install. Python lets us list these in a text files so you can install them with one command. Here's how you can do that, using python's built-in pip
tool, which manages your python libraries:
pip install -r requirements.txt
Then there's another dependency you have to install manually, ignore the warnings it gives you, it's fine:
pip install tabulate --upgrade
Once you've set up your python environment and dependencies, here's the basic guide to get you started:
If you want to import and use the code modules directly, you must install the library.
First, make sure you've followed the above steps.
Then, in your command line environment, make sure you are in the top level of the openavmkit/
directory. That is the same directory which contains the setup.py
file.
To install openavmkit
in editable mode (so you can make changes to the library and see them reflected in your code), run this command:
pip install -e .
To install openavmkit
in normal mode, run this command:
pip install .
The "." in that command is a special symbol that refers to the current directory. So when you run pip install .
, you are telling pip
to install the library contained in the current directory. That's why it's important to make sure you're in the right directory when you run this command!
Jupyter is a popular tool for running Python code interactively. We've included a few Jupyter notebooks in the notebooks/
directory that demonstrate how to use openavmkit
to perform common tasks.
To start using the Jupyter notebooks, you'll first need to have Jupyter installed. That should have already been taken care of in the requirements section above, but if you need to install it, you can do so with this command:
pip install jupyter
With Jupyter installed, you can start the Jupyter notebook server* by running this command:
jupyter notebook
*What's a "Jupyter notebook server?" Well, a "server" is any program that talks to other programs over a network. In this case the "network" is just your own computer, and the "other program" is your web browser. When you run jupyter notebook
, you're starting a server that talks to your web browser, and as long as it is running you can use your web browser to interact with the Jupyter notebook interface.
When you run jupyter notebook
, it will open a new tab in your web browser that shows a list of files in the current directory. You can navigate to the notebooks/
directory and open any of the notebooks to start running the code.
Here's how you can import and use the core modules directly in your own Python code.
For instance, here's a simple example that demonstrates how to calculate the Coefficient of Dispersion (COD) for a list of ratios:
import openavmkit
ratios = [0.8, 0.9, 1.0, 1.1, 1.2]
cod = openavmkit.utilities.stats.calc_cod(ratios)
print(cod)
You can also specify the specific module you want to import:
from openavmkit.utilities import stats
ratios = [0.8, 0.9, 1.0, 1.1, 1.2]
cod = stats.calc_cod(ratios)
Or even import specific functions directly:
from openavmkit.utilities.stats import calc_cod
ratios = [0.8, 0.9, 1.0, 1.1, 1.2]
cod = calc_cod(ratios)
The notebooks/
directory contains several pre-written Jupyter notebooks that demonstrate how to use the library interactively. These notebooks are especially useful for new users, as they contain step-by-step explanations and examples.
- Launch the Jupyter notebook server:
jupyter notebook
This should open a new tab in your web browser with the Jupyter interface.
- Navigate to the
notebooks/
directory in the Jupyter interface and open the notebook you want to run.
Double-click on your chosen notebook to open it.
For information on how to use Jupyter notebooks in general, refer to the official Jupyter notebook documentation.
OpenAVMKit operates on the concept of a "locality", which is a geographic area that contains a set of properties. This can represent a city, a county, a neighborhood, or any other region or jurisdiction you want to analyze. To set one up, create a folder like this within openavmkit's notebooks/
directory:
data/<locality_slug>/
Where <locality_slug>
is a unique identifying name for your locality in a particularly opinionated format. That format is:
<country_code>-<state_or_province_code>-<locality_name>
-
Country code: The 2-letter country code according to the ISO 3166-1 standard. For example, the country code for the United States is
us
, and the country code for Norway isno
. -
State/province code: The 2-letter state or province code according to the ISO 3166-2 standard. For example, the state code for Texas is
tx
, and the state code for California isca
. -
Locality name: A human-readable name for the locality itself. This follows no particular standard and is entirely up to you.
The slug itself should be all lowercase and contain no spaces or special characters other than underscores.
Some examples:
us-nc-guilford # Guilford County, North Carolina, USA
us-tx-austin # City of Austin, Texas, USA
no-03-oslo # City of Oslo, Norway
no-50-orkdal # Orkdal kommune (county), Norway
Once you have your locality set up, you will want to set it up like this (using us-nc-guilford
as an example):
data/
├──us-nc-guilford/
├── in/
├── out/
├── settings.json
The in/
directory is where you will put your raw data files.
The out/
directory is where the output files will be saved.
The settings.json
file will drive all your modeling and analysis decisions for the library. For now you can just put a blank {}
in there so that it will load, but you will want to consult the documentation / tutorials for how to construct this file.
openavmkit
includes a module for working with remote storage services. At this time the library supports three cloud storage methods:
- Microsoft Azure
- Hugging Face
- SFTP
To configure cloud storage, you will need to create a file that stores your connection credentials (such as API keys or passwords). This file should be named .env
and should be placed in the notebooks/
directory.
This file is already ignored by git, but do make sure you don't accidentally commit this file to the repository or share it with others, as it contains your sensitive login information!
This file should be a plain text file formatted like this:
SOME_VARIABLE=some_value
ANOTHER_VARIABLE=another_value
YET_ANOTHER_VARIABLE=123
That's just an example of the format; here are the actual variables that it recognizes:
Variable Name | Description |
---|---|
CLOUD_TYPE |
The type of cloud storage to use. Legal values are: azure , huggingface , sftp . You can also set this value per-project in your settings.json as cloud.type . |
AZURE_ACCESS |
The type of access your azure account has. Legal values are: read_only , read_write . |
AZURE_STORAGE_CONTAINER_NAME |
The name of the Azure storage container |
AZURE_STORAGE_CONNECTION_STRING |
The connection string for the Azure storage account |
HF_ACCESS |
The type of access your huggingface account has. Legal values are: read_only , read_write . |
HF_TOKEN |
The Hugging Face API token |
HF_REPO_ID |
The Hugging Face repository ID |
SFTP_ACCESS |
The type of access your SFTP account has. Legal values are: read_only , read_write . |
SFTP_HOST |
The hostname of the SFTP server |
SFTP_USERNAME |
The username for the SFTP server |
SFTP_PASSWORD |
The password for the SFTP server |
SFTP_PORT |
The port number for the SFTP server |
You only need to provide values for the service that you're actually using. For instance, here's what the file might look like if you are using Hugging Face:
CLOUD_TYPE=huggingface
HF_ACCESS=read_write
HF_REPO_ID=landeconomics/localities-public
HF_TOKEN=<YOUR_HUGGING_FACE_API_TOKEN>
If you're just getting started, you can just use read-only access to an existing public repository. Here's an example of how to access the public datasets provided by the The Center for Land Economics:
CLOUD_TYPE=huggingface
HF_ACCESS=read_only
HF_REPO_ID=landeconomics/localities
This will let you download the inputs for any of the Center for Land Economics' public datasets. Note that you will be unable to upload your changes and outputs to repositories that you have read-only access to.
If you want to sync with your own cloud storage, you will need to set up your own hosting account and then provide the appropriate credentials in the .env
file.
If you have multiple projects stored on different cloud services, you can set the CLOUD_TYPE
and CLOUD_ACCESS
variables in your settings.json. This will allow you to switch between cloud services on a per-project basis. Do not ever store credentials in your settings.json, however, as these are uploaded to the cloud!
openavmkit
includes a module for generating PDF reports. This module uses the pdfkit
library, which is a Python wrapper for the wkhtmltopdf
command line tool. Although pdfkit
will be installed automatically along with the rest of the dependencies, you will need to install wkhtmltopdf
manually on your system to use this feature. If you skip this step, don't worry, you'll still be able to use the rest of the library, it just won't generate PDF reports.
Visit the wkhtmltopdf download page and download the appropriate installer for your operating system.
- Download the installer linked above
- Run the installer and follow the instructions
- Add the
wkhtmltopdf
installation directory to your system's PATH environment variable
If you don't know what 3. means:
The idea is that you want the wkhtmltopdf
executable to be available from any command prompt, so you can run it from anywhere. For that to work, you need to make sure that the folder that the wkhtmltopdf
executable is in is listed in your system's PATH environment variable.
Here's how to do that:
- Find the folder where
wkhtmltopdf
was installed. It's probably inC:\Program Files\wkhtmltopdf\bin
, but it could be somewhere else. Pay attention when you install it. - Follow this tutorial to edit your PATH environment variable. You want to add the folder from step 1 to the PATH variable.
- Open a new command prompt and type
wkhtmltopdf --version
. If you see a version number, you're all set!
On Debian/Ubuntu, run:
sudo apt-get update
sudo apt-get install wkhtmltopdf
Ensure you have Homebrew installed. Then run:
brew install wkhtmltopdf
To ensure everything is working properly, you can run the test suite. This will execute all unit tests from the tests/
directory.
Run the tests using pytest
:
pytest
This will run all the unit tests and provide feedback on any errors or failed tests.