This repository has been archived by the owner on Oct 30, 2024. It is now read-only.

Initial commit
django

init
jstover-das committed Jul 12, 2022
0 parents commit ab17470
Showing 21 changed files with 506 additions and 0 deletions.
69 changes: 69 additions & 0 deletions README.md
@@ -0,0 +1,69 @@
# Example ETL code for DAS

This project contains some code for use as a short technical test for ETL developers.


## Overview

There are a few things in this project which are intentionally done in suboptimal, inefficient, or incorrect ways.
Your task with this project is to:

- Provide feedback on the following scripts:
  - `utils/vector_subset.py`
  - `utils/vector_analysis.py`
  - `utils/transform.py`
- Give a short summary of how the following function works and where it could be used:
  - `utils/tile_vectors.py`
- Provide feedback on the database schema:
  - `schema.sql`



## Setup

- Install Python (tested using v3.9)
- Install the requirements listed in `requirements.txt`



## Web Server

There is a Django-based web server included in the repository. You are not expected to review these files (_i.e._ anything in the `server` folder).
The web server is included only to provide a trigger for the util scripts (`vector_subset.py` and `vector_analysis.py`) if needed.

To start the server:

```
cd server
python manage.py runserver
```
Then navigate to one of the following URLs to trigger the relevant script:

Run `vector_analysis.py` and return a JSON object:
[http://localhost:8000/analyse/](http://localhost:8000/analyse/)


Run `vector_subset.py` and return the records where "SoilOrder" equals "Gley Soils", as per URL parameters:
[http://localhost:8000/subset/SoilOrder/Gley%20Soils](http://localhost:8000/subset/SoilOrder/Gley%20Soils)
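
The subset URL carries its column name and match value as path segments, so a value containing a space has to be percent-encoded (`Gley Soils` becomes `Gley%20Soils`). As a minimal sketch, assuming the server is running at the default `localhost:8000` address, the URL can be built with the standard library. The `subset_url` helper is hypothetical and is not part of this repository:

```python
from urllib.parse import quote

# Assumption: the default address used by `python manage.py runserver`.
BASE_URL = "http://localhost:8000"


def subset_url(column: str, value: str, base_url: str = BASE_URL) -> str:
    """Build a /subset/<column>/<value> URL, percent-encoding both path segments.

    Hypothetical helper for illustration only; it is not defined in this project.
    """
    return f"{base_url}/subset/{quote(column, safe='')}/{quote(value, safe='')}"


print(subset_url("SoilOrder", "Gley Soils"))
# prints: http://localhost:8000/subset/SoilOrder/Gley%20Soils
```

The resulting string can be opened in a browser or passed to any HTTP client while the development server is running.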



## Review Considerations
Some things to consider while reading these files:
- Does the script/database design make sense?
- Is it efficient?
- Is it clear what the code does?
- Could we improve anything?
- Any other feedback



## Notes

- We are only expecting a short review of the files specified above
- The web server (all files in the `server` folder) does __not__ need to be reviewed
- The Python scripts make use of these libraries:
  - `geopandas` - This is a wrapper around `Pandas` which adds some geospatial functions
  - `fiona` - This is a library for iteratively reading features from a geospatial vector dataset


Binary file added input.gpkg
Binary file not shown.
3 changes: 3 additions & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Django==4.0.6
geopandas==0.11.0
Fiona==1.8.21
20 changes: 20 additions & 0 deletions schema.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
CREATE TABLE public.generic_geometry (
    id SERIAL PRIMARY KEY,
    geometry GEOMETRY,
    rep_point GEOMETRY,
    centroid GEOMETRY,
    area FLOAT
);

CREATE TABLE public.name (
    id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE public.rainfall_records (
    id SERIAL PRIMARY KEY,
    geometry_id BIGSERIAL REFERENCES public.generic_geometry NOT NULL,
    name INTEGER REFERENCES public.name NOT NULL UNIQUE,
    rainfall DECIMAL,
    elevation JSONB
);
Empty file added server/__init__.py
Empty file.
16 changes: 16 additions & 0 deletions server/asgi.py
@@ -0,0 +1,16 @@
"""
ASGI config for server project.
It exposes the ASGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/3.2/howto/deployment/asgi/
"""

import os

from django.core.asgi import get_asgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'server.settings')

application = get_asgi_application()
22 changes: 22 additions & 0 deletions server/manage.py
@@ -0,0 +1,22 @@
#!/usr/bin/env python
"""Django's command-line utility for administrative tasks."""
import os
import sys


def main():
    """Run administrative tasks."""
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == '__main__':
    main()
Empty file added server/preprocessor/__init__.py
Empty file.
3 changes: 3 additions & 0 deletions server/preprocessor/admin.py
@@ -0,0 +1,3 @@
from django.contrib import admin

# Register your models here.
6 changes: 6 additions & 0 deletions server/preprocessor/apps.py
@@ -0,0 +1,6 @@
from django.apps import AppConfig


class PreprocessorConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'preprocessor'
Empty file.
3 changes: 3 additions & 0 deletions server/preprocessor/models.py
@@ -0,0 +1,3 @@
from django.db import models

# Create your models here.
3 changes: 3 additions & 0 deletions server/preprocessor/tests.py
@@ -0,0 +1,3 @@
from django.test import TestCase

# Create your tests here.
39 changes: 39 additions & 0 deletions server/preprocessor/views.py
@@ -0,0 +1,39 @@
import json
import geopandas
from django.http import HttpRequest, JsonResponse, HttpResponse


from utils.vector_subset import get_subset
from utils.vector_analysis import do_analysis
from utils.transform import main as transform_main
from server.settings import INPUT_FILE


def index(request: HttpRequest):
    return HttpResponse('''
        <html>
        <head><title>Function Index</title></head>
        <body>
        <h1>Example Endpoints</h1>
        <p><a href="/analyse">/analyse</a></p>
        <p><a href="/subset/SoilOrder/Gley Soils">/subset/SoilOrder/Gley Soils</a></p>
        <p><a href="/transform">/transform</a></p>
        </body>
        </html>
    ''')


def subset(request: HttpRequest, column: str, value: str):
    df = geopandas.read_file(INPUT_FILE)
    subset = get_subset(df, column_name=column, match_value=value)
    return JsonResponse(json.loads(subset.to_json()))


def analyse(request: HttpRequest):
    return JsonResponse(do_analysis(INPUT_FILE))


def transform(request: HttpRequest):
    df = transform_main(INPUT_FILE)
    df['geometry'] = df.geometry.apply(lambda x: x.wkt)
    return JsonResponse({'data': df.to_dict(orient='records')})
131 changes: 131 additions & 0 deletions server/settings.py
@@ -0,0 +1,131 @@
"""
Django settings for server project.
Generated by 'django-admin startproject' using Django 3.2.1.
For more information on this file, see
https://docs.djangoproject.com/en/3.2/topics/settings/
For the full list of settings and their values, see
https://docs.djangoproject.com/en/3.2/ref/settings/
"""

import sys
from pathlib import Path

# Build paths inside the project like this: BASE_DIR / 'subdir'.
BASE_DIR = Path(__file__).resolve().parent.parent

sys.path.append(str(BASE_DIR))


# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/3.2/howto/deployment/checklist/

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'django-insecure-2h95dd3-9+egw!=w@fyy%_t1+p)8x6xhwfipx-7bp#u45-^7_*'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

ALLOWED_HOSTS = []


# Application definition

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'preprocessor',
]

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

ROOT_URLCONF = 'urls'

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

WSGI_APPLICATION = 'server.wsgi.application'


# Database
# https://docs.djangoproject.com/en/3.2/ref/settings/#databases

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    }
}


# Password validation
# https://docs.djangoproject.com/en/3.2/ref/settings/#auth-password-validators

AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]


# Internationalization
# https://docs.djangoproject.com/en/3.2/topics/i18n/

LANGUAGE_CODE = 'en-us'

TIME_ZONE = 'UTC'

USE_I18N = True

USE_L10N = True

USE_TZ = True


# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/3.2/howto/static-files/

STATIC_URL = '/static/'

# Default primary key field type
# https://docs.djangoproject.com/en/3.2/ref/settings/#default-auto-field

DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'

INPUT_FILE = str(BASE_DIR / 'input.gpkg')
27 changes: 27 additions & 0 deletions server/urls.py
@@ -0,0 +1,27 @@
"""server URL Configuration
The `urlpatterns` list routes URLs to views. For more information please see:
https://docs.djangoproject.com/en/3.2/topics/http/urls/
Examples:
Function views
1. Add an import: from my_app import views
2. Add a URL to urlpatterns: path('', views.home, name='home')
Class-based views
1. Add an import: from other_app.views import Home
2. Add a URL to urlpatterns: path('', Home.as_view(), name='home')
Including another URLconf
1. Import the include() function: from django.urls import include, path
2. Add a URL to urlpatterns: path('blog/', include('blog.urls'))
"""
from django.contrib import admin
from django.urls import path

from preprocessor import views

urlpatterns = [
    path('', views.index),
    path('admin/', admin.site.urls),
    path('subset/<str:column>/<str:value>', views.subset),
    path('analyse/', views.analyse),
    path('transform/', views.transform),
]
16 changes: 16 additions & 0 deletions server/wsgi.py
@@ -0,0 +1,16 @@
"""
WSGI config for server project.
It exposes the WSGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/3.2/howto/deployment/wsgi/
"""

import os

from django.core.wsgi import get_wsgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'server.settings')

application = get_wsgi_application()
21 changes: 21 additions & 0 deletions utils/tile_vectors.py
@@ -0,0 +1,21 @@
from geopandas import GeoDataFrame
from shapely.geometry import box


def tile_vectors(gdf: GeoDataFrame, grid_shape=(10, 10)):
    minx, miny, maxx, maxy = gdf.total_bounds
    width = maxx - minx
    height = maxy - miny
    cell_width = width / grid_shape[1]
    cell_height = height / grid_shape[0]
    x_edges = [minx + (cell_width * x) for x in range(grid_shape[1])]
    y_edges = [miny + (cell_height * x) for x in range(grid_shape[0])]

    x_overlap = abs(cell_width) / 20
    y_overlap = abs(cell_height) / 20

    tiles = [
        box(x, y, x + cell_width + x_overlap, y + cell_height + y_overlap)
        for x in x_edges for y in y_edges
    ]

    for idx, tile in enumerate(tiles):
        subset = gdf.iloc[list(gdf.sindex.intersection(tile.bounds))].copy()
        subset.geometry = subset.geometry.apply(lambda g: g.intersection(tile))
        yield subset, tile
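
The grid arithmetic in `tile_vectors` can be followed without any geospatial libraries. The sketch below is a dependency-free illustration that mirrors only the tile-bounds calculation, using plain tuples instead of `shapely` boxes. The `tile_bounds` name and its `overlap_divisor` parameter are assumptions introduced here and are not part of the repository:

```python
def tile_bounds(total_bounds, grid_shape=(10, 10), overlap_divisor=20):
    """Return (minx, miny, maxx, maxy) tuples in the same order tile_vectors builds its tiles.

    Hypothetical helper: it reproduces only the arithmetic, not the clipping of geometries.
    """
    minx, miny, maxx, maxy = total_bounds
    rows, cols = grid_shape  # grid_shape[0] divides the height, grid_shape[1] the width
    cell_width = (maxx - minx) / cols
    cell_height = (maxy - miny) / rows

    # Lower-left corner coordinates of each column and row of cells.
    x_edges = [minx + cell_width * i for i in range(cols)]
    y_edges = [miny + cell_height * j for j in range(rows)]

    # Each tile is stretched up and to the right by 1/overlap_divisor of a cell.
    x_overlap = abs(cell_width) / overlap_divisor
    y_overlap = abs(cell_height) / overlap_divisor

    return [
        (x, y, x + cell_width + x_overlap, y + cell_height + y_overlap)
        for x in x_edges
        for y in y_edges
    ]


tiles = tile_bounds((0, 0, 100, 50), grid_shape=(2, 4))
print(len(tiles))   # 8
print(tiles[0])     # (0.0, 0.0, 26.25, 26.25)
print(tiles[-1])    # (75.0, 25.0, 101.25, 51.25)
```

With a 100 x 50 extent split into 2 rows and 4 columns, each cell is 25 x 25 and each tile is enlarged by 1.25 units on its upper and right sides, so neighbouring tiles overlap and the last tile extends slightly beyond the dataset's total bounds.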