Commit 73a8f81

authored
Update 03-heavyweight.md
1 parent ab96e78 commit 73a8f81

1 file changed

+29
-64
lines changed


search/03-heavyweight.md

+29-64
@@ -12,91 +12,56 @@
1212

1313
## Heavyweight
1414

15-
Do you need more control of fuzzy matches than Postgres search vectors provide? Do calculated values slow your queries to a crawl? Is your search corpus just plain huge? While a more involved solution than issuing SQL or handling search client side, an implementation of powerful search engine [Solr](https://lucene.apache.org/solr/guide/7_1/index.html) may be your best bet.
15+
Do you need more control of fuzzy matches than Postgres search vectors provide? Do calculated values slow your queries to a crawl? Is your search corpus just plain huge? Although it is a more involved solution than issuing SQL or handling search client side, the powerful search engine [Elasticsearch](https://www.elastic.co/) may be your best bet.
1616

17-
If you’re implementing search for a Django application, [Haystack](https://django-haystack.readthedocs.io/en/master/tutorial.html) is an extension that connects your application to a custom search engine, e.g., Solr. While it’s not without pitfalls, Haystack provides a familiar API for defining your search fields, issuing queries, and retrieving results and so can smooth your transition to a more complex search setup.
17+
If you’re implementing search for a Django application, [Haystack](https://django-haystack.readthedocs.io/en/master/tutorial.html) is an extension that connects your application to a custom search engine, e.g., Elasticsearch. While it’s not without pitfalls, Haystack provides a familiar API for defining your search fields, issuing queries, and retrieving results, and so can smooth your transition to a more complex search setup.
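
To make the Haystack connection concrete, here is a minimal sketch of the settings fragment that points Haystack at an Elasticsearch instance. It assumes a Haystack release that ships the Elasticsearch 7 backend and an instance listening on port 9200. The `ELASTICSEARCH_URL` environment variable and the `yourapp` index name are hypothetical placeholders, not values taken from any particular project.

```python
# settings.py (fragment): a sketch, not a drop-in configuration.
import os

HAYSTACK_CONNECTIONS = {
    "default": {
        # Backend class provided by django-haystack for Elasticsearch 7.x.
        "ENGINE": "haystack.backends.elasticsearch7_backend.Elasticsearch7SearchEngine",
        # ELASTICSEARCH_URL is a hypothetical variable name; the fallback
        # assumes Elasticsearch is published on localhost port 9200.
        "URL": os.environ.get("ELASTICSEARCH_URL", "http://localhost:9200/"),
        # Hypothetical index name; choose one per application.
        "INDEX_NAME": "yourapp",
    },
}
```

Reading the URL from the environment keeps the same settings file usable whether Elasticsearch runs in a local container or on a remote host.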
1818

19-
Beware: Configuring and administering Solr is of intermediate to advanced difficulty. When you need a fancy search, however, it can be well worth the effort (and we have some experienced hands on deck who are happy to help).
19+
Under no circumstances should you write custom adapter code to interact with Elasticsearch. You will end up reimplementing much of Haystack, badly.
20+
21+
Beware: Configuring and administering Elasticsearch is of intermediate to advanced difficulty. When you need a fancy search, however, it can be well worth the effort (and we have some experienced hands on deck who are happy to help).
2022

2123
### Pros
2224

23-
* Solr is infinitely configurable.
24-
* You control [how your data and queries are broken apart and compared](https://lucene.apache.org/solr/guide/7_1/understanding-analyzers-tokenizers-and-filters.html) via analyzers, tokenizers, and filters.
25-
* Handy dandy web GUI for testing [query analysis](https://lucene.apache.org/solr/guide/7_1/analysis-screen.html#analysis-screen), [results](https://lucene.apache.org/solr/guide/7_1/query-screen.html), and more.
26-
* [Faceting](https://lucene.apache.org/solr/guide/7_1/faceting.html), [highlighting](https://lucene.apache.org/solr/guide/7_1/highlighting.html), [auto-suggest](https://lucene.apache.org/solr/guide/7_1/suggester.html), [geosearch](https://lucene.apache.org/solr/guide/7_1/spatial-search.html): Solr does it all.
27-
* Excellent [documentation](https://lucene.apache.org/solr/guide/7_1/index.html).
25+
* Haystack is infinitely configurable.
26+
* You control [how your data and queries are broken apart and compared](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html) via analyzers, tokenizers, and filters.
27+
* [Faceting](https://www.elastic.co/guide/en/app-search/current/facets-guide.html), [auto-suggest](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html), [geosearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-queries.html): Elasticsearch does it all.
28+
* Excellent [documentation](https://www.elastic.co/guide/index.html).
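
As an illustration of the analyzers, tokenizers, and filters mentioned in the list above, the following sketch builds an Elasticsearch index settings body as a Python dictionary. The analyzer name `folded_text` is a hypothetical choice. The `standard` tokenizer and the `lowercase` and `asciifolding` token filters are built into Elasticsearch. How such settings are passed through Haystack is not shown here and depends on the backend in use.

```python
import json

# Index settings body defining one custom analyzer.
# "folded_text" is a hypothetical name chosen for this example.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folded_text": {
                    "type": "custom",
                    # Split text into word tokens.
                    "tokenizer": "standard",
                    # Lowercase tokens, then strip accents (e.g. "é" to "e").
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    }
}

if __name__ == "__main__":
    # Print the JSON body that would accompany an index-creation request.
    print(json.dumps(INDEX_SETTINGS, indent=2))
```

With an analyzer like this applied to a text field, a query for "cafe" and an indexed value of "Café" would reduce to the same token, which is the kind of control over matching the list above describes.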
2829

2930
### Cons
3031

31-
* Solr is infinitely configurable.
32-
* Solr provides a powerful engine; you have to handle the results yourself.
33-
* Solr process must be managed separately from your application, though this is made less of a problem with containerization (e.g., Docker).
34-
* There are some non-obvious pitfalls and equally non-obvious solutions, e.g., [“deep paging” (read: going more than a few pages into search results) is inefficient](https://lucene.apache.org/solr/guide/7_1/pagination-of-results.html#performance-problems-with-deep-paging) but can be mitigated with [cursor marks](https://lucene.apache.org/solr/guide/7_1/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors).
35-
* Haystack can help or hinder, depending on your use case. As with any library, you trade the convenience of using someone else’s work for the idiosyncrasies of their implementation. For example, it inexplicably gobbles memory while building the Solr index. We’re [working on a solution](https://github.com/datamade/django-councilmatic/pull/219) for this particular quirk, but there are [other head scratchers](https://django-haystack.readthedocs.io/en/master/searchqueryset_api.html?highlight=%22hl.fl%22#SearchQuerySet.highlight) to overcome.
32+
* Elasticsearch is infinitely configurable.
33+
* The Elasticsearch process must be managed separately from your application, though containerization (e.g., Docker) makes this less of a problem.
3634

3735
### Getting started
3836

39-
#### Run Solr
40-
41-
1. Copy solr_configs directory to your project. Here’s [a pretty basic one](https://github.com/datamade/bga-payroll/tree/master/solr_configs).
42-
43-
2. Create a `docker-compose.yml` file at the root of your project directory.
44-
37+
#### Run Elasticsearch
4538

46-
```yaml
47-
version: '2.4'
39+
1. Create a `docker-compose.yml` file at the root of your project directory.
4840

49-
services:
50-
solr:
51-
image: solr:latest
52-
container_name: <APP_NAME>-solr
53-
volumes:
54-
- ./solr_configs:/<APP_NAME>_configs
55-
- solr-data:/opt/solr/server/solr/mycores
56-
command: sh -c 'solr-create -c <SOLR_CORE_NAME> -d /<APP_NAME>_configs'
57-
ports:
58-
- 8986:8983
59-
environment:
60-
SOLR_LOG_LEVEL: DEBUG
61-
restart: on-failure
6241

42+
```yaml
43+
elasticsearch:
44+
  image: elasticsearch:7.14.2
45+
  container_name: chi-councilmatic-elasticsearch
46+
  ports:
47+
    - 9200:9200
48+
  environment:
49+
    - discovery.type=single-node
50+
    - logger.org.elasticsearch.discovery=DEBUG
51+
    - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
52+
  mem_limit: 1g
6353
  volumes:
64-
solr-data:
65-
```
66-
67-
3. Start Solr. In your terminal, run `docker-compose up -d solr`.
68-
- To view the logs, run `docker logs -f solr`.
69-
- To stop Solr, or if you make changes to your underlying Solr or container configuration, run `docker-compose down`. This will remove your Solr container, but keep your index data in tact, thanks to use of the named `solr-data` volume. To spin up a new Solr container, start over with `docker-compose up -d solr`.
70-
71-
#### Configure Haystack
72-
73-
The [Haystack docs](https://django-haystack.readthedocs.io/en/master/tutorial.html#getting-started-with-haystack) give a clear, step-by-step guide for Solr-Django integration. The basic steps entail the following.
54+
    - yourapp-es-data:/usr/share/elasticsearch/data
7455

75-
1. Tell your Django app where Haystack can find Solr (i.e., define a [HAYSTACK_CONNECTIONS](https://django-haystack.readthedocs.io/en/master/tutorial.html#solr) variable in your settings).
76-
2. Create a [Haystack SearchIndex](https://django-haystack.readthedocs.io/en/master/tutorial.html#handling-data). Define the models to be indexed in a *.py file, as with [django-councilmatic](https://github.com/datamade/django-councilmatic/blob/master/councilmatic_core/haystack_indexes.py).
77-
3. Run build_solr_schema & update_index [management commands](https://django-haystack.readthedocs.io/en/master/management_commands.html).
78-
- Building an index can consume substantive memory, and Haystack does not have great memory management. Call these commands with a batch size argument to avoid memory errors, e.g., `python manage.py rebuild_index --batch-size=100`.
79-
4. Use a Haystack [search view](https://django-haystack.readthedocs.io/en/master/views_and_forms.html#views) and [search form](https://django-haystack.readthedocs.io/en/master/views_and_forms.html#forms) to query your new index.
56+
volumes:
57+
  yourapp-es-data:
58+
```
8059
8160
## Examples
8261
83-
**django-councilmatic (Django)**
62+
**chi-councilmatic (Django)**
8463
8564
Query large bodies of text. Return faceted & highlighted results.
8665
8766
* [Haystack index](https://github.com/datamade/django-councilmatic/blob/e61e5215e2dc24937643dcb9f68a8266b00275e2/councilmatic_core/haystack_indexes.py) and [custom text-rendering template](https://github.com/datamade/django-councilmatic/blob/e61e5215e2dc24937643dcb9f68a8266b00275e2/councilmatic_core/templates/search/indexes/councilmatic_core/bill_text.txt)
8867
* [Search form](https://github.com/datamade/django-councilmatic/blob/e61e5215e2dc24937643dcb9f68a8266b00275e2/councilmatic_core/views.py#L95) and [search view](https://github.com/datamade/django-councilmatic/blob/e61e5215e2dc24937643dcb9f68a8266b00275e2/councilmatic_core/views.py#L39)
89-
90-
**nyc-council-councilmatic (Django, extends django-councilmatic)**
91-
92-
Ditto django-councilmatic, but makes it _custom_.
93-
94-
* [Haystack index](https://github.com/datamade/nyc-council-councilmatic/blob/94974de317e34dcb05165a7c23717960c400d942/nyc/search_indexes.py) and [custom text-rendering template](https://github.com/datamade/nyc-council-councilmatic/blob/94974de317e34dcb05165a7c23717960c400d942/nyc/templates/search/indexes/nyc/bill_text.txt)
95-
* [Search view](https://github.com/datamade/nyc-council-councilmatic/blob/94974de317e34dcb05165a7c23717960c400d942/nyc/views.py#L213)
96-
97-
**la-metro-councilmatic (Django, extends django-councilmatic)**
98-
99-
Ditto django-councilmatic, but makes it _custom_. Notable for its implementation of a custom Haystack Highlighter class, which allows for exact multi-word matching.
100-
101-
* [Custom Highlighter](https://github.com/datamade/django-councilmatic/blob/master/councilmatic_core/utils.py) in django-councilmatic
102-
* [Simple tag](https://github.com/datamade/la-metro-councilmatic/blob/84d0e9c5c954dcc262bce33fd98a4ac58c2f9501/lametro/templatetags/lametro_extras.py#L196) and [the rendering of search results](https://github.com/datamade/la-metro-councilmatic/blob/84d0e9c5c954dcc262bce33fd98a4ac58c2f9501/lametro/templates/partials/search_result.html#L22)
