Overview of Spring Boot Configuration

Spring Boot supports a wide range of configuration options.

Configurations also support Placeholders (e.g. app.name=Smarti and later app.description=${app.name} is a Spring Boot application).
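
As a minimal sketch of placeholder resolution, the following properties fragment reuses one value inside another. The app.* keys are only the illustrative names from the sentence above; they are not properties read by Smarti.

```properties
# Illustrative keys only; Smarti itself does not read app.*
app.name = Smarti
# ${app.name} is replaced with the value defined above when the property is resolved
app.description = ${app.name} is a Spring Boot application
```

With this fragment, app.description resolves to "Smarti is a Spring Boot application".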

As an alternative to Java properties files, Spring Boot also supports YAML.

Configuration Properties

MongoDB configuration

MongoDB is used by Smarti to store clients, configurations and conversations.

  • spring.data.mongodb.uri = mongodb://localhost/smarti

  • spring.data.mongodb.database = smarti

  • spring.data.mongodb.host = localhost

  • spring.data.mongodb.port = 27017

  • spring.data.mongodb.password =

  • spring.data.mongodb.username =

  • smarti.migration.mongo.script-home = /usr/share/smarti/scripts: location of the database-migration scripts. When using one of the provided start-scripts, the database migration will be carried out on startup.
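
The properties above could be combined as in the following sketch. The host name, user and password are hypothetical placeholders. Spring Boot describes spring.data.mongodb.uri as an alternative to the separate host, port and credential properties, so the sketch activates only one of the two styles.

```properties
# Style A: a single connection URI (hypothetical host and credentials)
spring.data.mongodb.uri = mongodb://smarti:changeme@mongo.example.org:27017/smarti

# Style B: separate properties; left commented out while the URI is set
# spring.data.mongodb.host = mongo.example.org
# spring.data.mongodb.port = 27017
# spring.data.mongodb.database = smarti
# spring.data.mongodb.username = smarti
# spring.data.mongodb.password = changeme

# Location of the database-migration scripts (value taken from the list above)
smarti.migration.mongo.script-home = /usr/share/smarti/scripts
```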

Tomcat AJP-Config

To enable AJP, set tomcat.ajp.enabled to true and adjust the following properties as needed:

  • tomcat.ajp.enabled=false

  • tomcat.ajp.protocol=AJP/1.3

  • tomcat.ajp.port=9090

  • tomcat.ajp.allow-trace=false

  • tomcat.ajp.secure=false

  • tomcat.ajp.scheme=http

Spring Mail Configuration

No core component of Smarti sends mail. However, if a module needs to send notifications by mail, these are the properties Spring uses to configure the mail server.

  • spring.mail.host=

  • spring.mail.port=

  • spring.mail.protocol=smtp

  • spring.mail.username=

  • spring.mail.password=

Security Configuration

  • security.config.enabled = true - turn security (auth) off/on globally, e.g. for testing

  • security.config.implementation = mongo - authentication backend, currently only mongo is supported

  • security.config.logout-redirect = - redirect-target after successful logout.

  • security.config.mongo.password-hasher = SHA-256 - password hashing algorithm

  • security.config.mongo.enable-signup = false - enable/disable public account-creation

  • security.config.mongo.enable-password-recovery = false - enable/disable password recovery workflow

  • security.config.mongo.admin-password = - set the default admin password: on startup, if there is no user with role ADMIN, a new user is created using this password. If left blank, a random password will be generated and printed to the log.
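
As a sketch of a locked-down setup, the fragment below keeps authentication on, disables public sign-up and password recovery, and sets a fixed initial admin password. The password value is a hypothetical placeholder.

```properties
# Keep authentication enabled (disable only for testing)
security.config.enabled = true
# Currently the only supported backend
security.config.implementation = mongo
security.config.mongo.password-hasher = SHA-256
# No public account creation, no password recovery workflow
security.config.mongo.enable-signup = false
security.config.mongo.enable-password-recovery = false
# Hypothetical value; only used on startup if no user with role ADMIN exists.
# If left blank, a random password is generated and printed to the log.
security.config.mongo.admin-password = change-me
```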

SolrLib Configuration

SolrLib is a library for using Solr in Spring. It supports embedded, standalone and cloud versions of Solr by adding the corresponding modules.

  • solrlib.home = /absolute/path/to/solr/home: The ${SOLR_HOME} directory. Used by embedded and standalone. NOTE: If an embedded Solr server is used, it is highly recommended to set solrlib.home to the absolute path where the embedded Solr server will be located. Otherwise a new Solr core will be created in a temp folder on every startup.

  • solrlib.base-url = http://localhost:8983/solr: base-url for all Solr requests. This will trigger using solrlib-standalone if available on the classpath

  • solrlib.zk-connection = zookeeper1:8121,zookeeper2:8121: ZooKeeper connection string. This will trigger using solrlib-cloud if available on the classpath

  • solrlib.collection-prefix = prefix for the remote collection names, to avoid name-clashes on shared servers. Only used by standalone and cloud

  • solrlib.max-shards-per-node = 1: Only relevant in cloud-mode

  • solrlib.deploy-cores = true: option to disable automatic configuration update/deployment to remote servers, e.g. if you do not have the permissions to do so. Only used by standalone and cloud

  • solrlib.delete-on-shutdown = false: option to delete the solrlib-home upon shutdown. Only used by embedded
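
The three deployment modes use different subsets of these properties. The following sketch shows one possible fragment per mode; only one mode would normally be active at a time. The home path and the collection prefix are hypothetical values.

```properties
# Embedded: a persistent Solr home avoids a fresh core in a temp folder on every startup
# (the path is a hypothetical placeholder)
solrlib.home = /var/lib/smarti/solr
solrlib.delete-on-shutdown = false

# Standalone (requires solrlib-standalone on the classpath)
# solrlib.base-url = http://localhost:8983/solr
# solrlib.collection-prefix = smarti_
# solrlib.deploy-cores = true

# Cloud (requires solrlib-cloud on the classpath)
# solrlib.zk-connection = zookeeper1:8121,zookeeper2:8121
# solrlib.collection-prefix = smarti_
# solrlib.max-shards-per-node = 1
```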

Other properties

UI Cache

  • ui.cache.maxAge = 864000: The maximum age of elements in the cache in seconds (default: 864000 - 10 days). If < 0 no cache will be used.

  • ui.cache.indexMaxAge = 3600: The maximum age of a cached index.html (default: 3600 - 1 hour). If < 0 no cache will be used.

CORS

  • cors.enabled = true: enables/disables CORS

Property Injection

Allows backend properties to be injected into the frontend.

  • resource.transform.inject.text = constants*.js

Default Webservice Error Handler

  • webservice.errorhandler.writetrace = false: Note that even if disabled, stack traces for 5xx responses will be logged.

Jsonp callback

  • jsonp.callback = callback: The name of the callback

Rocket.chat Endpoint

  • rocketchat.proxy.hostname =

  • rocketchat.proxy.port = 80

  • rocketchat.proxy.scheme = http

Speak Service

The Speak Service manages resource bundles for bot-generated reply messages in conversations.

  • message.locale = de_DE

  • message.source =

Conversation Indexing

Conversations are indexed in Solr, managed by SolrLib.

  • smarti.index.rebuildOnStartup = true: Allows to enable/disable a full rebuild of indexes on startup

  • smarti.index.conversation.commitWithin = 10000: Defines the maximum time span until published conversations are available in the index. Values are in milliseconds. For values < 0 the default of 10 seconds will be used. For values >= 0 and < 1000 the minimum value of 1000 ms will be used.

  • smarti.index.conversation.message.merge-timeout = 30: Multiple messages of the same user are merged into a single message if they were sent within the configured time period. Values are in seconds. The default is 30 seconds.
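
The value rules for the indexing properties can be illustrated with a short sketch. The chosen values are arbitrary examples, not recommendations.

```properties
# Skip the full index rebuild on startup
smarti.index.rebuildOnStartup = false

# 5000 ms lies in the valid range (>= 1000), so it is used as given.
# By the rules above, 500 would be raised to the 1000 ms minimum
# and -1 would fall back to the 10000 ms default.
smarti.index.conversation.commitWithin = 5000

# Merge messages of the same user that were sent within 60 seconds
smarti.index.conversation.message.merge-timeout = 60
```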

Processing Configuration

Configures the amount of system resources used by Smarti for processing (mainly analysis).

  • smarti.processing.numThreads =: The number of analysis threads. The default value is 2. For every thread, reserve ~500 MByte of additional Java heap space. For the best use of CPU power, the number of threads should equal the number of cores.
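
As a worked example of the sizing rule, assume a hypothetical host with 4 cores. The heap itself is set on the JVM (e.g. via -Xmx), not in this file.

```properties
# One analysis thread per core on a hypothetical 4-core host
smarti.processing.numThreads = 4
# Rule of thumb from above: 4 threads x ~500 MByte = ~2000 MByte of additional Java heap
```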

Analysis Configuration

This section describes the configuration of the analysis workflow and the Analysis Components.

Warning
For now, analysis configurations are global. No client-specific configuration is possible.

Note
Client-specific analysis configurations are planned for a future release. They will use the client configuration web service.

Basic Configuration

  • smarti.analysis.language: The language of conversations. If present, language detection will be deactivated and the configured language is assumed for all conversations. NOTE: This configuration can be overruled by a client-specific configuration set by the configuration service

  • smarti.analysis.conextSize: The size of the context (in number of messages) used for analysis. If a conversation includes more than conextSize messages, only the last conextSize messages will be used for analysis. If a value <= 0 is configured, all messages will be analysed.

Note
A higher conextSize results in longer processing times and slightly higher memory requirements. A small conextSize may result in lower quality of analysis results. Currently the minimum size of the context is 3.

Analysis Pipeline

The analysis pipeline used to process conversations can be configured by the following properties:

  • smarti.analysis.pipeline.required = : comma separated list of required analysis components (empty if none are required). If required components are missing, the Analysis Service will not start

  • smarti.analysis.pipeline.optional = *,!keyword.interestingterms.conversation: comma separated list of optional analysis components. Supported values:

    • comma separated list of names to explicitly define the processors to be used

    • * to include all. If activated !{name} can be used to exclude specific analysis components.

In addition, smarti.analysis.language can be used to set the language of conversations. If set, this language will be used for all conversations. If not present or empty, the language will be detected based on the content.
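
The pipeline properties might be combined as in the following sketch. Only keyword.interestingterms.conversation is a component name taken from this document; the names in the commented-out variant are hypothetical, and the language code de assumes the 2 letter ISO form used elsewhere in this document.

```properties
# No component is mandatory, so the Analysis Service starts even if some are missing
smarti.analysis.pipeline.required =

# Variant 1: all available components except one
smarti.analysis.pipeline.optional = *,!keyword.interestingterms.conversation

# Variant 2: an explicit list (hypothetical component names)
# smarti.analysis.pipeline.optional = my.component.a,my.component.b

# Skip language detection and treat all conversations as German (assumed code format)
smarti.analysis.language = de
```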

Stanford NLP

Stanford NLP 3.8.0 can be used for NLP processing of German language texts. With no configuration in place it will use the default configuration as provided by the German model files of the Stanford distribution.

The following configuration properties can be used to change the configuration:

  • nlp.stanfordnlp.de.posModel= (default: edu/stanford/nlp/models/pos-tagger/german/german-hgc.tagger)

  • nlp.stanfordnlp.de.nerModel= (default: edu/stanford/nlp/models/ner/german.conll.hgc_175m_600.crf.ser.gz): Allows to include custom built NER models. Multiple models are separated by ,.

  • nlp.stanfordnlp.de.parseModel= (default: edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz): The default is a good trade-off between quality, memory and speed. Consider edu/stanford/nlp/models/srparser/germanSR.ser.gz for a lower memory footprint and higher speed.

  • nlp.stanfordnlp.de.parseMaxlen= (default: 30): Memory consumption increases with the square of this number. 30 is ok for 4g heap.

For the parseModel, Stanford includes three models:

  • PCFG Parser (edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz): This is the default parser configured for Smarti. It runs fine with 4g of heap and parseMaxlen <= 30 (parseMaxlen <= 40 is borderline ok)

  • Factored Parser (edu/stanford/nlp/models/lexparser/germanFactored.ser.gz): This is the default of Stanford NLP. It is the slowest and needs a lot of memory (especially with parseMaxlen > 30). Do NOT use this parser with setups providing <8g java heap (-Xmx8g)

  • Shift Reduce Parser (edu/stanford/nlp/models/srparser/germanSR.ser.gz): This is the fastest parser option available and is recommended in situations where problems with the PCFG Parser occur. For more information see its documentation on Stanford NLP.

The maximal parse length (nlp.stanfordnlp.de.parseMaxlen) defines the maximum number of tokens a sentence can have to be processed by the parser. The memory requirement of the parser grows roughly with the square of the sentence length, so this limit bounds the memory used. More information is available in the parser FAQ.
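
As a sketch, a memory-constrained setup could switch to the Shift Reduce Parser and keep the default sentence length limit. Both values are taken from the descriptions above.

```properties
# Use the Shift Reduce Parser instead of the default PCFG Parser
nlp.stanfordnlp.de.parseModel = edu/stanford/nlp/models/srparser/germanSR.ser.gz

# Keep the default limit of 30 tokens per sentence.
# Under the rough square rule, raising it to 40 would need about
# (40/30)^2 = ~1.8 times the parser memory.
nlp.stanfordnlp.de.parseMaxlen = 30
```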

Interesting Term Configuration

The Interesting Terms component performs keyword extraction. It uses tf-idf of words in a document corpus to mark the most relevant words of a conversation. Implementation-wise, Solr is used to manage the text corpus and Solr MLT requests are used to retrieve relevant terms.

There are several ways to configure Solr endpoints to be used for interesting terms.

  • keyword.solrmlt[0].name = my-corpus: name suffix for the analysis component name. MUST BE unique.

  • keyword.solrmlt[0].url = http://localhost:8983/solr/my-corpus: The URL of the Solr endpoint

  • keyword.solrmlt[0].field = text_gen: The default field, used if the language is unknown or as a fallback if no field is configured for the language of the conversation

  • keyword.solrmlt[0].lang.{lang} = {field}: The field to be used for {lang} (e.g. for German: keyword.solrmlt[0].lang.de = text_de)
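
Put together, an endpoint definition might look like the sketch below. The first entry uses the example values from the list above. The second entry is hypothetical and assumes that further endpoints are added by incrementing the list index.

```properties
# First endpoint: example values from the list above
keyword.solrmlt[0].name = my-corpus
keyword.solrmlt[0].url = http://localhost:8983/solr/my-corpus
# Fallback field if no language-specific field matches
keyword.solrmlt[0].field = text_gen
# Field used for German conversations
keyword.solrmlt[0].lang.de = text_de

# Second endpoint (hypothetical name and URL); the name must be unique
keyword.solrmlt[1].name = other-corpus
keyword.solrmlt[1].url = http://localhost:8983/solr/other-corpus
keyword.solrmlt[1].field = text_gen
```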

The above configuration requires a Solr server. To allow the use of embedded Solr servers, specific modules are required. Currently two of those exist.

  • solrcore.wikipedia.de.resource = Absolute path to the archive with the German Wikipedia Corpus.

  • solrcore.crawl.systel.resource = Absolute path to the archive with the crawl of Systel related Webpages

NOTE: The archives with the Solr cores are separate downloads. The cores are initialized on the embedded Solr server managed by SolrLib

Token Filter: Stopword

The Token Filter removes keywords from a conversation that are present in a configured stop-word list.

Module: token-processor

  • processor.token.stopword.default = {spring-resource} : List of stop words used for any language (in addition to language specific stopwords)

  • processor.token.stopword.{lang} = {spring-resource}: list of stop words for the language lang.

where:

  • lang is the 2 letter ISO language code (e.g. de for German)

  • spring-resource is loaded as a Spring Resource. Therefore classpath:, file: and URL resources (http(s):, ftp:) can be used.

  • Stopword lists are text files with a single word per line. Empty lines and lines starting with # are ignored.
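
A minimal sketch of a stopword configuration, assuming one list bundled on the classpath and one German list on the file system. Both locations are hypothetical.

```properties
# Stop words applied to every language (hypothetical classpath location)
processor.token.stopword.default = classpath:/stopwords/common.txt

# Additional stop words for German, using the 2 letter ISO code de (hypothetical file path)
processor.token.stopword.de = file:/etc/smarti/stopwords-de.txt
```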

Hasso Extraction

Hasso was a specific use case of the predecessor of Smarti. The module hasso-vocabulary-extractor provides two vocabulary-based keyword extraction components.

  • smarti.extractor.synonyms.db = CSV file with ; as column separator and utf-8 as encoding. One vocabulary entry per row. The value in the first column is the preferred label. Additional columns for synonyms. The content is expected to be in German language. Extracted Entities will have the type term and the tag db-entity.

  • smarti.extractor.synonyms.sap = CSV file with , as column separator and utf-8 as encoding. One vocabulary entry per row. The value in the first column is the preferred label. Additional columns for synonyms. The content is expected to be in German language. Extracted Entities will have the type term and the tag sap-entity

Query Builder Default Configuration

Query Builders are configured per client via the Client Configuration service. However, a system-wide default configuration can be used to initialize configurations for new clients.

This section includes configuration properties used to define the default configuration of query builders.

Solr Endpoint configuration

A SolrEndpoint is used by the generic Solr QueryProvider provided by the query-solr module.

Note
The configuration properties described in this section do NOT configure an actual Solr endpoint. They are only used as defaults for actual Solr Search Query Builder configurations.

Prefix: query.solr

General Properties

  • query.solr.enabled = false (type: Boolean): The default state for new Solr Endpoint Configurations

  • query.solr.solr-endpoint = http(s)://solr-host:port/solr/core-name (type: String): The URL of the Solr Endpoint (Solr Core)

Search Configuration

Properties with the prefix query.solr.search define how the Solr query is built from tokens extracted from the conversation.

The default will search for location names and general token names in the default search field of Solr. All other options are deactivated. By setting the following properties those defaults for new Solr Endpoint configurations can be changed.

  • Title Search

    • query.solr.search.title.enabled = false (type: Boolean, default: false): Title search is disabled by default

    • query.solr.search.title.field = title (type: String): The name of the title field, or null or empty to use the default search field

  • Full-Text Search

    • query.solr.search.full-text.enabled = true (type: Boolean, default: true): Full text search is enabled by default

    • query.solr.search.full-text.field = (type: String): The name of the full text field, or null or empty to use the default field

  • Related Document Search

    • query.solr.search.related.enabled = false (type: Boolean, default: false): Whether related document search is enabled

    • query.solr.search.related.fields = (type: List<String>): The fields to use for search for similar documents

  • Spatial (Geo) Search

    • query.solr.search.spatial.enabled = true (type: Boolean, default: true)

    • query.solr.search.spatial.locationNameField = (type: String): The name of the field used to search for location names, or null or empty to use the default field

    • query.solr.search.spatial.latLonPointSpatialField = (type: String): The name of the Solr field using a latLonPointSpatial type to search for documents near an extracted location (with included lat/lon information)

    • query.solr.search.spatial.rptField = (type: String): The name of the Solr field using a rpt type to search for documents near an extracted location (with included lat/lon information)

    • query.solr.search.spatial.bboxField = (type: String): The name of the Solr field using a bbox type to search for documents near an extracted location (with included lat/lon information)

  • Temporal Search

    • query.solr.search.temporal.enabled = false (type: Boolean, default: false): Temporal search is disabled by default

    • query.solr.search.temporal.timeRangeField = (type: String): The name of the Solr field using the DateRangeField type used to search for documents near the extracted date/times or date/time ranges.

    • query.solr.search.temporal.startTimeField = (type: String): The name of the Solr date field used to search for documents near extracted date/times or the start time of extracted ranges.

    • query.solr.search.temporal.endTimeField = (type: String): The name of the Solr date field used to search for documents near the end date of extracted ranges.
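
The search defaults could be adjusted as in the following sketch, which enables title and temporal search for new Solr Endpoint configurations. The endpoint URL and all Solr field names are hypothetical and depend on the schema of the target core.

```properties
# New Solr Endpoint configurations start enabled and point to a hypothetical core
query.solr.enabled = true
query.solr.solr-endpoint = http://localhost:8983/solr/my-core

# Search the title field in addition to the full text
query.solr.search.title.enabled = true
query.solr.search.title.field = title

# Full-text search on the default search field (field left empty)
query.solr.search.full-text.enabled = true
query.solr.search.full-text.field =

# Temporal search on separate start and end date fields (hypothetical field names)
query.solr.search.temporal.enabled = true
query.solr.search.temporal.startTimeField = start_date
query.solr.search.temporal.endTimeField = end_date
```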

Result Configuration

Properties with the prefix query.solr.result are used to define how results are processed. Most importantly, the mappings define how to map fields in Solr documents to fields used in the UI.

Setting defaults for mappings is useful if different cores share the same or a similar schema.xml.

  • query.solr.result.mappings.title = (type: String): The title of the document

  • query.solr.result.mappings.description = (type: String): The description to be shown for results

  • query.solr.result.mappings.type = (type: String): the type of the document

  • query.solr.result.mappings.doctype = (type: String): The document type of the document

  • query.solr.result.mappings.thumb = (type: String): The thumbnail for the document

  • query.solr.result.mappings.link = (type: String): The link pointing to the resource described by the document.

  • query.solr.result.mappings.date = (type: String): The date of the document

  • query.solr.result.mappings.source = (type: String): The source of the document
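
As an illustration, default mappings for cores that share one schema might look like the sketch below. The Solr field names on the right-hand side are hypothetical.

```properties
# UI field (left) mapped to a Solr document field (right, hypothetical names)
query.solr.result.mappings.title = title
query.solr.result.mappings.description = summary
query.solr.result.mappings.type = type
query.solr.result.mappings.doctype = doc_type
query.solr.result.mappings.thumb = thumbnail_url
query.solr.result.mappings.link = url
query.solr.result.mappings.date = modified
query.solr.result.mappings.source = source
```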

Solr Defaults Configuration

Properties with the prefix query.solr.defaults can be used to set Solr params that are included in all Solr queries (e.g. to set the default field one can define query.solr.defaults.df = my-default-field).

Typical examples include

  • query.solr.defaults.rows = 10: This sets the number of results to 10

  • query.solr.defaults.df = text: This sets the default search field to text

NOTE: Defaults (and invariants) can also be set in the Solr request handler (solrconfig.xml). In cases where one has control over the Solr configuration, it is preferable to do so.