Spring Boot supports a wide range of configuration options, including:
- Command line properties: properties passed as arguments (e.g. --server.port=8080)
- Application properties: a Java properties file named application.properties
- Profile-specific configurations, which require you to
  - activate the profile by setting spring.profiles.active
  - provide an application-{profile}.properties file
- Placeholders: configurations can reference other properties (e.g. app.name=Smarti and later app.description=${app.name} is a Spring Boot application).
As an alternative to Java properties files, Spring Boot also supports YAML.
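As a minimal sketch of how these options combine, the following shows two files side by side. The profile name dev and the port values are illustrative assumptions, not Smarti defaults.

```properties
# --- application.properties (always loaded) ---
app.name=Smarti
# placeholder: resolves to "Smarti is a Spring Boot application"
app.description=${app.name} is a Spring Boot application
server.port=8080
# activate the (hypothetical) profile "dev"
spring.profiles.active=dev

# --- application-dev.properties (only loaded while "dev" is active) ---
# profile-specific values override the ones from application.properties
server.port=9090
```

A command line argument such as --server.port=8081 takes precedence over both files.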
MongoDB is used by Smarti to store clients, configurations and conversations.
- spring.data.mongodb.uri = mongodb://localhost/smarti
- spring.data.mongodb.database = smarti
- spring.data.mongodb.host = localhost
- spring.data.mongodb.port = 27017
- spring.data.mongodb.password =
- spring.data.mongodb.username =
- smarti.migration.mongo.script-home = /usr/share/smarti/scripts
  Location of the database migration scripts. When using one of the provided start scripts, the database migration is carried out on startup.
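Spring Boot generally treats the connection URI as an alternative to the separate host, port and credential properties, so a setup would typically use one style or the other. The sketch below assumes a hypothetical host mongo.example.org and hypothetical credentials.

```properties
# Variant A: everything in a single connection URI
spring.data.mongodb.uri=mongodb://smarti:secret@mongo.example.org:27017/smarti

# Variant B: separate properties (use instead of the uri, not in addition to it)
#spring.data.mongodb.host=mongo.example.org
#spring.data.mongodb.port=27017
#spring.data.mongodb.database=smarti
#spring.data.mongodb.username=smarti
#spring.data.mongodb.password=secret
```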
No core component of Smarti sends mail. However, if modules want to send notifications by mail, the following properties are used by Spring to configure the mail server.
- spring.mail.host=
- spring.mail.port=
- spring.mail.protocol=smtp
- spring.mail.username=
- spring.mail.password=
- security.config.enabled = true
  Turns security (auth) off/on globally, e.g. for testing.
- security.config.implementation = mongo
  Authentication backend. Currently only mongo is supported.
- security.config.logout-redirect =
  Redirect target after a successful logout.
- security.config.mongo.password-hasher = SHA-256
  Password hashing algorithm.
- security.config.mongo.enable-signup = false
  Enables/disables public account creation.
- security.config.mongo.enable-password-recovery = false
  Enables/disables the password recovery workflow.
- security.config.mongo.admin-password =
  Sets the default admin password: on startup, if there is no user with role ADMIN, a new user is created using this password. If left blank, a random password is generated and printed to the log.
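A sketch of a locked-down setup based on the properties above. The admin password value is a placeholder chosen for illustration only.

```properties
# keep authentication enabled (set to false only for local testing)
security.config.enabled=true
security.config.implementation=mongo
security.config.mongo.password-hasher=SHA-256
# no public sign-up and no self-service password recovery
security.config.mongo.enable-signup=false
security.config.mongo.enable-password-recovery=false
# placeholder value; if left blank, a random password is generated and logged
security.config.mongo.admin-password=change-me
```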
SolrLib is a library for using Solr in Spring. It supports embedded, standalone and cloud versions of Solr by adding the corresponding modules.
- solrlib.home = /absolute/path/to/solr/home
  The ${SOLR_HOME} directory, used by embedded and standalone. NOTE: If an embedded Solr server is used, it is highly recommended to set solrlib.home to the absolute path where the embedded Solr server will be located. Otherwise a new Solr core will be created in a temp folder on every startup.
- solrlib.base-url = http://localhost:8983/solr
  Base URL for all Solr requests. Setting this triggers the use of solrlib-standalone if it is available on the classpath.
- solrlib.zk-connection = zookeeper1:8121,zookeeper2:8121
  ZooKeeper connection string. Setting this triggers the use of solrlib-cloud if it is available on the classpath.
- solrlib.collection-prefix =
  Prefix for the remote collection names, to avoid name clashes on shared servers. Only used by standalone and cloud.
- solrlib.max-shards-per-node = 1
  Only relevant in cloud mode.
- solrlib.deploy-cores = true
  Option to disable automatic configuration update/deployment to remote servers, e.g. if you do not have the permissions to do so. Only used by standalone and cloud.
- solrlib.delete-on-shutdown = false
  Option to delete the solrlib-home directory upon shutdown. Only used by embedded.
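The three modes can be sketched as alternative property sets. The home directory and the collection prefix below are hypothetical values; only one variant would normally be active at a time.

```properties
# Embedded: an absolute home directory keeps the cores across restarts
solrlib.home=/var/lib/smarti/solr
solrlib.delete-on-shutdown=false

# Standalone (needs solrlib-standalone on the classpath)
#solrlib.base-url=http://localhost:8983/solr
#solrlib.collection-prefix=smarti_

# Cloud (needs solrlib-cloud on the classpath)
#solrlib.zk-connection=zookeeper1:8121,zookeeper2:8121
#solrlib.max-shards-per-node=1
#solrlib.collection-prefix=smarti_
```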
UI Cache
- ui.cache.maxAge = 864000
  The maximum age of elements in the cache in seconds (default: 864000, i.e. 10 days). If < 0, no cache is used.
- ui.cache.indexMaxAge = 3600
  The maximum age of a cached index.html in seconds (default: 3600, i.e. 1 hour). If < 0, no cache is used.
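For example, during frontend development it can be convenient to switch caching off entirely. A sketch that relies only on the "< 0" rule described above:

```properties
# any negative value disables the respective cache
ui.cache.maxAge=-1
ui.cache.indexMaxAge=-1
```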
CORS
- cors.enabled = true
  Enables/disables CORS.
Property Injection
Allows backend properties to be injected into the frontend.
- resource.transform.inject.text = constants*.js
Default Webservice Error Handler
- webservice.errorhandler.writetrace = false
  Note that even if disabled, stack traces for 5** responses will be logged.
Jsonp callback
- jsonp.callback = callback
  The name of the callback.
- rocketchat.proxy.hostname =
- rocketchat.proxy.port = 80
- rocketchat.proxy.scheme = http
The Speak Service manages resource bundles for bot-generated reply messages in conversations.
- message.locale = de_DE
- message.source =
Conversations are indexed in Solr, managed by SolrLib.
- smarti.index.rebuildOnStartup = true
  Enables/disables a full rebuild of the indexes on startup.
- smarti.index.conversation.commitWithin = 10000
  Defines the maximum time span after which published conversations become available in the index. Values are in milliseconds. For values < 0 the default of 10 seconds is used. For values >= 0 and < 1000 the minimum value of 1000ms is used.
- smarti.index.conversation.message.merge-timeout = 30
  Multiple messages of the same user are merged into a single message if they were sent within the configured time period. Values are in seconds. The default is 30 seconds.
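To illustrate the value rules above, here is a sketch with illustrative values (none of them are recommendations):

```properties
# skip the full index rebuild on startup
smarti.index.rebuildOnStartup=false
# 500 lies in the range 0..999, so the minimum of 1000 ms is used instead;
# a negative value such as -1 would fall back to the default of 10 seconds
smarti.index.conversation.commitWithin=500
# merge consecutive messages of the same user sent within 60 seconds
smarti.index.conversation.message.merge-timeout=60
```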
Configures the amount of system resources used by Smarti for processing (mainly analysis).
- smarti.processing.numThreads =
  The number of analysis threads. The default value is 2. For every thread one should reserve ~500 MByte of additional Java heap space. For the best use of CPU power, the number of threads should match the number of cores.
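As a rough worked example of the heap rule of thumb, assuming a machine with 4 cores:

```properties
# one analysis thread per core
smarti.processing.numThreads=4
# heap estimate: 4 threads x ~500 MByte = ~2 GByte reserved for analysis,
# i.e. ~1 GByte more than with the default of 2 threads
```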
This section describes the configuration of the analysis workflow and the analysis components.
Warning: For now, analysis configurations are global. No client-specific configuration is possible.
Note: Client-specific analysis configurations are planned for a future release. They will use the client configuration web service.
- smarti.analysis.language
  The language of conversations. If present, language detection is deactivated and the configured language is assumed for all conversations. NOTE: This configuration can be overruled by a client-specific configuration set by the configuration service.
- smarti.analysis.conextSize
  The size of the context (in number of messages) used for analysis. If a conversation includes more than conextSize messages, only the last conextSize messages are used for analysis. If a value <= 0 is configured, all messages are analysed.
Note: A higher conextSize results in longer processing times and slightly higher memory requirements. A small conextSize may result in lower quality of analysis results. Currently the minimum size of the context is 3.
The analysis pipeline used to process conversations can be configured by the following properties:
- smarti.analysis.pipeline.required =
  Comma-separated list of required analysis components (empty if none are required). If required components are missing, the Analysis Service will not start.
- smarti.analysis.pipeline.optional = *,!keyword.interestingterms.conversation
  Comma-separated list of optional analysis components.
  - a comma-separated list of names explicitly defines the processors to be used
  - * includes all components. If activated, !{name} can be used to exclude specific analysis components.
In addition, smarti.analysis.language allows setting the language of conversations. If set, this language is used for all conversations. If not present or empty, the language is detected based on the content.
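Putting the analysis properties together, a configuration could look like the sketch below. The language value de assumes that a 2-letter ISO code is expected, and the context size of 10 is an arbitrary illustrative value.

```properties
# no component is strictly required
smarti.analysis.pipeline.required=
# use all available components except the one excluded by name
smarti.analysis.pipeline.optional=*,!keyword.interestingterms.conversation
# assume German for all conversations and skip language detection
smarti.analysis.language=de
# only analyse the last 10 messages of a conversation
smarti.analysis.conextSize=10
```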
Stanford NLP 3.8.0 can be used for NLP processing of German language texts. With no configuration in place, it uses the default configuration provided by the German model files of the Stanford distribution.
The following configuration properties allow this configuration to be changed:
- nlp.stanfordnlp.de.posModel=
  (default: edu/stanford/nlp/models/pos-tagger/german/german-hgc.tagger)
- nlp.stanfordnlp.de.nerModel=
  (default: edu/stanford/nlp/models/ner/german.conll.hgc_175m_600.crf.ser.gz): Allows custom-built NER models to be included. Multiple models are separated by a comma (,).
- nlp.stanfordnlp.de.parseModel=
  (default: edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz): The default is a good trade-off between quality, memory and speed. Consider edu/stanford/nlp/models/srparser/germanSR.ser.gz for a lower memory footprint and higher speed.
- nlp.stanfordnlp.de.parseMaxlen=
  (default: 30): Memory consumption increases with the square of this number. 30 is ok for 4g of heap.
For the parseModel, Stanford includes three models:
- PCFG Parser (edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz): This is the default parser configured for Smarti. It runs fine with 4g of heap and parseMaxlen <= 30 (parseMaxlen <= 40 is borderline ok).
- Factored Parser (edu/stanford/nlp/models/lexparser/germanFactored.ser.gz): This is the default of Stanford NLP. It is the slowest and needs a lot of memory (especially with parseMaxlen > 30). Do NOT use this parser with setups providing less than 8g of Java heap (-Xmx8g).
- Shift Reduce Parser (edu/stanford/nlp/models/srparser/germanSR.ser.gz): This is the fastest parser option available and is recommended in situations where problems with the PCFG Parser occur. For more information see its documentation on Stanford NLP.
The maximal parse length (nlp.stanfordnlp.de.parseMaxlen) defines the maximum number of tokens a sentence can have to be processed by the parser. This limit matters because the memory requirement of the parser grows roughly with the square of the sentence length. More information is available in the parser FAQ.
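For a memory-constrained setup, the recommendations above can be sketched as follows. The quadratic estimate in the comment is simple arithmetic based on the "square of the sentence length" rule, not a measured value.

```properties
# Shift Reduce parser: lower memory footprint and higher speed
nlp.stanfordnlp.de.parseModel=edu/stanford/nlp/models/srparser/germanSR.ser.gz
# keep the default limit of 30 tokens per sentence;
# raising it to 40 would need roughly (40/30)^2 = ~1.8 times the parser memory
nlp.stanfordnlp.de.parseMaxlen=30
```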
The Interesting Terms component performs keyword extraction. It uses the tf-idf of words in a document corpus to mark the most relevant words of a conversation. Implementation-wise, Solr is used to manage the text corpus and Solr MLT requests are used to retrieve relevant terms.
There are several ways to configure the Solr endpoints to be used for interesting terms.
- keyword.solrmlt[0].name = my-corpus
  Name suffix for the analysis component name. MUST be unique.
- keyword.solrmlt[0].url = http://localhost:8983/solr/my-corpus
  The URL of the Solr endpoint.
- keyword.solrmlt[0].field = text_gen
  The default field, used if the language is unknown or as a fallback if no field is configured for the language of the conversation.
- keyword.solrmlt[0].lang.{lang} = {field}
  The field to be used for {lang} (e.g. for German: keyword.solrmlt[0].lang.de = text_de).
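A sketch with two endpoints follows. It assumes that further entries are added with the usual Spring Boot list index notation ([1], [2], ...); the second corpus and the field name text_en are hypothetical.

```properties
# first endpoint: generic field plus language-specific fields
keyword.solrmlt[0].name=my-corpus
keyword.solrmlt[0].url=http://localhost:8983/solr/my-corpus
keyword.solrmlt[0].field=text_gen
keyword.solrmlt[0].lang.de=text_de
keyword.solrmlt[0].lang.en=text_en

# second endpoint (hypothetical): only the fallback field is configured
keyword.solrmlt[1].name=other-corpus
keyword.solrmlt[1].url=http://localhost:8983/solr/other-corpus
keyword.solrmlt[1].field=text_gen
```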
The above configuration requires a Solr server. To allow the use of embedded Solr servers, specific modules are required. Currently two of those exist.
- solrcore.wikipedia.de.resource =
  Absolute path to the archive with the German Wikipedia corpus.
- solrcore.crawl.systel.resource =
  Absolute path to the archive with the crawl of Systel-related web pages.
NOTE: The archives with the Solr cores are separate downloads. The cores are initialized on the embedded Solr server managed by SolrLib.
The Token Filter removes keywords from a conversation that are present in a configured stop-word list.
Module: token-processor
- processor.token.stopword.default = {spring-resource}
  List of stop words used for any language (in addition to language-specific stop words).
- processor.token.stopword.{lang} = {spring-resource}
  List of stop words for the language lang.
where:
- lang is the 2-letter ISO language code (e.g. de for German)
- spring-resource is loaded as a Spring Resource. Therefore classpath:, file: and URL resources (http(s):, ftp:) can be used.
- Stop-word lists are text files with a single word per line. Empty lines and lines starting with # are ignored.
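The different resource types could be used as in the sketch below. All file names and the URL are hypothetical; the comment at the end outlines what a stop-word file might contain.

```properties
# language-independent list, loaded from the classpath
processor.token.stopword.default=classpath:/stopwords/common.txt
# German list, loaded from the local file system
processor.token.stopword.de=file:/etc/smarti/stopwords-de.txt
# English list, loaded from a URL
processor.token.stopword.en=https://example.org/stopwords-en.txt

# Possible content of stopwords-de.txt (one word per line):
#   # lines starting with '#' and empty lines are ignored
#   der
#   die
#   das
```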
Hasso was a specific use case of the predecessor of Smarti. The module hasso-vocabulary-extractor provides two vocabulary-based keyword extraction components.
- smarti.extractor.synonyms.db =
  CSV file with ; as column separator and utf-8 as encoding. One vocabulary entry per row. The value in the first column is the preferred label; additional columns hold synonyms. The content is expected to be in German. Extracted entities have the type term and the tag db-entity.
- smarti.extractor.synonyms.sap =
  CSV file with , as column separator and utf-8 as encoding. One vocabulary entry per row. The value in the first column is the preferred label; additional columns hold synonyms. The content is expected to be in German. Extracted entities have the type term and the tag sap-entity.
Query Builders are configured per client via the client configuration service. However, a system-wide default configuration can be used to initialize the configurations of new clients.
This section covers the configuration properties used to define the default configuration of query builders.
A SolrEndpoint is used by the generic Solr QueryProfider provided by the query-solr module.
Note: The configuration properties described in this section do NOT configure an actual Solr endpoint. They are only used as defaults for actual Solr Search Query Builder configurations.
Prefix: query.solr
General Properties
- query.solr.enabled = false
  (type: Boolean): The default state for new Solr endpoint configurations.
- query.solr.solr-endpoint = http(s)://solr-host:port/solr/core-name
  (type: String): The URL of the Solr endpoint (Solr core).
Search Configuration
Properties with the prefix query.solr.search define how the Solr query is built from the tokens extracted from the conversation.
By default, location names and general token names are searched in the default search field of Solr. All other options are deactivated. Setting the following properties changes those defaults for new Solr endpoint configurations.
- Title Search
  - query.solr.search.title.enabled = false
    (type: Boolean, default: false): Title search is disabled by default.
  - query.solr.search.title.field = title
    (type: String): The name of the field used for title search, or null or empty to use the default search field.
- Full-Text Search
  - query.solr.search.full-text.enabled = true
    (type: Boolean, default: true): Full-text search is enabled by default.
  - query.solr.search.full-text.field =
    (type: String): The name of the full-text field, or null or empty to use the default field.
- Related Document Search
  - query.solr.search.related.enabled = false
    (type: Boolean, default: true): Whether related document search is enabled.
  - query.solr.search.related.fields =
    (type: List<String>): The fields used to search for similar documents.
- Spatial (Geo) Search
  - query.solr.search.spatial.enabled = true
    (type: Boolean, default: true)
  - query.solr.search.spatial.locationNameField =
    (type: String): The name of the field used to search for location names, or null or empty to use the default field.
  - query.solr.search.spatial.latLonPointSpatialField =
    (type: String): The name of the Solr field using a latLonPointSpatial type to search for documents near an extracted location (with included lat/lon information).
  - query.solr.search.spatial.rptField =
    (type: String): The name of the Solr field using an rpt type to search for documents near an extracted location (with included lat/lon information).
  - query.solr.search.spatial.bboxField =
    (type: String): The name of the Solr field using a bbox type to search for documents near an extracted location (with included lat/lon information).
- Temporal Search
  - query.solr.search.temporal.enabled = false
    (type: Boolean, default: false)
  - query.solr.search.temporal.timeRangeField =
    (type: String): The name of the Solr field using the DateRangeField type, used to search for documents near the extracted date/times or date/time ranges.
  - query.solr.search.temporal.startTimeField =
    (type: String): The name of the Solr date field used to search for documents near extracted date/times or the start time of extracted ranges.
  - query.solr.search.temporal.endTimeField =
    (type: String): The name of the Solr date field used to search for documents near the end date of extracted ranges.
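As a sketch, system-wide search defaults for new clients could look like this. The core name my-core and all Solr field names (title, location, start_date, end_date) are hypothetical and depend on the schema of the target core.

```properties
query.solr.enabled=true
query.solr.solr-endpoint=http://localhost:8983/solr/my-core

# search extracted tokens in the title field as well
query.solr.search.title.enabled=true
query.solr.search.title.field=title
# empty value: fall back to the default search field of Solr
query.solr.search.full-text.enabled=true
query.solr.search.full-text.field=

# location names are searched in a dedicated field
query.solr.search.spatial.enabled=true
query.solr.search.spatial.locationNameField=location

# temporal search against separate start/end date fields
query.solr.search.temporal.enabled=true
query.solr.search.temporal.startTimeField=start_date
query.solr.search.temporal.endTimeField=end_date
```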
Result Configuration
Properties with the prefix query.solr.result define how results are processed. Most importantly, the mappings define how fields in Solr documents are mapped to fields used in the UI.
Setting defaults for mappings is useful if different cores share the same or a similar schema.xml.
- query.solr.result.mappings.title =
  (type: String): The title of the document.
- query.solr.result.mappings.description =
  (type: String): The description to be shown for results.
- query.solr.result.mappings.type =
  (type: String): The type of the document.
- query.solr.result.mappings.doctype =
  (type: String): The document type of the document.
- query.solr.result.mappings.thumb =
  (type: String): The thumbnail for the document.
- query.solr.result.mappings.link =
  (type: String): The link pointing to the resource described by the document.
- query.solr.result.mappings.date =
  (type: String): The date of the document.
- query.solr.result.mappings.source =
  (type: String): The source of the document.
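A sketch of default mappings for cores that share a schema. The Solr field names on the right-hand side are hypothetical and need to match the fields defined in the actual schema.xml.

```properties
# UI field (left) is filled from the Solr document field (right)
query.solr.result.mappings.title=title
query.solr.result.mappings.description=summary
query.solr.result.mappings.type=type
query.solr.result.mappings.doctype=doc_type
query.solr.result.mappings.thumb=thumbnail_url
query.solr.result.mappings.link=url
query.solr.result.mappings.date=modified
query.solr.result.mappings.source=source
```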
Solr Defaults Configuration
Properties with the prefix query.solr.defaults can be used to set Solr params that are included in all Solr queries (e.g. to set the default field one can define query.solr.defaults.df = my-default-field).
Typical examples include:
- query.solr.defaults.rows = 10
  Sets the number of results to 10.
- query.solr.defaults.df = text
  Sets the default search field to text.
NOTE: Defaults (and invariants) can also be set in the Solr request handler (solrconfig.xml). In cases where one has control over the Solr configuration, it is preferable to do so.
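Since the text after the prefix is passed on as a Solr parameter name, other standard Solr parameters should work the same way. The sketch below assumes this holds for fq (filter query); the filter value is hypothetical.

```properties
# return at most 10 results per query
query.solr.defaults.rows=10
# default search field
query.solr.defaults.df=text
# assumption: a filter query applied to every request
query.solr.defaults.fq=type:article
```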