Bank transaction imports
In LedgerSMB versions 1.3+, there's improved bank reconciliation functionality. This functionality depends on a working import function for bank statements. Additionally, it depends on all transactions having been entered before the reconciliation starts -- or on additional transactions being entered during reconciliation. However, many users have indicated that entering all transactions and payments is a labour-intensive process which could do with automation support.
From discussion it seems that basic components of a workflow to import transactions are in place or should have been in place in LedgerSMB:
1. Parsing bank statements into "meaningful transaction data"
2. Matching the transaction data with existing payments
3. Matching remaining transactions with remaining open items to generate payments
4. Matching unmatched transactions with template transactions
Step (2) is in place for the bank reconciliation. Since step (1) is available only as custom code (for which there's very little documentation), very few installations are likely to use this functionality, and the bank reconciliation feature is probably under-utilized. Step (3) is new functionality that needs to be realized completely.
As a generic implementation for step (1) is missing at the moment, the first step is to draft a design filling this gap. The input should be translated by a parser to output to be used by steps (2) and (3) to reconcile and/or create transactions. (Note that creating transactions is more demanding than "just" reconciling -- as such, more output is required than is currently the case.)
A parser will transform the input(s) according to the definitions below. A parser-instance will be defined for a "format"; a format being a type of input (CSV, XML, ...) including its associated configuration.
Each (bank)account will be associated with at least one parser-instance (=parser+configuration).
The parser transforms the input (file or stream) into an array of hashes with the following keys:
- amount (negative for debit/positive for credit)
- currency
- book date
- value date
- description
- bank account of the counterparty
- 'id's provided by a payment processor
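To make the intended output concrete, here is a minimal sketch of a single parsed statement line. Python is used purely for illustration (LedgerSMB itself is not written in Python), and the key names are assumptions that the final design would have to fix:

```python
# Illustrative only: one parsed bank statement line as a dict ("hash").
# Key names are assumptions, not a finalised specification.
statement_lines = [
    {
        "amount": -125.40,                        # negative for debit, positive for credit
        "currency": "EUR",
        "book_date": "2023-01-15",
        "value_date": "2023-01-16",
        "description": "Invoice 2023-0042 ACME Ltd",
        "counterparty_account": "NL91ABNA0417164300",
        "processor_ids": {"end_to_end_id": "E2E-0001"},  # ids provided by a payment processor
    },
]
```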
The parser is provided with the following input data:
- bank account number as filter
- file or handle/stream to read from (or an array thereof)
Most users will be served best by a CSV/TXT/TAB-separated parser. Looking a bit further into the future, a large ISO-20022 migration is upcoming for SWIFT (payment) processing; supporting the ISO-20022 format is therefore likely to make the application future-proof. Because the format is standardized, much less configuration should be required, which will simplify setup for users.
The CSV parser will support the following configuration items:
- field_separator: "\t", ",", ";", ...
- decimal_separator: '.', ","
- thousands_separator: ",", '.'
- has_headers: true/false
- column_names: [ 'col_1', 'col_2', 'amount', ...]
- debit_credit_indicator: '', { 'debit': 'DT', 'credit': 'CR' }
- column_map: { ... }
Note that the `column_names` configuration is only required when `has_headers` is false. Also note that the `debit_credit_indicator` isn't required when the `amount` output already has the correct signs in the source data.
The column map serves to map the columns in the input (presumably provided by the headers in the source file) to the keys expected in the output.
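As an illustration, a configuration for a semicolon-separated file with headers might look as follows. The option names follow the list above; the header names inside `column_map` are invented examples:

```python
# Illustrative CSV parser configuration (values are examples only).
csv_config = {
    "field_separator": ";",
    "decimal_separator": ",",
    "thousands_separator": ".",
    "has_headers": True,
    # column_names would only be needed when has_headers is False:
    # "column_names": ["book_date", "value_date", "amount", "description"],
    "debit_credit_indicator": {"debit": "DT", "credit": "CR"},
    # Map the (header) columns of the input file to the output keys:
    "column_map": {
        "Boekdatum": "book_date",
        "Valutadatum": "value_date",
        "Bedrag": "amount",
        "Omschrijving": "description",
        "Tegenrekening": "counterparty_account",
    },
}
```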
The XML parser will have the following configuration items:
- XPath mapping of the fields in the XML to the output fields
- debit_credit_indicator: '', { 'debit': 'DT', 'credit': 'CR' }
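A comparable sketch for the XML parser, with hypothetical XPath expressions that are not taken from any specific schema (such as ISO-20022 camt.053):

```python
# Illustrative XML parser configuration: output fields mapped to XPath expressions.
xml_config = {
    "xpath_map": {
        "amount": "./Amount",
        "currency": "./Amount/@currency",
        "book_date": "./BookingDate",
        "value_date": "./ValueDate",
        "description": "./Description",
        "counterparty_account": "./Counterparty/Account",
    },
    "debit_credit_indicator": {"debit": "DBIT", "credit": "CRDT"},
}
```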
Based on the configuration of the parser, post-process the input stream to correct the sign of the `amount` value.
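A minimal sketch of that post-processing step, assuming the raw lines carry a separate debit/credit column (here called `dc_indicator`, an invented name) instead of signed amounts:

```python
# Flip the sign of debit lines when a debit/credit indicator is configured.
def correct_signs(lines, debit_credit_indicator):
    if not debit_credit_indicator:
        return lines                     # amounts already carry the correct sign
    debit_marker = debit_credit_indicator["debit"]
    for line in lines:
        if line.pop("dc_indicator", None) == debit_marker:
            line["amount"] = -abs(line["amount"])
        else:
            line["amount"] = abs(line["amount"])
    return lines
```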
Today, the procedure works as follows:
- Estimate which transactions of the same `transdate` on the payments account can be grouped into "estimated payments", split by type of transaction (`ar`, `ap`, `gl`), `memo` description and payment batch ("voucher"):
  - group transaction lines with the same `source` field value (in `acc_trans`), or
  - lacking a source, group lines with the same `entity_credit_account` (in case of `ar` or `ap` records) or `reference` (in case of a `gl` record)
- Add the estimated payments to the reconciliation set as "transactions to-be-reconciled"
- One-by-one, add the lines of the bank statement to the reconciliation data set (the output of [1]):
  - If the input source number (`scn`) is a number only, prefix it with the check-prefix
  - If the (prefixed) source number is provided:
    - Find the number of to-be-reconciled transactions for which the `transdate` and `source` match the 'posting date' and (prefixed) 'scn' from the bank account [original-set]
    - If the count equals zero (0), create an additional 'to be reconciled' record
    - If the count equals one (1), update the 'to be reconciled' record, marking it 'cleared'
    - If the count is higher, do these additional steps:
      - Find the number of to-be-reconciled transactions for which the `transdate`, `source` and `amount` match the values from the bank account
      - If the count equals zero (0), find the first match of the original-set (ordered by amount) and mark the identified record as 'cleared'
      - If the count equals one (1), mark the identified record as 'cleared'
      - If the count is higher, select the first of the records found and mark that one as 'cleared'
  - If the (prefixed) source isn't provided:
    - Find the number of to-be-reconciled transactions for which `transdate` and `amount` match the bank-provided data and the `source` doesn't start with the check-prefix
    - If the count equals zero (0), create an additional 'to be reconciled' record
    - If the count equals one (1), update that record, marking it 'cleared'
    - If the count is higher, select the first matching record and mark that one as 'cleared'
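The following sketch restates the procedure above in Python pseudocode. It is illustration only: the record structure, the check-prefix handling and the helper names are assumptions, not existing LedgerSMB code.

```python
def normalize_scn(scn, check_prefix):
    """Prefix a purely numeric source number with the check-prefix."""
    if scn and scn.isdigit():
        return check_prefix + scn
    return scn

def match_statement_line(line, to_be_reconciled, check_prefix):
    """Match one parsed bank statement line against the to-be-reconciled records.

    `line` is one dict produced by the parser; `to_be_reconciled` is a list of
    dicts with 'transdate', 'source', 'amount' and 'cleared' keys (assumed names).
    """
    scn = normalize_scn(line.get("scn"), check_prefix)

    if scn:
        # Match on posting date and (prefixed) source number [original-set].
        candidates = [r for r in to_be_reconciled
                      if r["transdate"] == line["book_date"] and r["source"] == scn]
        if not candidates:
            # No match: create an additional 'to be reconciled' record from the bank data.
            to_be_reconciled.append({"transdate": line["book_date"], "source": scn,
                                     "amount": line["amount"], "cleared": False})
        elif len(candidates) == 1:
            candidates[0]["cleared"] = True
        else:
            # Narrow down by amount; fall back to the first of the original-set
            # (ordered by amount) when no exact amount match exists.
            exact = [r for r in candidates if r["amount"] == line["amount"]]
            target = exact[0] if exact else sorted(candidates, key=lambda r: r["amount"])[0]
            target["cleared"] = True
    else:
        # No source number: match on date and amount, excluding check payments.
        candidates = [r for r in to_be_reconciled
                      if r["transdate"] == line["book_date"]
                      and r["amount"] == line["amount"]
                      and not str(r["source"] or "").startswith(check_prefix)]
        if not candidates:
            to_be_reconciled.append({"transdate": line["book_date"], "source": None,
                                     "amount": line["amount"], "cleared": False})
        else:
            # One match: clear it; multiple matches: clear the first one found.
            candidates[0]["cleared"] = True
```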
TODO
A pre-condition for this procedure should be that any transactions which have been reversed, or which are themselves reversals, are excluded from the matching of payments/bank account transactions against the bank data.
Another pre-condition is that we decide how payments are to be corrected, and how withheld bank charges should be incorporated as "additional postings" added to payments.
Options:
- Configurable rule-engine applying matching criteria in order of relevance
- Pre-coded matching with Bayesian statistics, where the algorithm learns which equalities/similarities are more important than others (i.e. which ones get overridden more often than others)
TODO
The process of matching the remaining transactions with open items (with the goal of creating payments) is one that likely requires explicit matching rules. Other options for mapping inputs to available items could include Bayesian statistics (as used by e.g. SpamAssassin).
For now, we consider these two options:
- Explicitly configured, rule-based matching
- Bayesian matching, derived from user classification
This kind of matching would require extensive configuration, linking a wide range of possible inputs from the bank statement to a set of open items (for which values such as PO #, order # and invoice # differ for each one).
To support this kind of functionality, some means of defining rules likely needs to be developed, together with an engine able to execute those rules for each combination of input line, rule and open item.
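As a very rough sketch of what such rule definitions and a matching engine could look like; the rule format, field names and regular expressions below are all invented for illustration:

```python
import re

# Hypothetical rules: each one extracts a token (e.g. an invoice number) from
# the statement description and compares it with a field of the open items.
rules = [
    {"pattern": r"invoice\s*#?\s*(\d{4}-\d{4})", "open_item_field": "invnumber"},
    {"pattern": r"PO\s*#?\s*(\d+)",              "open_item_field": "ponumber"},
]

def match_by_rules(statement_line, open_items, rules):
    """Return the open items matched by any rule for one statement line."""
    matches = []
    for rule in rules:
        m = re.search(rule["pattern"], statement_line["description"], re.I)
        if not m:
            continue
        token = m.group(1)
        matches.extend(item for item in open_items
                       if item.get(rule["open_item_field"]) == token)
    return matches
```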
While this mechanism is assumed to need far less explicit configuration, research is required from the LedgerSMB development team regarding tokenisation of bank statement inputs and the association of statistics with different (classes of) open items. [Note that these classes are hard to define during development; they're likely dependent on the use-case.]
While there's little configuration to be created explicitly, there's a lot of statistical data to be stored, to be used on the next iteration of reconciliation.
NOTE[1]: What to do when there's a benefit to sharing the statistics across companies?
NOTE[2]: A design is needed for storing and updating the statistical data.
NOTE[3]: Statistical research is required to determine on what level statistics need to be stored (do we need to store stats per counterparty? If not, what other level of granularity makes sense? ECA?)
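The sketch below illustrates, under the assumption of simple whitespace tokenisation and per-class token counts, the kind of statistical data that would need to be stored and updated between reconciliation runs. It deliberately leaves open the granularity questions raised in the notes above:

```python
from collections import defaultdict

# token -> class -> count; the "class" could be a counterparty, an ECA, or some
# other level of granularity still to be decided.
token_stats = defaultdict(lambda: defaultdict(int))

def tokenize(description):
    return [t.lower() for t in description.split() if len(t) > 2]

def learn(description, chosen_class):
    """Update the statistics after the operator confirms a match."""
    for token in tokenize(description):
        token_stats[token][chosen_class] += 1

def score(description, candidate_class):
    """Naive score: how strongly the tokens of a description point to a class."""
    return sum(token_stats[token][candidate_class]
               for token in tokenize(description))
```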
The process takes as its input one line of the payment input file and looks up from the database all open items before the value/book date of the input line.
The process produces one or more eligible open items for which the input could be a payment. The output will be handled by an operator; in case of multiple eligible items, the operator will select an appropriate one -- in case of a single one, the match will need to be reviewed.
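A minimal sketch of that lookup, assuming open items carry a `transdate` and an `amount_due` field (invented names) and that dates are ISO-formatted strings:

```python
def eligible_open_items(statement_line, open_items):
    """Open items posted on or before the value/book date of one statement line."""
    cutoff = statement_line.get("value_date") or statement_line["book_date"]
    candidates = [item for item in open_items if item["transdate"] <= cutoff]
    # Prefer exact amount matches when they exist; otherwise hand the full
    # candidate list to the operator for selection and review.
    exact = [item for item in candidates
             if item["amount_due"] == abs(statement_line["amount"])]
    return exact or candidates
```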