BUGS

Note: only historical bugs are listed in this file.
For up-to-date buglist, see https://github.com/cosmos72/stmx/issues


KNOWN BUGS

see https://github.com/cosmos72/stmx/issues

32-bit CCL: "#<bogus object @ 0x...> is not of required type short-float" in retry-funs


FIXED BUGS:

7) fixed race condition in GV6/%UPDATE-STAT - it was also the cause of bug 6

6) with HW-TRANSACTIONS enabled, test suite no longer hangs - thanks to bugfix 7.

5) consistent reads were not fully guaranteed. The implementation allowed
   transactions to read inconsistent TVARs in some circumstances.
   Reason: when reading TVARs, version must be read twice (and depending on the compiler/CPU,
   other things are needed as well).
   See doc/consistent-reads.md for a full solution, which has been implemented.

4) initargs of transactional classes were wrapped in TVARs multiple times,
   depending on the length of list returned by (closer-mop:slot-definition-initargs slot)

3) when a transaction signals an error, (run-atomic) was calling (valid? log) without locking,
   so it could get spurious "invalid" answers and unnecessarily re-run the transaction
   (only a waste of resources, not a bug)
   but it could even get spurious "valid" answers in case read TVARs match the current values
   partially before another thread commits, and partially after.
   In the latter case, (run-atomic) would propagate the error to the caller
   => BUG, it must instead re-run the transaction.
   How to fix: when a transaction signals an error, also validate the log WITH locking.

2) (wait-once) was returning without removing the log from tvars
   waiting list. Each tvar would remove all the waiting logs when it
   notified it has changed, but unchanged vars could accumulate a LOT
   of enqueued logs, leaking memory.
   
   Solution: replaced tvar waiting queue with a hash-table, so now
   (wait-once) before returning explicitly removes its log from tvars
   waiting list.

1) (commit) could call (condition-notify) too early, before the relevant thread
   is sleeping in (condition-wait) inside (wait-once).
   
   Solution: since we cannot keep locks on tlogs while also locking
   tvars (DEADLOCK!), we added a flag 'prevent-sleep to tlog,
   and always read/write it while holding a lock on the tlog,
   
   then we do in (commit):
   (with-lock-held (lock-of log)
      (setf (prevent-sleep-of log) t)
      (condition-notify log (lock-of log)))
   
   and in (wait-once):
   (with-lock-held (lock-of log)
      (setf (prevent-sleep-of log) nil) ;; needed? should be the initial value
      
   ;; ... loop on (reads-of log) to enqueue on their waiting list,
   ;;     WHILE checking if they changed ...
   
   (with-lock-held (lock-of log)
      (unless (prevent-sleep-of log)
         (condition-wait log (lock-of log))))

NOT BUGS

4) (retry) triggers a call to (valid? log) without locking, so in theory it may
   get spurious "valid" answers exactly like in bug 3. Is it a bug or not?
   Not a bug.
   Reason: if tlog appears valid, (wait-tlog) will repeat validity check
   with locking before actually sleeping.