Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow empty String ranges #456

Merged
merged 7 commits into from
Mar 19, 2025
Merged

Allow empty String ranges #456

merged 7 commits into from
Mar 19, 2025

Conversation

daniel-raffler
Copy link
Contributor

Hello,
I've noticed that there is a precondition the char override for StringFormulaManager.range:

* @return formula denoting the range regular expression over two chars.
* @see #range(StringFormula, StringFormula)
*/
default RegexFormula range(char start, char end) {
  Preconditions.checkArgument(
      start <= end,
      "Range from start '%s' (%s) to end '%s' (%s) is empty.",
      start,
      (int) start,
      end,
      (int) end);
  return range(makeString(String.valueOf(start)), makeString(String.valueOf(end)));
}

However, the JavaDoc for the (general) range method does not mention this restriction:

/**
 * @return formula denoting the range regular expression over two sequences of length 1.
 */
RegexFormula range(StringFormula start, StringFormula end);

And, according to SMTLIB, empty ranges should be fine:

; RE range
; (re.range s₁ s₂) is the set of all *singleton* strings s such that
; (str.<= s₁ s s₂) provided s₁ and s₂ are singleton. Otherwise, it 
; is the empty language.
(re.range String String RegLan)

I've tried it out with different solvers, and only CVC4 seems to cause issues when the precondition in the default method is simply removed. CVC5 and Princess still assume that the bounds are single character Strings and will throw an exception otherwise. However, they can handle the case when the lower bound is larger than the upper bound and simply return the empty language. Z3 supports both cases, and will also return the empty language when one of the bounds is not a single character String.

In this PR I removed the precondition from the default override to make sure it matches the behavior of the general method. I also added some checks to prevent crashes on CVC5 and Princess if either of the bounds is not a single character Strings, and fixed two bugs for Princess. Note that there are still some issues on CVC4, but this will be hard to fix as the solver is no longer maintained.

…Manager.range

The precondition checks if the lower bound is smaller or equal to the upper bound. However, in SMTLIB re.range is defined for all inputs and will return the empty language if the interval is empty. Since both versions of StringFormulaManager.range() should behave the same, we remove the precondition.
… from crashing on empty range intervals

Note that we can't fix the issue for CVC4 as the solver will throw an exception, even when we use an ITE expression to make sure that the range is non-empty
…handle ITE expressions that return String values
Copy link
Member

@kfriedberger kfriedberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good to match our API against the SMTLIB standard.
Looks good for me.

Copy link
Member

@kfriedberger kfriedberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@daniel-raffler daniel-raffler merged commit 3a6bce9 into master Mar 19, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants