Dealing with formula #339

devSJR · 2025-03-13T21:36:19Z

Hello. It would be great if tinyplot could deal with more complex formulas, because it currently behaves differently from base plot. I think this is already on the to-do list; I just wanted to put it here.

mtcars

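library(tinyplot)
par(mfrow = c(1, 2))
plt(mpg/hp ~ wt, data = mtcars)   # tinyplot: warns and plots mpg ~ wt
plot(mpg/hp ~ wt, data = mtcars)  # base R: plots mpg divided by hp against wt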

grantmcdermott · 2025-03-13T23:06:55Z

Thanks @devSJR. As you probably know, the workaround here is to wrap with I()...

plt(I(mpg/hp) ~ wt, data = mtcars)

... but I certainly agree that consistency with vanilla plot would be desirable if we can achieve it.

zeileis · 2025-03-13T23:19:05Z

The fact that tinyplot handles much more complex formulas than base plot is the reason for this deviation. Internally, tinyplot converts a formula like y ~ x | a + b into ~ y + x + a + b in order to first set up a model frame with all variables and then extract and process the building blocks ~ y, ~ x, and ~ a + b.
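To see where the y-variable warning comes from, here is a minimal base-R sketch, assuming the merge step works as just described. The merged formula is written out by hand rather than built the way tinyplot does it, and formula_vars() is a hypothetical helper, not part of tinyplot's API. Once mpg/hp ends up on the right-hand side of a one-sided formula, terms() reads / symbolically and reports two variables.

```r
## Hypothetical helper (not tinyplot's API): list the variables that
## terms() finds in a one-sided formula.
formula_vars <- function(f) {
  vars <- attr(terms(f), "variables")   # a call such as list(mpg, hp, wt)
  vapply(as.list(vars)[-1L], deparse, character(1L))
}

## mpg/hp ~ wt merged into one one-sided formula (written out by hand):
## the former left-hand side now sits to the right of `~`.
formula_vars(~ mpg/hp + wt)   # "mpg" "hp" "wt": `/` is read symbolically

## The y block extracted on its own therefore has two variables as well.
formula_vars(~ mpg/hp)        # "mpg" "hp"
```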

If the block for y or x contains more than one variable, an informative warning is issued, e.g.,

plt(mpg/hp ~ wt, data = mtcars)
## Warning message:
## In tinyplot.formula(mpg/hp ~ wt, data = mtcars) :
##   formula should specify at most one y-variable, using: mpg

This is done the same way for the x variable:

plt(mpg ~ wt/hp, data = mtcars)
## Warning message:
## In tinyplot.formula(mpg ~ wt/hp, data = mtcars) :
##   formula should specify exactly one x-variable, using: wt

So tinyplot is consistent here between the left-hand side and right-hand side. In contrast, base plot is not consistent:

On the left-hand side, mpg/hp has its arithmetic meaning: mpg divided by hp.
On the right-hand side, wt/hp has its symbolic formula meaning: hp nested within wt (i.e., wt + wt:hp), which here gets translated into a sequence of two plots, mpg ~ wt and mpg ~ hp.
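The same asymmetry can be seen directly in model.frame() from the stats package, with no plotting involved. A small sketch on the mtcars example:

```r
## Left-hand side: mpg/hp is evaluated arithmetically, giving one column.
mf_lhs <- model.frame(mpg/hp ~ wt, data = mtcars)
names(mf_lhs)   # "mpg/hp" "wt"

## Right-hand side: wt/hp is expanded symbolically (wt + wt:hp),
## so wt and hp become two separate variables.
mf_rhs <- model.frame(mpg ~ wt/hp, data = mtcars)
names(mf_rhs)   # "mpg" "wt" "hp"
```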

I think that this behavior of base R is inconsistent and very confusing. Of course, it would be possible to mimic it, but I don't think we should. Instead, we should process the left-hand side and the right-hand side consistently, either by warning about this situation (the current solution) or by using the arithmetic meaning for both the y and the x variable.

The latter would create another inconsistency, though: + would have to be handled differently in the x part and the by part, e.g., y ~ x1 + x2 | a + b. For the x part we would then use the arithmetic meaning (x1 plus x2), but for the by part the symbolic formula meaning (two variables a and b).

So, to cut a long story short: my opinion is that tinyplot's current solution is the only one that is consistent and not confusing. If users want to use operators with their arithmetic meaning, they need to insulate them, e.g., via I(), on both the left-hand side and the right-hand side. Thus, you can use I(mpg/hp) ~ wt or mpg ~ I(wt/hp), which both work exactly the same in tinyplot and in base R.
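A quick base-R check of the I() workaround, using model.frame() as a stand-in (an assumption: this shows how base R sees the formulas, not tinyplot's internals). Each insulated expression becomes exactly one column holding the computed values, on either side of `~`:

```r
## I() shields the operator from symbolic interpretation on both sides.
mf_y <- model.frame(I(mpg/hp) ~ wt, data = mtcars)
mf_x <- model.frame(mpg ~ I(wt/hp), data = mtcars)
names(mf_y)   # "I(mpg/hp)" "wt"
names(mf_x)   # "mpg" "I(wt/hp)"
```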

zeileis · 2025-03-13T23:36:23Z

Some additional side remarks (aka rant). Feel free to ignore this; it's not really related to formulas for plotting:

The formula support (added with S3 in the white book) is an incredibly powerful feature for doing statistics and it's great to have it wired into the base language.
However, the choice that model.frame() et al. would keep the arithmetic meaning of operators on the left-hand side but not on the right-hand side is just wrong IMO.
Why can I use lm(y ~ x1 + x2) to specify a model with two regressors but have to use lm(cbind(y1, y2) ~ x) rather than lm(y1 + y2 ~ x) for a model with two dependent variables?
This shortcoming also means that I cannot easily specify two factor response variables (say, for a bivariate probit model), because cbind(y1, y2) would drop the factor attributes and data.frame(y1, y2) is not allowed in a formula processed with model.frame().
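A small illustration of both limitations, using made-up toy data (y1, y2 and x are purely illustrative):

```r
## Two factor responses and one regressor (toy data).
y1 <- factor(c("no", "yes", "yes"))
y2 <- factor(c("low", "high", "low"))
x  <- c(1.2, 3.4, 5.6)

## cbind() replaces factors by their internal integer codes; levels are lost.
yy <- cbind(y1, y2)
yy[, "y1"]   # 1 2 2

## A data.frame() response is rejected by model.frame() with an error.
res <- tryCatch(model.frame(data.frame(y1, y2) ~ x), error = function(e) e)
conditionMessage(res)
```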

zeileis · 2025-03-14T00:01:40Z

If we want to disable the symbolic interpretation of the formula operators in tinyframe(), we can do so as follows:

tinyframe = function(formula, data, drop = FALSE, symbolic = TRUE) {
  ## input
  ## - formula: (sub-)formula
  ## - data: model.frame from full formula
  if (is.null(formula)) return(NULL)
  if (symbolic) {
    names = sapply(attr(terms(formula), "variables")[-1L], deparse, width.cutoff = 500L)
  } else {
    rhs = formula[[2L]]
    names = deparse(rhs, width.cutoff = 500L)
    data[[names]] = with(data, eval(rhs))
  }
  data[, names, drop = drop]
}

The default symbolic = TRUE is the behavior we have up to now:

d <- data.frame(a = 1:3, b = 3:1)
tinyframe(~ a + b, data = d)
##   a b
## 1 1 3
## 2 2 2
## 3 3 1

But then we can switch to symbolic = FALSE:

tinyframe(~ a + b, data = d, symbolic = FALSE)
##   a + b
## 1     4
## 2     4
## 3     4

Thus, with that modification we could turn off the symbolic interpretation of the y and/or x part of the formula inside tinyplot.formula.
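Applied to the y block from the original report, the same sketch gives the two interpretations side by side. The function is copied from above (input comments trimmed) so the snippet runs on its own, and passing mtcars in place of the model frame is an assumption made purely for illustration:

```r
## tinyframe() as posted above
tinyframe = function(formula, data, drop = FALSE, symbolic = TRUE) {
  if (is.null(formula)) return(NULL)
  if (symbolic) {
    ## symbolic: one column per variable found by terms()
    names = sapply(attr(terms(formula), "variables")[-1L], deparse, width.cutoff = 500L)
  } else {
    ## arithmetic: evaluate the whole right-hand side as a single column
    rhs = formula[[2L]]
    names = deparse(rhs, width.cutoff = 500L)
    data[[names]] = with(data, eval(rhs))
  }
  data[, names, drop = drop]
}

## y block from the original report
head(tinyframe(~ mpg/hp, data = mtcars), 3)                    # columns mpg and hp
head(tinyframe(~ mpg/hp, data = mtcars, symbolic = FALSE), 3)  # single column "mpg/hp"
```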

So the changes to the code are really minimal, and we can easily implement any of the following strategies:

1. Current tinyplot behavior: Use symbolic = TRUE for both y and x. This forces users to use I() on both the left-hand side and right-hand side.
2. Mimic base plot: Use symbolic = TRUE for x but symbolic = FALSE for y.
3. Force single y and x variable: Use symbolic = FALSE in both x and y.

My personal preference is (1) >> (3) > (2). But if Grant and/or Vincent clearly prefer consistency with base R, I'm also willing to implement that 😜


devSJR · 2025-03-14T07:19:41Z

I started my morning with a coffee and an additional side remark (aka rant), and finished with a smile.
You rock!
Keep up the good work.

I think a small section in the vignette (even verbatim of this discussion here) would do the trick.

zeileis · 2025-03-14T07:29:17Z

Thanks for the nice words - and thanks for raising the issue in the first place!

After some sleep, I also don't feel quite as strongly about strategy 2. So if Grant decides that he wants to go with consistency with base R, I'm also very fine with that 😇

grantmcdermott · 2025-03-14T15:30:11Z

I defer to @zeileis on all matters related to formulae!

In seriousness, this is a great discussion with excellent points. The base inconsistency of / across the formula lhs and rhs is a killer point... as is the cbind() versus + point, which I too have been much frustrated by in the past.

Summarizing... @zeileis I think you make a perfectly compelling case that we should leave the current behaviour as-is, perhaps with some additional documentation or an example (as suggested by @devSJR). I can see that you've opened a branch that would enable users to opt into the alternative symbolic = FALSE behaviour, so perhaps that's the place to do it. I also agree that (1) > (3) > (2), so let's ensure that symbolic = FALSE turns off symbolic behaviour for both x and y. Better to be internally consistent than follow "for compatibility with S" legacy behaviour IMO :-)

devSJR · 2025-03-14T15:36:34Z

I will keep watching this. I guess more users will come across this, and they will certainly be able to find the symbolic parameter.

Moreover, examples in the example section are always great. Actually, they are better than the vignette! I tell my students to go to the examples first. I guess others do the same.

zeileis · 2025-03-14T16:12:55Z

OK, good, let's stick with the current behavior then. We could also expand the warning message, e.g.,

formula should specify at most one y-variable, using: mpg
if you want to use arithmetic operators, make sure to wrap them inside I()

or something along those lines?
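A rough sketch of what the expanded warning could look like. first_var() is a hypothetical helper, not tinyplot code; it keeps the first variable of a block and appends the suggested I() hint. The "at most one" / "exactly one" wording mirrors the two warnings quoted earlier in this thread:

```r
## Hypothetical helper (not tinyplot code): keep the first variable of a
## y or x block and warn, with an I() hint, if the block has more than one.
first_var <- function(vars, what = c("y", "x")) {
  what <- match.arg(what)
  bound <- if (what == "y") "at most one" else "exactly one"
  if (length(vars) > 1L) {
    warning(
      sprintf("formula should specify %s %s-variable, using: %s\n", bound, what, vars[1L]),
      "if you want to use arithmetic operators, make sure to wrap them inside I()",
      call. = FALSE
    )
  }
  vars[1L]
}

first_var(c("mpg", "hp"))       # returns "mpg" and emits the two-line warning
first_var(c("wt", "hp"), "x")   # returns "wt"; warning mentions "exactly one x-variable"
```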

As for tinyframe(..., symbolic = FALSE): my thought was that I would implement the option in tinyformula() so that we don't forget about it and can choose to use it or not. I wouldn't export the symbolic argument to the user, though; we should set (or not set) it internally.

However, in the meantime I thought of one problem that the current implementation does not cover: arithmetic operators in combination with other functions, e.g., log(y1) + y2 ~ x. This is admittedly contrived but it would necessitate a different implementation.

So if we stick to the current behavior, I would probably just discard the tinyframe-symbolic branch.

grantmcdermott · 2025-03-16T22:59:22Z

So if we stick to the current behavior, I would probably just discard the tinyframe-symbolic branch.

I must confess I haven't been keeping track of all the ins and outs here, and I desperately need to switch to "day job" work now as I've got a bunch of looming deadlines.

@zeileis my feeling is that I'm happy with whatever you think is best. So I'll leave you to close this issue or resolve in whatever fashion you feel is most appropriate.

zeileis added a commit that referenced this issue Mar 14, 2025

add symbolic = TRUE argument that optionally allows to turn off the symbolic interpretation of formula parts (discussed in #339)

adad21d

zeileis mentioned this issue Mar 17, 2025

Dealing with y1/y2 ~ x formulas #341

Merged

grantmcdermott closed this as completed in #341 Mar 17, 2025
