Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

creating a new column does not work when using odbc::odbc() as a connection #5

Open
Wytzepakito opened this issue Mar 2, 2022 · 12 comments
Labels
bug Something isn't working

Comments

@Wytzepakito
Copy link
Collaborator

Wytzepakito commented Mar 2, 2022

library(dcmodify)
library(dcmodifydb)

# silly modification rules
m <- modifier( if (cyl == 6)  new_gear <- 10
               , gear[cyl == 4] <- 0  # this R syntax works too :-)
               , if (gear == 3) cyl <- 2
)
con <- dbConnect(odbc::odbc(),
                 driver = "PostgreSQL Unicode(x64)",
                 host = "localhost",
                 database = "testing_db",
                 port = 5432,
                 UID    = "#######",
                 PWD    = "######")
dbWriteTable(con, "mtcars", mtcars[,c("cyl", "gear")])
tbl_mtcars <- dplyr::tbl(con, "mtcars")

tbl_m <- modify(tbl_mtcars, m, copy=TRUE)

Yields an error in the SQL syntax: ERROR: syntax error at or near "2" <SQL> 'ALTER TABLE "dcmodifydb_2508036" ADD COLUMN "new_gear" 2;'

This is because of the usage of DBI::dbColumnInfo(rs) in alter_stmt(). If odbc::odbc is used as the connection, DBI::dbColumnInfo(rs) yields a dataframe like:

1  row_names   -1
2        mpg    6
3        cyl    2
4       disp    6
5         hp    6
6       drat    6
7         wt    6
8       qsec    6
9         vs    6
10        am    6
11      gear    2
12      carb    6
13      dnow   -1
14  new_gear    2

Instead the actual types as a character are needed in the second column

See:
issue1
issue2

@edwindj edwindj added the bug Something isn't working label Mar 2, 2022
@edwindj
Copy link
Member

edwindj commented Mar 2, 2022

Good that you are checking against a postgress db.
In general it is difficult to integrate db tests with CRAN packages, but I am thinking about adding a directory with a docker-compose setup where we can test different open source databases. Any thoughts on that?

@Wytzepakito
Copy link
Collaborator Author

I also checked it with a SQL server db, and the same issue arises.
Yeah, for the R versioning issue I was already thinking about working on creating a setup like that. I will start working on it.

@edwindj
Copy link
Member

edwindj commented Mar 10, 2022

Could you check if the issue is resolved using the latest version? (Today rather busy with other obligations)

@Wytzepakito
Copy link
Collaborator Author

Yes, I tested it with Postgres and odbc on my home pc and it seems to work. I still working on creating proper tests for all database connections using docker(I was sick for the main part of this week).
It does not work on SQL server!
The proper syntax creating a new column for a table in SQL server is:
ALTER TABLE #dcmodfiydb_1234567 ADD <column_name> <column_type>;
Instead of:
ALTER TABLE #dcmodfiydb_1234567 ADD COLUMN <column_name> <column_type>;

It solved the original problem in this issue but it might be a good idea to have automated tests for multiple SQL data types, which I'm still working on, before resolving the issue.

@edwindj
Copy link
Member

edwindj commented Mar 11, 2022

Nice! Removing COLUMN also seems to work for the other databases: made it a bit more flexible.

@Wytzepakito
Copy link
Collaborator Author

If a NA value is set for a new column the whole column turns into a character column on Postgres14.

m <- modifier( if (cyl == 6) {
  new_gear <- 10
} else if (cyl == 4) {
  new_gear <- 25
} else {
  new_gear <- NA
  }

)

dbWriteTable(con, "mtcars", mtcars[,c("cyl", "gear")])
tbl_mtcars <- dplyr::tbl(con, "mtcars")


tbl_m <- modify(tbl_mtcars, m, copy=FALSE)
#> # Source:   table<mtcars> [?? x 3]
#> # Database: postgres [postgres@localhost:/testing_db]
#>    row_names        cyl  gear new_gear
#>   <chr>          <dbl> <dbl> <chr>   
#>  1 Mazda RX4          6     4 10.0    
#>  2 Mazda RX4 Wag      6     4 10.0    
#>  3 Hornet 4 Drive     6     3 10.0    
#>  4 Valiant            6     3 10.0    
#>  5 Merc 280           6     4 10.0    
#>  6 Merc 280C          6     4 10.0    
#>  7 Ferrari Dino       6     5 10.0    
#>  8 Datsun 710         4     4 25.0    
#>  9 Merc 240D          4     4 25.0    
#> 10 Merc 230           4     4 25.0    
#> # ... with more rows

If the same column is overwritten then the type is set correctly.

@edwindj
Copy link
Member

edwindj commented Mar 23, 2022

Hmm, weird!
Curiuous to see the SQL output of above statement. Does the ALTER statement create a column of type TEXT?

@Wytzepakito
Copy link
Collaborator Author

Yes, I am unsure about the reason behind this as of yet.

>dcmodifydb::dump_sql(m, tbl_mtcars, con = con)

-- -------------------------------------
-- Generated with dcmodifydb, do not edit
-- dcmodify version: 0.1.9
-- dcmodifydb version: 0.3.0.9001
-- dplyr version: 1.0.8
-- dbplyr version: 2.1.1
-- from: 'command-line'
-- date: 2022-03-24
-- -------------------------------------


ALTER TABLE "mtcars"
ADD "new_gear" TEXT;

-- M1: 
-- 
-- R expression: if (cyl == 6) {
--     new_gear <- 10
-- } else if (cyl == 4) {
--     new_gear <- 25
-- } else {
--     new_gear <- NA
-- }
UPDATE "mtcars"
SET "new_gear" = 10.0
WHERE "cyl" = 6.0;

UPDATE "mtcars"
SET "new_gear" = 25.0
WHERE NOT("cyl" = 6.0) AND "cyl" = 4.0;

UPDATE "mtcars"
SET "new_gear" = NULL
WHERE NOT("cyl" = 6.0) AND NOT("cyl" = 4.0);

@edwindj
Copy link
Member

edwindj commented Mar 24, 2022

I added a test with this code for sqlite and duckdb and there it seems to work ok.
I suspect it has to do with the odbc connection. Just wondering if this also happens when you use a DBI::postgress driver instead of odbc.

Another thing you could try is to use NA_integer_ instead of NA, does that help? (I don't think that would be a nice fix, but just wondering if it solves this issue)

@Wytzepakito
Copy link
Collaborator Author

It also does this also using Rpostgres::Postgres() as a driver.

And setting the value to NA_integer_ does not solve the issue in both instances....

@edwindj
Copy link
Member

edwindj commented Mar 24, 2022

Hmm, what interested to see what the query is in alter_stmt.R.
What is the print out if you add:

 dplyr::show_query(tab)

at line 19 in alter_stmt.R (and do a devtools::load_all())

@Wytzepakito
Copy link
Collaborator Author

Wytzepakito commented Mar 28, 2022

The query made by alter_stmt.R at line 19 by dplyr::show_query(tab) is:

SELECT "cyl", "gear", NULL AS "new_gear"
FROM (SELECT *
FROM "mtcars"
LIMIT 2) "q01"
LIMIT 6

The call to dplyr::mutate at line 16 causes this it results in:

dplyr::mutate(tab, new_gear = 10, new_gear = 25, new_gear = NA_real_)

Which produces a table with the new_gear column being of type chr.
Removing the NA solves it.

If a dbplyr::memdb_frame() is used with the above dplyr::mutate call then the resulting column is of type SMALLINT, so it seems like an issue in that package...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants