AresIndexer::augmentConceptFiles fails with "! object 'CONCEPT_ID' not found" #239

Open
lav-patel opened this issue Mar 2, 2023 · 6 comments

@lav-patel

Describe the bug
Running the code from the Ares docs:

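server  <- Sys.getenv("PGHOST_PGDATABASE")
user <- Sys.getenv("PGUSER")
password <- Sys.getenv("PGPASSWORD")
# NOTE: connectionDetails, used below, is not defined in this snippet; it is presumably built from the values above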

cdmVersion <- "5.4" 
cdmDatabaseSchema <- "omop_from_cdm"
resultsDatabaseSchema <- "omop_edc_results"
numThreads <- 4
sqlOnly <- FALSE
createIndices <- TRUE
outputFolder <- "output"
cdmSourceName <- "omop_from_cdm" # a human readable name for your CDM source
verboseMode <- FALSE # set to TRUE if you want to see activity written to the console
writeToTable <- TRUE # set to FALSE if you want to skip writing to a SQL table in the results schema
checkLevels <- c("TABLE", "FIELD", "CONCEPT") # which DQ check levels to run 
checkNames <- c() # which DQ checks to run?  # Names can be found in https://github.com/OHDSI/DataQualityDashboard/blob/main/inst/csv/OMOP_CDMv5.4_Check_Descriptions.csv
aresDataRoot <- "output/webserver_root/ares/data"

# run achilles
Achilles::achilles(cdmVersion = cdmVersion,
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = cdmDatabaseSchema,
    resultsDatabaseSchema = resultsDatabaseSchema,
    #numThreads=numThreads,
    #sqlOnly = sqlOnly,
    #createIndices = createIndices
)
# obtain the data source release key (naming convention for folder structures)
releaseKey <- AresIndexer::getSourceReleaseKey(connectionDetails, cdmDatabaseSchema)
datasourceReleaseOutputFolder <- file.path(aresDataRoot, releaseKey)

# run data quality dashboard and output results to data source release folder in ares data folder
dqResults <- DataQualityDashboard::executeDqChecks(
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = cdmDatabaseSchema,
    resultsDatabaseSchema = resultsDatabaseSchema,
    vocabDatabaseSchema = cdmDatabaseSchema,
    cdmVersion = cdmVersion,
    cdmSourceName = cdmSourceName,
    outputFile = "dq-result.json",
    outputFolder = datasourceReleaseOutputFolder
    #numThreads = numThreads,
    #sqlOnly = sqlOnly,
    # verboseMode = verboseMode,
    # writeToTable = writeToTable,
    # checkLevels = checkLevels,
    # checkNames = checkNames
)

# inspect logs
#ParallelLogger::launchLogViewer(logFileName = file.path(outputFolder, 
#                                                      sprintf("log_DqDashboard_%s.txt", cdmSourceName)))

# export the achilles results to the ares folder
Achilles::exportAO(
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = cdmDatabaseSchema,
    resultsDatabaseSchema = resultsDatabaseSchema,
    vocabDatabaseSchema = cdmDatabaseSchema,
    outputPath = aresDataRoot
)

# perform temporal characterization
outputFile <- file.path(datasourceReleaseOutputFolder, "temporal-characterization.csv")
Achilles::performTemporalCharacterization(
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = cdmDatabaseSchema,
    resultsDatabaseSchema = resultsDatabaseSchema,
    outputFile = outputFile
)

# augment concept files with temporal characterization data
AresIndexer::augmentConceptFiles(releaseFolder = file.path(aresDataRoot, releaseKey))

To Reproduce
Steps to reproduce the behavior:

  1. Install the R packages listed in the section below.
  2. Run the R code above.

Expected behavior
The script should run without the following error:

Error in `count()`:
ℹ In argument: `CONCEPT_ID`.
Caused by error:
! object 'CONCEPT_ID' not found
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/dplyr:::mutate_error>
Error in `count()`:
ℹ In argument: `CONCEPT_ID`.
Caused by error:
! object 'CONCEPT_ID' not found
---
Backtrace:
  1. AresIndexer::augmentConceptFiles(...)
  4. dplyr:::count.data.frame(., CONCEPT_ID, tolower(CDM_TABLE_NAME))
  6. dplyr:::group_by.data.frame(x, ..., .add = TRUE, .drop = .drop)
  7. dplyr::group_by_prepare(.data, ..., .add = .add, error_call = current_env())
  8. dplyr:::add_computed_columns(.data, new_groups, error_call = error_call)
  9. dplyr:::mutate_cols(...)
 11. dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
 12. mask$eval_all_mutate(quo)
 13. dplyr (local) eval()
Run `rlang::last_trace()` to see the full context.
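
The backtrace shows `count()` failing because the data frame it receives has no column named `CONCEPT_ID`. One way to narrow this down is to compare the column names a data frame actually has against the names the code expects. The sketch below uses only base R. `findMissingColumns` is a hypothetical helper, not part of AresIndexer, and the camelCase names in the toy data are only an assumption about what a renamed column might look like.

```r
# Hypothetical helper: list expected columns that are absent from a data frame
# and, where possible, a differently-styled column name that may correspond.
findMissingColumns <- function(df, expected) {
  # Compare names with case and separators stripped, so that
  # "CONCEPT_ID" and "conceptId" are treated as the same name
  normalise <- function(x) tolower(gsub("[^A-Za-z0-9]", "", x))
  actual <- names(df)
  missing <- setdiff(expected, actual)
  candidates <- vapply(missing, function(col) {
    hit <- actual[normalise(actual) == normalise(col)]
    if (length(hit) > 0) hit[1] else NA_character_
  }, character(1))
  data.frame(expected = missing,
             possibleMatch = unname(candidates),
             stringsAsFactors = FALSE)
}

# Toy data standing in for the real input (column names are an assumption)
results <- data.frame(conceptId = c(1L, 2L),
                      cdmTableName = c("PERSON", "MEASUREMENT"))
findMissingColumns(results, c("CONCEPT_ID", "CDM_TABLE_NAME"))
```

If a call like this on the real input reports a match that differs only in style, that would suggest the column was renamed upstream rather than dropped.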

Desktop (please complete the following information):

  • OS: Ubuntu 22
    Not a browser-related issue.
    The following packages were installed as of March 2, 2023:
RUN Rscript -e "remotes::install_github('OHDSI/DatabaseConnector',ref='v5.1.0')" 
RUN Rscript -e "remotes::install_github('OHDSI/Achilles')"
RUN Rscript -e "remotes::install_github('OHDSI/DataQualityDashboard')"
RUN Rscript -e "install.packages('shiny')"
RUN Rscript -e "install.packages('dt')"
RUN Rscript -e "install.packages('DT')"
RUN Rscript -e "remotes::install_github('OHDSI/AresIndexer')"
RUN Rscript -e "remotes::install_github('OHDSI/Castor')"
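
Most of the GitHub installs above are not pinned to a release, so the versions actually installed depend on when the image was built. A small sketch for recording them alongside a report, using only base R (`reportVersions` is a hypothetical helper name):

```r
# Hypothetical helper: return the installed version of each package,
# or NA when the package is not installed.
reportVersions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

reportVersions(c("DatabaseConnector", "Achilles",
                 "DataQualityDashboard", "AresIndexer"))
```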
@alabarga

I'm getting the same error; this worked fine a couple of months ago. Maybe the Achilles or DataQualityDashboard outputs have changed?

@alabarga

The AresIndexer::buildNetworkIndex() call does not work either:


Error in `dplyr::select()`:
! Can't subset columns that don't exist.
✖ Column `CheckResults.EXECUTION_TIME` doesn't exist.

@alabarga

Maybe this is related to #199?

@alabarga

Using

remotes::install_github('ohdsi/[email protected]')
remotes::install_github('OHDSI/[email protected]', force=TRUE)

I managed to run AresIndexer::augmentConceptFiles(releaseFolder = file.path(aresDataRoot, releaseKey))

with this warning:


Warning message:
There was 1 warning in `filter()`.
ℹ In argument: `!is.na(results$CONCEPT_ID) && results$FAILED == 1`.
Caused by warning in `!is.na(results$CONCEPT_ID) && results$FAILED == 1`:
! 'length(x) = 2970 > 1' in coercion to 'logical(1)' 
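
The warning arises because `&&` expects length-one operands, while both conditions here are evaluated over whole columns (2970 rows). For a row-wise filter the element-wise `&` is the usual choice. In R 4.3.0 and later the same `&&` expression is an error rather than a warning. A minimal base R sketch with made-up data, not the AresIndexer source:

```r
# Toy stand-in for the check results (values are invented)
results <- data.frame(
  CONCEPT_ID = c(101L, NA, 103L, 104L),
  FAILED     = c(1L, 1L, 0L, 1L)
)

# Element-wise `&` yields one logical value per row
keep <- !is.na(results$CONCEPT_ID) & results$FAILED == 1
failedConcepts <- results[keep, ]
failedConcepts
```

Inside a dplyr `filter()` call the equivalent would be `filter(!is.na(CONCEPT_ID) & FAILED == 1)`, or the two conditions passed as separate arguments.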

I also managed to run

AresIndexer::buildNetworkIndex(sourceFolders = sourceFolders, outputFolder = aresDataRoot)
AresIndexer::buildDataQualityIndex(sourceFolders = sourceFolders, outputFolder = aresDataRoot)

but

AresIndexer::buildNetworkUnmappedSourceCodeIndex(sourceFolders = sourceFolders, outputFolder = aresDataRoot)

fails with


> AresIndexer::buildNetworkUnmappedSourceCodeIndex(sourceFolders = sourceFolders, outputFolder = aresDataRoot)
Error in `group_by()`:
! Must group by variables found in `.data`.
Column `CDM_TABLE_NAME` is not found.
Column `CDM_FIELD_NAME` is not found.
Column `SOURCE_VALUE` is not found.
Run `rlang::last_error()` to see where the error occurred.


Hope it helps!

cc @clairblacketer

@alabarga

Also, not all concepts seem to have been exported.

image

When are files such as data/Synthea/20230216/concepts/measurement/concept_3015182.json created? By the Achilles::exportAO() call?

@alabarga

Maybe @fdefalco can help!
