Compares the source codes captured, hierarchy changes, and domain changes between two vocabulary versions; identifies non-standard concepts used in concept set expressions.
#install package
remotes::install_github("dimshitc/phenotypeChangeVocab")
library(dplyr)
library(openxlsx)
library(readr)
library(tibble)
library(PhenotypeChangesInVocabUpdate)
#set the baseUrl of your Atlas instance (replace the placeholder below with your own URL)
baseUrl <- "https://yourSecureAtlas.ohdsi.org/"
#if security is enabled, authorize use of the WebAPI
ROhdsiWebApi::authorizeWebApi(
  baseUrl = baseUrl,
  authMethod = "windows")
#specify the cohorts you want to run the comparison for; this example imports them from a CSV file with one column containing cohortIds
#the example file is located in "~/PhenotypeChangesInVocabUpdate/extras/Cohorts.csv"
#you can also define the cohorts directly as a vector:
#cohorts <- c(12822, 12824, 12825)
#you must specify the file location
cohortsDF <- readr::read_delim("~/PhenotypeChangesInVocabUpdate/extras/Cohorts.csv", delim = "\t", show_col_types = FALSE)
cohorts <- cohortsDF[[1]]
#excludedNodes is a text string with the nodes you want to exclude from the analysis; it is set to 0 by default
#for example, some CPT4 and HCPCS codes are now mapped to Visit concepts, and we didn't implement this in the ETL,
#so we don't want these in the analysis (note that the tool doesn't look at the actual CDM, only at the mappings in the vocabulary)
#in that case, excludedNodes is defined like this:
#excludedNodes <- "9201, 9202, 9203"
#set connectionDetails
#you can use keyring to store your credentials;
#see ~/PhenotypeChangesInVocabUpdate/extras/KeyringSetup.R for how to configure keyring for the example below
#you can also define connectionDetails directly, see the DatabaseConnector documentation https://ohdsi.github.io/DatabaseConnector/
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = keyring::key_get("YourDatabase", "dbms"),
  connectionString = keyring::key_get("YourDatabase", "connectionString"),
  user = keyring::key_get("YourDatabase", "username"),
  password = keyring::key_get("YourDatabase", "password")
)
newVocabSchema <- 'vocab_schema_n1' #schema containing a new vocabulary version
oldVocabSchema <- 'vocab_schema_n0' #schema containing an older vocabulary version
resultSchema <- 'achilles_results' #schema containing Achilles results
#create the dataframe with concept set expressions using the getNodeConcepts function
Concepts_in_cohortSet <- getNodeConcepts(cohorts, baseUrl)
#resolve concept sets, compare the outputs on different vocabulary versions, write results to the Excel file
resultToExcel(connectionDetails = connectionDetails,
              Concepts_in_cohortSet = Concepts_in_cohortSet,
              newVocabSchema = newVocabSchema,
              oldVocabSchema = oldVocabSchema,
              resultSchema = resultSchema)
#open the excel file
#Windows
shell.exec("PhenChange.xlsx")
#MacOS
#system(paste("open", "PhenChange.xlsx"))
Writes an Excel file with a separate tab for each type of comparison.
"Node concept": a concept used directly in a concept set expression.
"includedescendants": indicates whether descendants of the "Node concept" are included in the concept set; 0 stands for False, 1 stands for True.
"isexcluded": indicates whether the "Node concept" (and its descendants, if "includedescendants" = 1) is excluded from the concept set; 0 stands for False, 1 stands for True.
"drc": descendant record count, the total number of records for a concept and its descendants.
"source concept": concept sets are usually defined through standard concepts.
If a mapping changes, the same set of included standard concepts might capture different clinical events, which is why the tool tracks the related source concepts.
"Action": flags whether a concept or hierarchy branch was added or removed.
This tab shows the sum of occurrences of added or removed source concepts in the dataset.
- For example, cohort_id 123 doesn't pick up source codes X and Y when using the newer vocabulary version. X appears 10 times in the data and Y appears 15 times.
In this situation you'll get the following output:
cohortid | 123 |
action | Removed |
sum | 25 |
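The arithmetic behind this row can be sketched in base R. This is only an illustration on toy data, not the package's internal query; the `changes` data frame and its `sourceCode` and `occurrences` columns are hypothetical names.

```r
# Toy data mirroring the example: source codes X and Y are no longer
# captured by cohort 123 under the newer vocabulary version.
# The data frame and its column names are hypothetical.
changes <- data.frame(
  cohortid = c(123, 123),
  action = c("Removed", "Removed"),
  sourceCode = c("X", "Y"),
  occurrences = c(10, 15),
  stringsAsFactors = FALSE
)

# Sum the occurrences per cohort and action.
summaryTable <- aggregate(occurrences ~ cohortid + action, data = changes, FUN = sum)
names(summaryTable)[names(summaryTable) == "occurrences"] <- "sum"
print(summaryTable)
```

The single resulting row carries the total for cohort 123 and the action Removed (10 + 15 = 25).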
This tab lists non-standard concepts used in the concept set definition.
Note that the concept set definition JSON isn't updated with the vocabulary update, so you will not see concept changes in Atlas.
You therefore need to run this tool to see whether concepts have become non-standard.
- For example, cohort_id 10729 has the concept set "Malignancies that spread to liver", which has the Node concept "4324190|History of malignant neoplasm of breast" with descendants included.
This concept is non-standard and is mapped as follows:
Maps to "1340204|History of event"
Maps to value "4112853|Malignant tumor of breast".
In this situation you'll get the output below, which gives you the target concepts you need to use to capture the same clinical events with the new vocabulary version.
cohortid | 10729 |
conceptsetname | Malignancies that spread to liver |
conceptsetid | 15 |
isexcluded | 0 |
includedescendants | 1 |
nodeConceptId | 4324190 |
nodeConceptName | History of malignant neoplasm of breast |
drc | 20284048 |
mapsToConceptId | 1340204 |
mapsToConceptName | History of event |
mapsToValueConceptId | 4112853 |
mapsToValueConceptName | Malignant tumor of breast |
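As a rough illustration of where the two target columns come from, the lookup can be sketched on a toy slice of a concept_relationship-like table. The data frame below is hand-built from the example values; the tool itself works against the vocabulary schema in the database, and this sketch does not claim to reproduce its query.

```r
# Toy slice of a concept_relationship-like table, built by hand from the
# example above (not read from a real vocabulary).
relationships <- data.frame(
  concept_id_1 = c(4324190, 4324190),
  relationship_id = c("Maps to", "Maps to value"),
  concept_id_2 = c(1340204, 4112853),
  stringsAsFactors = FALSE
)

nodeConceptId <- 4324190

# Keep only the relationships that start from the non-standard node concept.
nodeRels <- relationships[relationships$concept_id_1 == nodeConceptId, ]

# Pick the target of each relationship type.
mapsToConceptId <- nodeRels$concept_id_2[nodeRels$relationship_id == "Maps to"]
mapsToValueConceptId <- nodeRels$concept_id_2[nodeRels$relationship_id == "Maps to value"]

print(c(mapsTo = mapsToConceptId, mapsToValue = mapsToValueConceptId))
```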
This tab shows related source concepts that were added or removed, with their mapping in both vocabulary versions.
Note that only source codes present in the user's database are included in the analysis.
This lets the user see why the related source concepts differ and, if needed, modify the concept set expression by adding or removing mapped concepts.
- In the example below, events with the ICD9CM concept "Neural hearing loss, unilateral" are now captured because of the mapping change. The OLD_MAPPED_CONCEPT "Unilateral neural hearing loss" didn't have the proper hierarchy and wasn't captured.
COHORTID | 12822 |
CONCEPTSETNAME | Cranial nerve disorder |
CONCEPTSETID | 28 |
ISEXCLUDED | 0 |
INCLUDEDESCENDANTS | 1 |
NODE_CONCEPT_ID | 441848 |
NODE_CONCEPT_NAME | Cranial nerve disorder |
SOURCE_CONCEPT_ID | 44823107 |
sourceCodesCount | 7115 |
ACTION | Added |
SOURCE_CONCEPT_NAME | Neural hearing loss, unilateral |
SOURCE_VOCABULARY_ID | ICD9CM |
SOURCE_CONCEPT_CODE | 389.13 |
OLD_MAPPED_CONCEPT_ID | 379831 |
OLD_MAPPED_CONCEPT_NAME | Unilateral neural hearing loss |
OLD_MAPPED_VOCABULARY_ID | SNOMED |
OLD_MAPPED_CONCEPT_CODE | 425601005 |
NEW_MAPPED_CONCEPT_ID | 381312 |
NEW_MAPPED_CONCEPT_NAME | Neural hearing loss |
NEW_MAPPED_VOCABULARY_ID | SNOMED |
NEW_MAPPED_CONCEPT_CODE | 73371001 |
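The old-versus-new mapping comparison can be sketched in base R by joining two mapping tables on the source concept. The tables below are toy data: the first row uses the IDs from the example, while source concept 999999 and target 888888 are made-up placeholders for an unchanged mapping. This is an assumption about the general technique, not the package's implementation.

```r
# Source-to-standard mappings in the old and the new vocabulary version.
# Row 1 comes from the example above; row 2 is a made-up placeholder
# representing a mapping that did not change.
oldMap <- data.frame(
  source_concept_id = c(44823107, 999999),
  mapped_concept_id = c(379831, 888888)
)
newMap <- data.frame(
  source_concept_id = c(44823107, 999999),
  mapped_concept_id = c(381312, 888888)
)

# Put both versions side by side for each source concept.
both <- merge(oldMap, newMap, by = "source_concept_id", suffixes = c("_old", "_new"))

# Keep only source concepts whose target changed between versions.
changed <- both[both$mapped_concept_id_old != both$mapped_concept_id_new, ]
print(changed)
```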
Hierarchy changes are reflected at the "Peak concept" level: the common parent concept of the added or removed standard concepts, above which the hierarchy changed.
- In the example below, 375527|Headache disorder and all its descendants were added to the Headache concept set. This is quite a big change, since drc (descendant record count) = 34219562, and the researcher now has to decide whether the new, broader definition still fits.
cohortid | 12825 |
conceptsetid | 23 |
conceptsetname | Headache |
isexcluded | 0 |
includedescendants | 1 |
nodeConceptId | 378253 |
nodeConceptName | Headache |
action | Added |
peakConceptId | 375527 |
peakName | Headache disorder |
peakCode | 230461009 |
drc | 34219562 |
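One way to think about a peak concept is as an added concept that has no other added concept above it. The sketch below applies that reading to toy data; it is an assumption made for illustration and may differ from how the package derives peaks. Concept IDs 1001 and 1002 are made-up placeholders for descendants of 375527.

```r
# Concepts newly included in the concept set (1001 and 1002 are placeholders).
added <- c(375527, 1001, 1002)

# Toy concept_ancestor-like table restricted to the relevant pairs.
ancestors <- data.frame(
  ancestor_concept_id = c(375527, 375527, 1001),
  descendant_concept_id = c(1001, 1002, 1002)
)

# Pairs where both ends were added, ignoring self-references.
inner <- ancestors[
  ancestors$ancestor_concept_id %in% added &
    ancestors$descendant_concept_id %in% added &
    ancestors$ancestor_concept_id != ancestors$descendant_concept_id,
]

# A peak is an added concept that is not a descendant of another added concept.
peaks <- setdiff(added, inner$descendant_concept_id)
print(peaks)
```

Reporting only the peak keeps the tab short: one row stands for the whole added or removed branch.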
This tab shows included concepts that changed their domain, so a different event table should be used.
- In the example below, the concept "2108163|Therapeutic apheresis; for plasma pheresis" changed its domain from Procedure to Measurement, so the concept set "Treatment or investigation for TMA" also needs to be used with the Measurement table to include these events.
cohortid | 10656 |
conceptsetname | Treatment or investigation for TMA |
conceptsetid | 20 |
isexcluded | 0 |
includedescendants | 1 |
nodeConceptId | 4182536 |
nodeConceptName | Transfusion |
conceptId | 2108163 |
conceptName | Therapeutic apheresis; for plasma pheresis |
vocabularyId | CPT4 |
conceptCode | 36514 |
oldDomainId | Procedure |
newDomainId | Measurement |
drc | 1010478 |
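The domain comparison can be sketched the same way, by joining the concept table from both vocabulary versions on concept_id. The data frames are toy data: concept 2108163 comes from the example, and 999999 is a made-up placeholder for a concept whose domain did not change. This is a sketch of the idea, not the package's query.

```r
# Concept domains in the old and the new vocabulary version.
# 999999 is a placeholder concept that keeps its domain.
oldConcept <- data.frame(
  concept_id = c(2108163, 999999),
  domain_id = c("Procedure", "Procedure"),
  stringsAsFactors = FALSE
)
newConcept <- data.frame(
  concept_id = c(2108163, 999999),
  domain_id = c("Measurement", "Procedure"),
  stringsAsFactors = FALSE
)

# Join the two versions and keep concepts whose domain differs.
domains <- merge(oldConcept, newConcept, by = "concept_id", suffixes = c("_old", "_new"))
domainChange <- domains[domains$domain_id_old != domains$domain_id_new, ]
names(domainChange) <- c("conceptId", "oldDomainId", "newDomainId")
print(domainChange)
```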