Utility to compare cohorts run in different vocabulary versions by resolving their concept sets

Compares captured source codes, hierarchy changes, and domain changes; identifies non-standard concepts used in concept set expressions

Step by Step Example

#install package
remotes::install_github("dimshitc/PhenotypeChangesInVocabUpdate")

library(dplyr)
library(openxlsx)
library(readr)
library(tibble)
library(PhenotypeChangesInVocabUpdate)

# set the base URL of your Atlas instance (replace the placeholder below with your own)
baseUrl <- "https://yourSecureAtlas.ohdsi.org/"

# if security is enabled, authorize use of the WebAPI
ROhdsiWebApi::authorizeWebApi(
  baseUrl = baseUrl,
  authMethod = "windows")

# specify the cohorts you want to compare; this example imports them from a CSV file with one column containing cohortIds
# the example file is located in "~/PhenotypeChangesInVocabUpdate/extras/Cohorts.csv"
# you can also define the cohorts directly as a vector:
#cohorts <- c(12822, 12824, 12825)

# you must specify the file location
cohortsDF <- readr::read_delim("~/PhenotypeChangesInVocabUpdate/extras/Cohorts.csv", delim = "\t", show_col_types = FALSE)
cohorts <- cohortsDF[[1]]

# excludedNodes is a text string listing the nodes you want to exclude from the analysis; it is set to 0 by default
# for example, some CPT4 and HCPCS codes are now mapped to Visit concepts, and we didn't implement this in the ETL,
# so we don't want these in the analysis (note: the tool doesn't look at the actual CDM, only at the mappings in the vocabulary)
# in that case, excludedNodes is defined like this:
#excludedNodes <- "9201, 9202, 9203"


# set connectionDetails
# you can use keyring to store your credentials;
# see ~/PhenotypeChangesInVocabUpdate/extras/KeyringSetup.R for how to configure keyring for the example below

# you can also define connectionDetails directly, see the DatabaseConnector documentation https://ohdsi.github.io/DatabaseConnector/

connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = keyring::key_get("YourDatabase", "dbms"),
  connectionString = keyring::key_get("YourDatabase", "connectionString"),
  user = keyring::key_get("YourDatabase", "username"),
  password = keyring::key_get("YourDatabase", "password")
)

newVocabSchema <- 'vocab_schema_n1' # schema containing a new vocabulary version
oldVocabSchema <- 'vocab_schema_n0' # schema containing an older vocabulary version
resultSchema <- 'achilles_results' # schema containing Achilles results

# create the data frame with concept set expressions using the getNodeConcepts function
Concepts_in_cohortSet <- getNodeConcepts(cohorts, baseUrl)

#resolve concept sets, compare the outputs on different vocabulary versions, write results to the Excel file
resultToExcel(connectionDetails = connectionDetails,
              Concepts_in_cohortSet = Concepts_in_cohortSet,
              newVocabSchema = newVocabSchema,
              oldVocabSchema = oldVocabSchema,
              resultSchema = resultSchema)

#open the excel file
#Windows
shell.exec("PhenChange.xlsx")

#MacOS
#system(paste("open", "PhenChange.xlsx"))

The output description:

Writes an Excel file with a separate tab for each type of comparison.

Definitions/column names used:

"Node concept": a concept used directly in a concept set expression

"includedescendants": indicates whether descendants of the "Node concept" are included in the concept set; 0 stands for False, 1 stands for True

"isexcluded": indicates whether the "Node concept" (and its descendants, if "includedescendants" = 1) is excluded from the concept set; 0 stands for False, 1 stands for True

"drc": descendant record count - summary number of

"source concept": concept sets are usually defined through standard concepts.

If a mapping changes, the same set of included standard concepts may capture different clinical events, which is why the tool tracks the related source concepts.

"Action": flags whether a concept or hierarchy branch was added or removed
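
To make the "includedescendants" and "isexcluded" flags concrete, the sketch below resolves a tiny concept set expression in base R. The data frames, the concept IDs, and the helpers expandNodes and resolveConceptSet are hypothetical and exist only for illustration; the package itself resolves concept sets against the vocabulary schemas in the database.

```r
# Toy hierarchy shaped like the OMOP concept_ancestor table
# (each concept is also listed as its own ancestor). IDs are made up:
# 1 is the parent of 2 and 3, and 2 is the parent of 4.
conceptAncestor <- data.frame(
  ancestor_concept_id   = c(1, 1, 1, 1, 2, 2, 3, 4),
  descendant_concept_id = c(1, 2, 3, 4, 2, 4, 3, 4)
)

# Hypothetical concept set expression:
# node 1 is included with its descendants, node 2 alone is excluded.
expression <- data.frame(
  nodeConceptId      = c(1, 2),
  includedescendants = c(1, 0),
  isexcluded         = c(0, 1)
)

# Expand a set of nodes: the nodes themselves plus, where
# includedescendants = 1, all of their descendants.
expandNodes <- function(nodes, ancestorTable) {
  withDesc <- nodes$nodeConceptId[nodes$includedescendants == 1]
  descendants <- ancestorTable$descendant_concept_id[
    ancestorTable$ancestor_concept_id %in% withDesc
  ]
  unique(c(nodes$nodeConceptId, descendants))
}

# Resolved set = expanded included nodes minus expanded excluded nodes.
resolveConceptSet <- function(expression, ancestorTable) {
  included <- expandNodes(expression[expression$isexcluded == 0, ], ancestorTable)
  excluded <- expandNodes(expression[expression$isexcluded == 1, ], ancestorTable)
  sort(setdiff(included, excluded))
}

resolved <- resolveConceptSet(expression, conceptAncestor)
print(resolved)
```

Running the same expression against two versions of the ancestor table and comparing the two resolved sets is the basic idea behind the comparisons described in the tabs below.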

The Excel file has the following tabs:

1. summaryTable

Shows the sum of occurrences in the dataset of source concepts that were added or removed.

For example, cohort_id 123 no longer picks up source codes X and Y when using the newer vocabulary version. X appears 10 times in the data and Y appears 15 times.

In this situation you'll get the following output:

cohortid	123
action	Removed
sum	25
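
The arithmetic behind this row can be reproduced with a short base R sketch. The sourceDiff data frame and its column names are assumptions made for illustration; they are not the package's internal tables.

```r
# Hypothetical per-source-code differences for cohort 123
sourceDiff <- data.frame(
  cohortid         = c(123, 123),
  sourceCode       = c("X", "Y"),
  action           = c("Removed", "Removed"),
  sourceCodesCount = c(10, 15)
)

# Sum the occurrence counts per cohort and action
summaryTable <- aggregate(
  sourceCodesCount ~ cohortid + action,
  data = sourceDiff,
  FUN = sum
)
names(summaryTable)[names(summaryTable) == "sourceCodesCount"] <- "sum"

print(summaryTable)
```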

2. nonStNodes

Lists non-standard concepts used in the concept set definition.

Note that the concept set definition JSON isn't updated when the vocabulary is updated, so you will not see concept changes in Atlas.

You therefore need to run this tool to see whether any concepts have become non-standard.

For example, cohort_id 10729 has the concept set 'Malignancies that spread to liver', which has Node concept = "4324190|History of malignant neoplasm of breast" with descendants included.

This concept is non-standard and is mapped this way:

Maps to "1340204|History of event"

Maps to value "4112853|Malignant tumor of breast".

In this situation you'll get the output below, which gives the target concepts you need to use to capture the same clinical events with the new vocabulary version.

cohortid	10729
conceptsetname	Malignancies that spread to liver
conceptsetid	15
isexcluded	0
includedescendants	1
nodeConceptId	4324190
nodeConceptName	History of malignant neoplasm of breast
drc	20284048
mapsToConceptId	1340204
mapsToConceptName	History of event
mapsToValueConceptId	4112853
mapsToValueConceptName	Malignant tumor of breast
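
As a rough illustration of how this tab might be used, the sketch below turns a nonStNodes row into a long list of replacement targets, one row per relationship. The data frame is typed in by hand from the example above (in practice it would be read from the nonStNodes tab), and the targets table layout is an assumption, not something the package produces.

```r
# The example row from the nonStNodes tab, entered by hand
nonStNodes <- data.frame(
  cohortid             = 10729,
  conceptsetid         = 15,
  nodeConceptId        = 4324190,
  mapsToConceptId      = 1340204,
  mapsToValueConceptId = 4112853
)

# One row per (node, relationship) pair
targets <- rbind(
  data.frame(
    cohortid        = nonStNodes$cohortid,
    conceptsetid    = nonStNodes$conceptsetid,
    nodeConceptId   = nonStNodes$nodeConceptId,
    relationship    = "Maps to",
    targetConceptId = nonStNodes$mapsToConceptId
  ),
  data.frame(
    cohortid        = nonStNodes$cohortid,
    conceptsetid    = nonStNodes$conceptsetid,
    nodeConceptId   = nonStNodes$nodeConceptId,
    relationship    = "Maps to value",
    targetConceptId = nonStNodes$mapsToValueConceptId
  )
)

# Drop nodes that have no target for a given relationship
targets <- targets[!is.na(targets$targetConceptId), ]

print(targets)
```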

3. mapDif

This tab shows related source concepts that were added or removed, together with their mapping in both vocabulary versions.

Note that only source codes present in the user's database are included in the analysis.

This way the user can see why the set of related source concepts differs and may modify the concept set expression by adding or removing mapped concepts.

In the example below, events with the ICD9CM concept "Neural hearing loss, unilateral" are now captured because of the mapping change. The OLD_MAPPED_CONCEPT "Unilateral neural hearing loss" didn't have the proper hierarchy and wasn't captured.

COHORTID	12822
CONCEPTSETNAME	Cranial nerve disorder
CONCEPTSETID	28
ISEXCLUDED	0
INCLUDEDESCENDANTS	1
NODE_CONCEPT_ID	441848
NODE_CONCEPT_NAME	Cranial nerve disorder
SOURCE_CONCEPT_ID	44823107
sourceCodesCount	7115
ACTION	Added
SOURCE_CONCEPT_NAME	Neural hearing loss, unilateral
SOURCE_VOCABULARY_ID	ICD9CM
SOURCE_CONCEPT_CODE	389.13
OLD_MAPPED_CONCEPT_ID	379831
OLD_MAPPED_CONCEPT_NAME	Unilateral neural hearing loss
OLD_MAPPED_VOCABULARY_ID	SNOMED
OLD_MAPPED_CONCEPT_CODE	425601005
NEW_MAPPED_CONCEPT_ID	381312
NEW_MAPPED_CONCEPT_NAME	Neural hearing loss
NEW_MAPPED_VOCABULARY_ID	SNOMED
NEW_MAPPED_CONCEPT_CODE	73371001
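
When this tab is long, it can help to review the most frequent source codes first. The sketch below filters and sorts a mapDif-like data frame by sourceCodesCount. The first row is taken from the example above; the second row and the minCount threshold are made up for illustration.

```r
mapDif <- data.frame(
  COHORTID              = c(12822, 12822),
  SOURCE_CONCEPT_ID     = c(44823107, 1),   # second row is hypothetical
  sourceCodesCount      = c(7115, 12),
  ACTION                = c("Added", "Removed"),
  OLD_MAPPED_CONCEPT_ID = c(379831, 2),
  NEW_MAPPED_CONCEPT_ID = c(381312, 3)
)

# Arbitrary review threshold: ignore source codes seen fewer than 100 times
minCount <- 100

toReview <- mapDif[mapDif$sourceCodesCount >= minCount, ]
toReview <- toReview[order(-toReview$sourceCodesCount), ]

print(toReview)
```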

4. peakDif

A hierarchy change is reported at the "Peak concept" level, that is, the common parent concept of the added or removed standard concepts above which the hierarchy changed.

In the example below, 375527|Headache disorder and all its descendants were added to the Headache concept set. This is quite a big change, since drc (descendant record count) = 34219562, and a researcher now has to decide whether the new, broader definition still fits.

cohortid	12825
conceptsetid	23
conceptsetname	Headache
isexcluded	0
includedescendants	1
nodeConceptId	378253
nodeConceptName	Headache
action	Added
peakConceptId	375527
peakName	Headache disorder
peakCode	230461009
drc	34219562
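
One way to picture a peak concept: among the standard concepts added to (or removed from) a resolved concept set, a peak is a concept with no other changed concept above it, so each changed branch is reported once at its top. The sketch below illustrates that reading in base R on a made-up hierarchy. The IDs, the table, and findPeaks are hypothetical, and the package's own logic may differ.

```r
# Toy hierarchy shaped like the OMOP concept_ancestor table:
# 10 is above 11, 11 is above 12, and 20 stands alone. IDs are made up.
conceptAncestor <- data.frame(
  ancestor_concept_id   = c(10, 10, 10, 11, 11, 12, 20),
  descendant_concept_id = c(10, 11, 12, 11, 12, 12, 20)
)

# Keep only the changed concepts that have no other changed concept above them
findPeaks <- function(changed, ancestorTable) {
  # Ancestor/descendant pairs where both ends changed, ignoring self-pairs
  pairs <- ancestorTable[
    ancestorTable$ancestor_concept_id %in% changed &
      ancestorTable$descendant_concept_id %in% changed &
      ancestorTable$ancestor_concept_id != ancestorTable$descendant_concept_id,
  ]
  sort(setdiff(changed, pairs$descendant_concept_id))
}

# Concepts that appear only in the newer resolved concept set (hypothetical)
added <- c(10, 11, 12, 20)
peaks <- findPeaks(added, conceptAncestor)
print(peaks)
```

Reporting only the peaks keeps the tab short: a whole added branch, such as Headache disorder and its descendants, shows up as a single row.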

5. domainChange

This tab shows included concepts that changed their domain, so a different event table should be used.

In the example below, the concept "2108163|Therapeutic apheresis; for plasma pheresis" changed its domain from Procedure to Measurement, so the concept set "Treatment or investigation for TMA" also needs to be applied to the Measurement table to include the "2108163|Therapeutic apheresis; for plasma pheresis" events.

cohortid	10656
conceptsetname	Treatment or investigation for TMA
conceptsetid	20
isexcluded	0
includedescendants	1
nodeConceptId	4182536
nodeConceptName	Transfusion
conceptId	2108163
conceptName	Therapeutic apheresis; for plasma pheresis
vocabularyId	CPT4
conceptCode	36514
oldDomainId	Procedure
newDomainId	Measurement
drc	1010478
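
To see which event tables a domain change affects, the sketch below adds the old and new OMOP CDM table names to a domainChange-like data frame. The row is typed in from the example above; the domainToTable lookup is an illustrative, non-exhaustive mapping written for this sketch and is not part of the package.

```r
# The example row from the domainChange tab, entered by hand
domainChange <- data.frame(
  cohortid       = 10656,
  conceptsetname = "Treatment or investigation for TMA",
  conceptId      = 2108163,
  oldDomainId    = "Procedure",
  newDomainId    = "Measurement"
)

# Illustrative lookup from domain to OMOP CDM event table (not exhaustive)
domainToTable <- c(
  Condition   = "condition_occurrence",
  Procedure   = "procedure_occurrence",
  Measurement = "measurement",
  Observation = "observation",
  Drug        = "drug_exposure",
  Device      = "device_exposure"
)

# as.character() guards against factor columns in older R versions
domainChange$oldTable <- unname(domainToTable[as.character(domainChange$oldDomainId)])
domainChange$newTable <- unname(domainToTable[as.character(domainChange$newDomainId)])

print(domainChange[, c("conceptId", "oldTable", "newTable")])
```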
