Understanding Circe-be Logic Through Capr for Generating Complex Cohort Definitions

Author

Martin Lavallee

1 Introduction

1.1 ATLAS

Typically, we define cohort definitions for OHDSI studies using ATLAS. ATLAS has several benefits, in particular a nice user interface to visualize the cohort definition we are trying to create. However, ATLAS can be tedious, particularly when we must create several cohort definitions with a similar structure (a template). We can deal with these situations by copying and pasting, but this can introduce errors in cohort logic and can be quite time consuming.

1.2 Capr

Given the challenges of templating in ATLAS, the R package Capr (pronounced like the edible flower bud, caper) was created as a programmatic interface for defining cohort logic on OMOP data, serving as an alternative avenue for generating cohort definitions for OHDSI studies. The advantage of scripting cohort definitions is that we can define a template of our definition and iterate across multiple possibilities. Capr emphasizes the DRY principle in coding (“Don't Repeat Yourself”), which encourages programmers to define something once instead of multiple times. This sounds great, but it comes with a slight change in mindset when defining cohort definitions. To use Capr properly, users need to understand the underlying logic expressed in circe-be. Capr populates the same JSON structure as one would in ATLAS; it is essentially a backdoor to circe-be over which we have a bit more control.
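As a minimal sketch of what templating means here, the base R snippet below defines the shared structure once and iterates over two drug classes. It does not use Capr itself: the function `make_exposure_spec` and its fields are hypothetical names chosen for illustration, and only the concept IDs are taken from the code later in this tutorial.

```r
# Hypothetical template: the shared structure is written once,
# and only the drug class changes between cohort definitions.
make_exposure_spec <- function(class_name, concept_ids) {
  list(
    name = paste("New users of", class_name),
    concept_ids = concept_ids,
    include_descendants = TRUE,
    prior_observation_days = 365L
  )
}

# Two drug classes that should share the same cohort structure
drug_classes <- list(
  GLP1 = c(793143L, 1583722L),
  T2Rx = c(1502809L, 1502826L)
)

# Iterate the template across all classes (DRY: no copy and paste)
specs <- Map(make_exposure_spec, names(drug_classes), drug_classes)
vapply(specs, function(s) s$name, character(1))
```

Adding a third drug class then only requires a new entry in `drug_classes`, which is the kind of reuse the rest of this tutorial builds toward with real Capr objects.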

1.3 circe-be

Under the hood of ATLAS lies the circe-be software, essentially a bridge between clinical concept and computational query. When users fill out a cohort definition in ATLAS they are populating a JSON file. Think of the JSON like a “Mad-lib”: you are entering pieces into a structure that formulates a coherent message. circe-be takes these instructions and translates them into a SQL query that we can run against the OMOP data. This is a powerful tool because it standardizes queries across the OHDSI network. To create this standardized query, circe-be builds elements of a SQL script from underlying components. Some of these components are familiar (primary criteria, inclusion rules, etc.) while others are not as well known (query, count, group). The purpose of this demo is to use Capr to help users understand the underlying constructs of circe-be. Understanding these constructs will improve users' ability to create complex cohort definitions in ATLAS and Capr, and introduce the ideas behind templating in Capr.
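To make the “Mad-lib” idea concrete, here is a heavily simplified sketch of the kind of nested structure a cohort definition fills in, written as a plain R list. The key names follow the general shape of a circe-be cohort expression as we understand it, but the list is trimmed for illustration and should be read as an assumption, not the full schema. The concept set name comes from this tutorial.

```r
# Simplified, illustrative skeleton of a cohort expression (not the full schema)
cohort_skeleton <- list(
  ConceptSets = list(
    list(id = 0L, name = "Type 2 Diabetes Medications")
  ),
  PrimaryCriteria = list(
    # each entry is a query: a clinical domain plus the concept set to look up
    CriteriaList = list(
      list(DrugExposure = list(CodesetId = 0L))
    ),
    ObservationWindow = list(PriorDays = 0L, PostDays = 0L)
  ),
  InclusionRules = list()
)

# The "blanks" we fill in are the leaves of this nested structure
str(cohort_skeleton, max.level = 2)
```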

2 Tutorial

In this tutorial we use Capr to show the circe-be structures. In particular we demonstrate five structures: 1) Concept Set Expression (CSE), 2) Query, 3) Attribute, 4) Count and 5) Group. We provide code blocks showing how to create each circe-be structure in Capr. Each code block is accompanied by dplyr code that expresses how the circe-be structure is built on its own. The idea is to show how the Capr object would be deployed once it goes through the conversion to standardized SQL.

For our example we walk through the eMERGE phenotype for defining a Type 2 Diabetes (T2D) case. This is a complex algorithm with five potential pathways to define a T2D case, as shown in figure 1. To construct this full pathway we need to define the sub-components in the circe-be logic. We use Capr as a means of demonstrating each component of the circe-be semantic model and interfacing with these sub-components used to build cohort definitions. We build this cohort definition using the test CMS Synpuf database which includes the latest OMOP vocabulary used to define the logic. At the end of the tutorial we provide the full Capr code to build this complex cohort.

You can also watch these YouTube videos to learn more about circe-be structures through ATLAS.

2.1 Concept Set Expression

The first circe-be structure is the concept set expression. This is essentially a code list used to define a clinical event of interest. The expression aspect of this structure adds relational structure to the code set, incorporating descendant logic and adding exceptions to the code list for a refined definition. In the eMERGE algorithm, some of the paths require T2D medications in order to find a case of T2D. From the documentation we can get the list of RxNorm codes and then find the OMOP concept IDs for them. But databases record more than just the ingredient: there can be different dosages, brands and delivery methods, and we want to count all of these variations. This can be done using a concept set expression. In the Capr code we look up the drug IDs and then switch on the includeDescendants toggle to add all concepts that descend from them in the hierarchy. Now when we look up a record of a T2D medication, we do not just look up the ingredient concept, we look at all the descendants (quite powerful 💪).

The code below is how we construct a concept set expression using Capr. The first step is to look up the concept ids in the concept table of the OMOP vocabularies and then merge this with the concept_ancestor table to find all descendants. When we run this line of code, remember that you need to establish a connection to your database to access the vocabulary tables in the defined schema.

Capr CSE Code
#define T2D medication ingredients
T2RxIds <- c(1502809L, 1502826L, 1503297L, 1510202L, 
             1515249L, 1516766L, 1525215L, 1529331L, 
             1530014L, 1547504L, 1559684L, 1560171L, 
             1580747L, 1583722L, 1594973L, 1597756L)

#create CSE in Capr
T2Rx <- getConceptIdDetails(
  conceptIds = T2RxIds,
  connectionDetails = execution_settings$connectionDetails,
  vocabularyDatabaseSchema = execution_settings$vocabulary_schema) %>%
  createConceptSetExpression(
    Name = "Type 2 Diabetes Medications",
    includeDescendants = TRUE)

To give further context as to what is going on here, we use dplyr to sketch the SQL query that takes place behind the scenes. Again we take our list of ingredient concepts and find all the descendants through the concept_ancestor table. The OHDSI SQL typically holds this result in a temp table.

In Capr our first step in defining a cohort is to define the CSE. This construct holds all of the codes we want to look up across the different tables in CDM. To build a cohort definition we need to make sure our list of concepts is thorough.

{dplyr} CSE Representation
# example query for CSE
allT2dRx <- cdm$concept %>%
  dplyr::inner_join(
    cdm$concept_ancestor, 
    by = c("concept_id" = "descendant_concept_id")) %>%
  # is.na() translates to IS NULL on a database and also works on local data
  dplyr::filter(ancestor_concept_id %in% T2RxIds,
                is.na(invalid_reason)) %>%
  dplyr::select(concept_id:invalid_reason)  %>%
  dplyr::collect()
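If you do not have a database at hand, the same descendant logic can be tried on toy tables. The sketch below uses base R only; the concept IDs and names are made up for illustration and are not real OMOP concepts. It assumes, as in the OMOP vocabulary, that every concept is listed as its own ancestor.

```r
# Toy vocabulary tables (made-up IDs, not real OMOP concepts)
concept <- data.frame(
  concept_id = c(1L, 11L, 12L, 2L, 21L),
  concept_name = c("ingredient A", "A 500 mg tablet", "A retired brand",
                   "ingredient B", "B 10 mg tablet"),
  invalid_reason = c(NA, NA, "D", NA, NA)
)
concept_ancestor <- data.frame(
  ancestor_concept_id   = c(1L, 1L, 1L, 2L, 2L),
  descendant_concept_id = c(1L, 11L, 12L, 2L, 21L)
)

# The code list we start from: a single ingredient
ingredient_ids <- c(1L)

# Join each concept to its ancestors, then keep valid descendants
expanded <- merge(concept, concept_ancestor,
                  by.x = "concept_id", by.y = "descendant_concept_id")
cse <- expanded[expanded$ancestor_concept_id %in% ingredient_ids &
                  is.na(expanded$invalid_reason), ]
cse$concept_id
```

The result keeps the ingredient and its valid tablet form, and drops the invalidated concept as well as anything descending from ingredient B.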

2.2 Query

Recall the structure of the CDM (shown in the margin). We have a relational database, so to extract data from this format we need to merge tables using keys. For example, say we are looking for metformin users. We would merge the concept_id for metformin from the concept table to the drug_concept_id in the drug_exposure table and find person_id for patients that took metformin. Simply put we are performing a type of query on the relational database.

The Capr code for a query is very simple. We need to define the domain in which to look up “hits” of the concept set. Queries in Capr are defined by the create verb followed by the name of the clinical table. For example, if we want a condition occurrence, the Capr function is createConditionOccurrence. The input of the query must be a Concept Set Expression object. Using Capr we are simply telling the circe-be engine that we want to look up a particular concept set in the designated domain.

Capr Query Code
# create query in Capr
T2RxQuery <- createDrugExposure(conceptSetExpression = T2Rx)

Further we can show how this declaration would be deployed in circe-be through the code below. We join the CSE we made earlier with the drug exposure table looking for persons that have a record of the code in their patient history.

{dplyr} Query Representation
# example query for a query ¯\_(ツ)_/¯
query <- cdm$drug_exposure %>%
  # copy = TRUE lets the join run even though allT2dRx was collected locally
  dplyr::inner_join(allT2dRx, by = c("drug_concept_id" = "concept_id"),
                    copy = TRUE) %>%
  dplyr::select(drug_exposure_id, person_id, 
                drug_concept_id, drug_exposure_start_date, 
                concept_name, vocabulary_id) %>%
  dplyr::collect() 
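A query boils down to "find the records in one domain table whose concept is in the concept set". The base R sketch below shows this on a toy drug_exposure table; all IDs and dates are made up for illustration.

```r
# Concept IDs in our (toy) concept set expression
cse_ids <- c(1L, 11L)

# Toy drug_exposure table (made-up records)
drug_exposure <- data.frame(
  drug_exposure_id = 101:104,
  person_id = c(1L, 1L, 2L, 3L),
  drug_concept_id = c(11L, 21L, 1L, 21L),
  drug_exposure_start_date = as.Date(c("2010-01-05", "2010-02-01",
                                       "2011-06-30", "2012-03-15"))
)

# The query: keep exposure records whose concept is in the concept set
query_hits <- drug_exposure[drug_exposure$drug_concept_id %in% cse_ids, ]

# Persons with at least one "hit"
unique(query_hits$person_id)
```

Person 3 drops out because their only exposure is to a concept outside the set.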

2.3 Attribute

Closely associated with a query is an attribute. An attribute modifies the query to keep only the persons whose record contains a particular value in another column of the clinical table. All attributes are based on columns in the clinical domain table or in the person table. For example, in the T2D example we want measurements where the random glucose has a value greater than 200 mg/dL (which would designate an abnormal measure). In this case we would look up all persons with a Random Glucose concept and then search the value_as_number column to see if the listed value is greater than 200. When constructing cohort definitions, remember that the attribute complements the query.

Using Capr, an attribute object is first defined outside the query and then placed in a list within the query command. In the code block below we create an object called value200, which holds an attribute to modify the query. This attribute is called an OpAttribute: we are deploying a mathematical operator or inequality to describe the logic of interest in our query. Other attributes include a ConceptAttribute and LogicAttribute.

Capr Attribute Code
#create Random glucose CSE in Capr
AbLabRandomGluc <- getConceptCodeDetails(
  conceptCode = c("2339-0", "2345-7"),
  vocabulary = "LOINC",
  connectionDetails = execution_settings$connectionDetails,
  vocabularyDatabaseSchema = execution_settings$vocabulary_schema,
  mapToStandard = TRUE) %>%
  createConceptSetExpression(Name = "Abnormal Lab Random Glucose",
                             includeDescendants = TRUE)

# create an attribute of > 200 mg/dl (Op = "gt" is a strict inequality)
value200 <- createValueAsNumberAttribute(Op = "gt", Value = 200L)

#Create Random glucose Query with value attribute
AbLabRandomGlucQuery <- createMeasurement(
  conceptSetExpression = AbLabRandomGluc,
  attributeList = list(value200)
)

The Capr example is a tad tricky to show in synpuf because there are limited lab values, so we show an example using a gender attribute instead. Again, the attribute is a modifier of the query: we filter the matching persons by the existence of another value. For gender we have a concept ID for female (8532), so to find females who have taken a T2D medication, we first do a filtering join to find the persons with a “hit” from the CSE and then join to the person table by person_id. From this set of persons we keep only those with a concept ID of 8532 in the gender_concept_id column of the person table. As you can see, the attribute is additional filtering logic that modifies the query.

{dplyr} Attribute Representation
# example query for an attribute
attribute <- cdm$drug_exposure %>%
  # copy = TRUE lets the join run even though allT2dRx was collected locally
  dplyr::inner_join(allT2dRx, by = c("drug_concept_id" = "concept_id"),
                    copy = TRUE) %>%
  dplyr::inner_join(cdm$person, by = c("person_id")) %>%
  dplyr::filter(gender_concept_id == 8532L) %>%
  dplyr::select(drug_exposure_id, person_id, gender_concept_id,
                drug_concept_id, drug_exposure_start_date, 
                concept_name) %>%
  dplyr::collect()
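The value attribute from the Capr example can also be illustrated without synpuf. The base R sketch below applies a "gt" or "gte" operator to a toy measurement table; the helper `apply_value_attribute`, the concept ID 900 and the lab values are all hypothetical and only serve to show the difference between the two operators at the boundary value of 200.

```r
# Toy measurement table (made-up concept IDs and values)
measurement <- data.frame(
  person_id = c(1L, 2L, 3L, 4L),
  measurement_concept_id = c(900L, 900L, 900L, 901L),
  value_as_number = c(180, 200, 250, 300)
)
glucose_ids <- 900L

# Map the operator names used in this tutorial to R comparison functions
ops <- list(gt = `>`, gte = `>=`)

# Hypothetical helper: keep rows whose value passes the operator
apply_value_attribute <- function(df, op, value) {
  keep <- ops[[op]](df$value_as_number, value)
  df[!is.na(keep) & keep, ]
}

# Query first (concept set hit), then attribute (value filter)
hits <- measurement[measurement$measurement_concept_id %in% glucose_ids, ]
gt200  <- apply_value_attribute(hits, "gt", 200)
gte200 <- apply_value_attribute(hits, "gte", 200)

gt200$person_id
gte200$person_id
```

Person 2, with a value of exactly 200, is kept by "gte" but not by "gt", and person 4 never qualifies because their measurement is not in the concept set.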

2.4 Count

So far we have not incorporated time into our queries, only the existence of a code in a table. However, timing is vital when determining a cohort of patients. Of the initial set of patients, we need to keep only people who have (or have not) experienced a medical event at some plausible point in their patient history. For example, if we want persons with T2D, we want to ensure they do not have prior type 1 diabetes. This is the essence of the circe-be count structure: we enumerate patients based on the temporal occurrence of a medical event. Counts are typically only defined in Additional Criteria and Inclusion Rules because we need the occurrence of a prior event in order to define a window in the patient history over which to enumerate.

In Capr, counts require: 1) a query, 2) a count and 3) a timeline. The timeline sets the window of observation relative to another event; in circe-be this is typically relative to the primary criteria (unless we are building a correlated criteria attribute). In the example below we define two counts: 1) at least 1 occurrence of a T2D medication and 2) no occurrence of a T2D medication. Note these are for different pathways in the T2D eMERGE algorithm. Relative to the primary criteria we define our window as all time before and no time after. Now we can begin to enumerate 🧮! We want to observe x occurrences of a query, where x is some value that we apply with an inequality. If we want at least 1 instance we follow the first example in the Capr code, and if we want no occurrence we follow the second example.

So if we want to create an inclusion rule, we first need to understand the primary criteria. Next we define the window of time relative to this index event. Then we create a query for the medical event we want to observe in this window. Finally we define how many times we must observe this event in the patient history for the subject to be included or excluded.

Capr Count Code
#create timeline
tl1 <- createTimeline(
  StartWindow = createWindow(
    StartDays = "All", 
    StartCoeff = "Before", 
    EndDays = 0L, 
    EndCoeff = "After")
)


#at least 1 T2DM medication 
atLeast1T2RxCount <- createCount(
  Query = T2RxQuery, 
  Logic = "at_least",
  Count = 1L,
  Timeline = tl1)

#no exposure to T2DM medication 
noT2RxCount <- createCount(
  Query = T2RxQuery, 
  Logic = "exactly",
  Count = 0L,
  Timeline = tl1)

An example of how circe-be deploys a count construct can be seen in the naive example below. We want to include people in the cohort if they have had an exposure to a T2D medication between 365 days and 1 day prior to a T2D diagnosis. As we can see, the idea of a count is to enumerate an event temporally based on some prior event. Counts are usually used within additional criteria and inclusion rules, where the temporal bounds are set by the primary criteria.

{dplyr} Count Representation
count1 <- cdm$condition_occurrence %>%
  dplyr::filter(condition_concept_id == 201826L) %>%
  dplyr::group_by(person_id) %>%
  dplyr::mutate(rn = min_rank(condition_start_date)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(rn == 1) %>%
  dplyr::select(condition_occurrence_id:condition_start_date) %>%
  dplyr::inner_join(
    cdm$drug_exposure %>%
      dplyr::select(drug_exposure_id:drug_exposure_start_date) %>%
      # copy = TRUE: allT2dRx was collected locally in the CSE example
      dplyr::inner_join(allT2dRx, 
                        by = c("drug_concept_id" = "concept_id"),
                        copy = TRUE),
    by = c("person_id")
  ) %>%
  dplyr::mutate(
    hit = dplyr::if_else(dplyr::between(
      drug_exposure_start_date,
      condition_start_date - lubridate::days(365),
      condition_start_date - lubridate::days(1)), 
      1L, 0L, 0L)
  ) %>%
  dplyr::filter(hit == 1) %>%
  dplyr::group_by(person_id) %>%
  dplyr::mutate(rn = min_rank(drug_exposure_start_date)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(rn == 1) %>%
  dplyr::select(person_id:condition_start_date, 
                drug_exposure_start_date,
                drug_concept_id, concept_name) %>%
  dplyr::distinct() %>%
  dplyr::collect()
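The "all time before, no time after" window and the two count rules from the Capr code can be sketched on toy data in base R. The helper `count_in_window`, the index dates and the events below are made up for illustration; the sketch only mirrors the idea of counting events in a window relative to an index event.

```r
# One index event per person (e.g. from the primary criteria)
index <- data.frame(
  person_id = 1:3,
  index_date = as.Date(c("2015-06-01", "2015-06-01", "2015-06-01"))
)

# Toy medication events (person 3 has none)
events <- data.frame(
  person_id = c(1L, 1L, 2L),
  event_date = as.Date(c("2014-01-10", "2015-05-20", "2015-07-01"))
)

# Hypothetical helper: count events from all time before the index
# date up to `end_days_after` days after it
count_in_window <- function(index, events, end_days_after = 0) {
  vapply(seq_len(nrow(index)), function(i) {
    ev <- events$event_date[events$person_id == index$person_id[i]]
    sum(ev <= index$index_date[i] + end_days_after)
  }, integer(1))
}

n <- count_in_window(index, events)

# "at_least" 1 occurrence versus "exactly" 0 occurrences
at_least_1 <- index$person_id[n >= 1]
exactly_0  <- index$person_id[n == 0]
```

Person 2 has an event, but it falls after the index date, so within this window they count as having no exposure.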

2.5 Group

Groups are the most complex, yet most powerful, structure in the underlying circe-be semantic model. A group bundles counts and other groups together into a single piece of logic that determines whether a person is added to or omitted from a cohort. The eMERGE T2D algorithm offers excellent examples of a group. The first path towards a T2D case is no occurrence of a T2D diagnosis, at least 1 T2D medication and at least 1 abnormal lab measurement. The patient needs to pass all three of these rules in order to be added to the cohort. Interestingly, in this example we are using two counts and a group: group objects in circe-be can hold other groups 🤯! The T2D algorithm defines abnormal labs as any one of: 1) random glucose \(> 200\) mg/dL, 2) HbA1c \(\geq 6.5\%\) and 3) fasting glucose \(\geq 125\) mg/dL. After defining this group we then need to bundle it with the count substructures for no occurrence of T2D diagnosis and at least 1 T2D medication. The Capr code below shows how to build this structure from start to finish.

Capr Group Code
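#AbLab Counts
#(AbLabHbA1cQuery, AbLabFastingGlucQuery and T2DxQuery are defined
# in the full listing in section 3)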
#at least 1 abnormal HbA1c Lab
atLeast1AbLabHbA1cCount <- createCount(Query = AbLabHbA1cQuery, 
                                       Logic = "at_least",
                                       Count = 1L,
                                       Timeline = tl1)

#at least 1 abnormal Fasting Glucose Lab
atLeast1AbLabFastingGlucCount <- createCount(Query = AbLabFastingGlucQuery, 
                                             Logic = "at_least",
                                             Count = 1L,
                                             Timeline = tl1)

#at least 1 abnormal Random Glucose Lab
atLeast1AbLabRandomGlucCount <- createCount(Query = AbLabRandomGlucQuery, 
                                            Logic = "at_least",
                                            Count = 1L,
                                            Timeline = tl1)
#ab lab group
atLeast1AbLabGroup <- createGroup(
  Name = "Abnormal labs for HbA1c, Fasting+Random Glucose",
  type = "ANY",
  criteriaList = list(
    atLeast1AbLabHbA1cCount,
    atLeast1AbLabFastingGlucCount,
    atLeast1AbLabRandomGlucCount)
)


#no occurrence of T2 Diabetes
noT2DxCount <- createCount(Query = T2DxQuery, 
                           Logic = "exactly",
                           Count = 0L,
                           Timeline = tl1)

#at least 1 T2DM medication 
atLeast1T2RxCount <- createCount(Query = T2RxQuery, 
                                 Logic = "at_least",
                                 Count = 1L,
                                 Timeline = tl1)

# Path 1: 0 T2Dx, 1+ T2Rx and 1+ AbLab
Pathway1T2DMGroup <- createGroup(
  Name = "Pathway1",
  Description = "0 T2Dx, 1+ T2Rx and 1+ AbLab",
  type = "ALL",
  criteriaList = list(noT2DxCount, atLeast1T2RxCount),
  Groups = list(atLeast1AbLabGroup))

Again, the group example is hard to depict in synpuf data, so we simplify it to provide a {dplyr} representation. Suppose we have two count objects: persons who take T2D medications and persons who take ACE inhibitors. Among people diagnosed with T2D, we require that they have taken both of these medications to be in the cohort. A group allows us to combine the logic of both counts, here using a join statement as shown below.

{dplyr} Group Representation
# ace inhibitors drugs

aceIds <- c(1308216L, 1310756L, 
            1331235L, 1334456L,
            1335471L, 1340128L, 
            1341927L, 1342439L,
            1363749L, 1373225L)

aceInhib <- cdm$concept %>%
  dplyr::inner_join(
    cdm$concept_ancestor, 
    by = c("concept_id" = "descendant_concept_id")) %>%
  dplyr::filter(ancestor_concept_id %in% aceIds,
                is.na(invalid_reason)) %>%
  dplyr::select(concept_id:invalid_reason) 

# Second Count: an exposure to an ACE inhibitor before T2D Dx

count2 <- cdm$condition_occurrence %>%
  dplyr::filter(condition_concept_id == 201826L) %>%
  dplyr::group_by(person_id) %>%
  dplyr::mutate(rn = min_rank(condition_start_date)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(rn == 1) %>%
  dplyr::select(condition_occurrence_id:condition_start_date) %>%
  dplyr::inner_join(
    cdm$drug_exposure %>%
      dplyr::select(drug_exposure_id:drug_exposure_start_date) %>%
      dplyr::inner_join(aceInhib, by = c("drug_concept_id" = "concept_id")),
    by = c("person_id")
  ) %>%
  dplyr::mutate(
    hit = dplyr::if_else(dplyr::between(
      drug_exposure_start_date,
      condition_start_date - lubridate::days(365),
      condition_start_date - lubridate::days(1)), 
      1L, 0L, 0L)
  ) %>%
  dplyr::filter(hit == 1) %>%
  dplyr::group_by(person_id) %>%
  dplyr::mutate(rn = min_rank(drug_exposure_start_date)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(rn == 1) %>%
  dplyr::select(person_id:condition_start_date, 
                drug_exposure_start_date,
                drug_concept_id, concept_name) %>%
  dplyr::distinct()

# formulation of a group
# count1 was collected earlier, so collect count2 before joining locally
group <- count1 %>%
  dplyr::inner_join(dplyr::collect(count2), by = "person_id")
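The nesting of an ANY group inside an ALL group, as in Pathway 1 of the Capr code, can also be sketched with plain logical vectors. In the base R example below each column stands for the outcome of one count for each person; the data and the helpers `group_any` and `group_all` are made up for illustration.

```r
# Outcome of each count per person (TRUE = the count rule is satisfied)
criteria <- data.frame(
  person_id  = 1:4,
  no_t2dx    = c(TRUE,  TRUE,  FALSE, TRUE),
  has_t2rx   = c(TRUE,  TRUE,  TRUE,  FALSE),
  ab_hba1c   = c(FALSE, FALSE, TRUE,  TRUE),
  ab_fasting = c(TRUE,  FALSE, FALSE, FALSE),
  ab_random  = c(FALSE, FALSE, FALSE, TRUE)
)

# Hypothetical helpers mirroring group types "ANY" and "ALL"
group_any <- function(...) Reduce(`|`, list(...))
group_all <- function(...) Reduce(`&`, list(...))

# Inner group: any one abnormal lab
ab_lab <- group_any(criteria$ab_hba1c, criteria$ab_fasting, criteria$ab_random)

# Outer group: no T2Dx AND 1+ T2Rx AND the abnormal lab group
pathway1 <- group_all(criteria$no_t2dx, criteria$has_t2rx, ab_lab)

criteria$person_id[pathway1]
```

Only person 1 satisfies all three rules: person 2 has no abnormal lab, person 3 has a T2D diagnosis, and person 4 has no T2D medication.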

3 eMERGE T2D

In the tutorial above we defined the five essential circe-be substructures needed to build the elements of a cohort definition. Capr defines cohort definitions from the bottom up, so we need to understand these substructures to effectively create complex cohort definitions. Our understanding of these substructures improves the way we can build cohorts and create templates in Capr. The following is how one would create the full T2D algorithm from eMERGE. While this is a long code block, it shows how the fundamental pieces can be created and deployed in different combinations to formulate complex algorithms.

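library(Capr)
library(DatabaseConnector)
library(CohortGenerator)
library(magrittr) # provides the %>% pipe used below

# connectionDetails and vocabularyDatabaseSchema must be defined before
# running the lookups below (for example with
# DatabaseConnector::createConnectionDetails())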


#lookup concepts for T2DM cohort -------------------

#Type 2 Diabetes Diagnosis
T2Dx <- getConceptIdDetails(
  conceptIds = 201826,
  connectionDetails = connectionDetails,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema) %>%
  createConceptSetExpression(
    Name = "Type 2 Diabetes Diagnosis",
    includeDescendants = TRUE)

#Type 2 Diabetes Medications
T2RxIds <- c(1502809L, 1502826L, 1503297L, 1510202L, 
             1515249L, 1516766L, 1525215L, 1529331L, 
             1530014L, 1547504L, 1559684L, 1560171L, 
             1580747L, 1583722L, 1594973L, 1597756L)

T2Rx <- getConceptIdDetails(
  conceptIds = T2RxIds,
  connectionDetails = connectionDetails,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema) %>%
  createConceptSetExpression(
    Name = "Type 2 Diabetes Medications",
    includeDescendants = TRUE)

#Type 1 Diabetes Diagnosis
T1Dx <- getConceptIdDetails(
  conceptIds = 201254,
  connectionDetails = connectionDetails,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema) %>%
  createConceptSetExpression(
    Name = "Type 1 Diabetes Diagnosis",
    includeDescendants = TRUE)

#Type 1 Diabetes Medications
T1DRxNormCodes <- paste(c(139825,274783,314684,
                          352385,400008,51428,
                          5856,86009,139953))
T1Rx <- getConceptCodeDetails(
  conceptCode = T1DRxNormCodes,
  vocabulary = "RxNorm",
  connectionDetails = connectionDetails,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema,
  mapToStandard = TRUE) %>%
  createConceptSetExpression(
    Name = "Type 1 Diabetes Medications",
    includeDescendants = TRUE)

#Abnormal Lab
AbLabHbA1c <- c("4548-4", "17856-6", "4549-2", "17855-8") %>%
  getConceptCodeDetails(conceptCode = .,
                        vocabulary = "LOINC",
                        connectionDetails = connectionDetails,
                        vocabularyDatabaseSchema = vocabularyDatabaseSchema,
                        mapToStandard = TRUE) %>%
  createConceptSetExpression(Name = "Abnormal Lab HbA1c",
                             includeDescendants = TRUE)

#Ab Lab for Random Glucose (> 200 mg/dl)
AbLabRandomGluc <- c("2339-0", "2345-7") %>% 
  getConceptCodeDetails(conceptCode = .,
                        vocabulary = "LOINC",
                        connectionDetails = connectionDetails,
                        vocabularyDatabaseSchema = vocabularyDatabaseSchema,
                        mapToStandard = TRUE) %>%
  createConceptSetExpression(Name = "Abnormal Lab Random Glucose",
                             includeDescendants = TRUE)

#Ab Lab for Fasting Glucose (>= 125 mg/dl)

AbLabFastingGluc <- c("1558-6") %>% 
  getConceptCodeDetails(conceptCode = .,
                        vocabulary = "LOINC",
                        connectionDetails = connectionDetails,
                        vocabularyDatabaseSchema = vocabularyDatabaseSchema,
                        mapToStandard = TRUE) %>%
  createConceptSetExpression(Name = "Abnormal Lab Fasting Glucose",
                             includeDescendants = TRUE)

## Set up Queries -----------------------

#########################
#T2Rx Drug Exposure Query 
#########################
T2RxQuery <- createDrugExposure(conceptSetExpression = T2Rx)

#########################
#T1Rx Drug Exposure Query
#########################
T1RxQuery <- createDrugExposure(conceptSetExpression = T1Rx)


################################
#T2Dx Condition Occurrence Query
################################
T2DxQuery <- createConditionOccurrence(conceptSetExpression = T2Dx)

################################
#T1Dx Condition Occurrence Query
#################################
T1DxQuery <- createConditionOccurrence(conceptSetExpression = T1Dx)

########################
#Abnormal Lab Query
############################

#HbA1c Query with value attribute
AbLabHbA1cQuery <- createMeasurement(conceptSetExpression = AbLabHbA1c,
                                     attributeList = list(
                                       #add attribute of >= 6.5 %
                                       createValueAsNumberAttribute(
                                         Op = "gte",
                                         Value = 6.5)
                                     ))
#RandomGluc Query with value attribute
AbLabRandomGlucQuery <- createMeasurement(conceptSetExpression = AbLabRandomGluc,
                                          attributeList = list(
                                            #add attribute of > 200 mg/dl
                                            createValueAsNumberAttribute(
                                              Op = "gt",
                                              Value = 200L)
                                          ))
#FastingGluc Query with value attribute
AbLabFastingGlucQuery <- createMeasurement(conceptSetExpression = AbLabFastingGluc,
                                           attributeList = list(
                                             #add attribute of >= 125 mg/dl
                                             createValueAsNumberAttribute(
                                               Op = "gte",
                                               Value = 125L)
                                           ))

## Create Counts -----------------

#create timeline
tl1 <- createTimeline(StartWindow = createWindow(
  StartDays = "All", StartCoeff = "Before", 
  EndDays = 0L, EndCoeff = "After"))

#################
#Diagnosis Counts
#################

#no occurrence of T1 Diabetes
noT1DxCount <- createCount(Query = T1DxQuery, 
                           Logic = "exactly",
                           Count = 0L,
                           Timeline = tl1)

#no occurrence of T2 Diabetes
noT2DxCount <- createCount(Query = T2DxQuery, 
                           Logic = "exactly",
                           Count = 0L,
                           Timeline = tl1)


#at least 1 occurrence of T2 Diabetes
atLeast1T2DxCount <- createCount(Query = T2DxQuery, 
                                 Logic = "at_least",
                                 Count = 1L,
                                 Timeline = tl1)

#at least 2 occurrence of T2 Diabetes
atLeast2T2DxCount <- createCount(Query = T2DxQuery, 
                                 Logic = "at_least",
                                 Count = 2L,
                                 Timeline = tl1)

##################
#Medication Counts
##################


#at least 1 T2DM medication 
atLeast1T2RxCount <- createCount(Query = T2RxQuery, 
                                 Logic = "at_least",
                                 Count = 1L,
                                 Timeline = tl1)

#no exposure to T2DM medication 
noT2RxCount <- createCount(Query = T2RxQuery, 
                           Logic = "exactly",
                           Count = 0L,
                           Timeline = tl1)

#at least 1 T1DM medication 
atLeast1T1RxCount <- createCount(Query = T1RxQuery, 
                                 Logic = "at_least",
                                 Count = 1L,
                                 Timeline = tl1)

#no exposure to T1DM medication 
noT1RxCount <- createCount(Query = T1RxQuery, 
                           Logic = "exactly",
                           Count = 0L,
                           Timeline = tl1)

#################
#AbLab Counts
#################

#at least 1 abnormal HbA1c Lab
atLeast1AbLabHbA1cCount <- createCount(Query = AbLabHbA1cQuery, 
                                       Logic = "at_least",
                                       Count = 1L,
                                       Timeline = tl1)

#at least 1 abnormal Fasting Glucose Lab
atLeast1AbLabFastingGlucCount <- createCount(Query = AbLabFastingGlucQuery, 
                                             Logic = "at_least",
                                             Count = 1L,
                                             Timeline = tl1)

#at least 1 abnormal Random Glucose Lab
atLeast1AbLabRandomGlucCount <- createCount(Query = AbLabRandomGlucQuery, 
                                            Logic = "at_least",
                                            Count = 1L,
                                            Timeline = tl1)


## Create Groups ----------------------------

#1) No T1Dx at any point in patient history

NoT1DxGroup <- createGroup(Name = "No Diagnosis of Type 1 Diabetes",
                           type = "ALL",
                           criteriaList = list(noT1DxCount))

#2) AbLab Group (>= 6.5% HbA1c, >= 125 mg/dl Fasting Glucose, 
#> 200 mg/dl Random Glucose)

atLeast1AbLabGroup <- createGroup(
  Name = "Abnormal labs for HbA1c, Fasting+Random Glucose",
  type = "ANY",
  criteriaList = list(
    atLeast1AbLabHbA1cCount,
    atLeast1AbLabFastingGlucCount,
    atLeast1AbLabRandomGlucCount)
)

#3) Nested Criteria T2Rx precedes T1Rx
tl2 <- createTimeline(StartWindow = createWindow(
  StartDays = "All", StartCoeff = "Before", 
  EndDays = 1L, EndCoeff = "Before"))

PriorT2RxCount <- createCount(
  Query = T2RxQuery,
  Logic = "at_least",
  Count = 1L,
  Timeline = tl2
)

PriorT2RxNestedGroup <- createCorrelatedCriteriaAttribute(
  createGroup(
    Name = "Nested Group T2Rx before T1Rx",
    type = "ALL",
    criteriaList = list(PriorT2RxCount)
  )
)

T2RxBeforeT1RxCount <- createDrugExposure(
  conceptSetExpression = T1Rx,
  attributeList = list(PriorT2RxNestedGroup)) %>%
  createCount(Logic = "at_least", Count = 1L,
              Timeline = tl1)


#4) Path 1: 0 T2Dx, 1+ T2Rx and 1+ AbLab
Pathway1T2DMGroup <- createGroup(
  Name = "Pathway1",
  Description = "0 T2Dx, 1+ T2Rx and 1+ AbLab",
  type = "ALL",
  criteriaList = list(noT2DxCount, atLeast1T2RxCount),
  Groups = list(atLeast1AbLabGroup))

#5) Path 2: 1+ T2Dx, 0 T1Rx, 0 T2Rx, and 1+ AbLab  
Pathway2T2DMGroup <- createGroup(
  Name = "Pathway2",
  Description = "1+ T2Dx, 0 T1Rx, 0 T2Rx, and 1+ AbLab",
  type = "ALL",
  criteriaList = list(atLeast1T2DxCount, noT1RxCount, noT2RxCount),
  Groups = list(atLeast1AbLabGroup))

#6) Path 3: 1+ T2Dx, 0 T1Rx, and 1+ T2Rx  

Pathway3T2DMGroup <- createGroup(
  Name = "Pathway3",
  Description = "1+ T2Dx, 0 T1Rx, and 1+ T2Rx",
  type = "ALL",
  criteriaList = list(atLeast1T2DxCount, noT1RxCount, atLeast1T2RxCount)
)
#7) Path 4: 1+ T2Dx, 1+ T1Rx, 1+T2Rx, and 1+ T2Rx < T1Rx    
Pathway4T2DMGroup <- createGroup(
  Name = "Pathway4",
  Description = "1+ T2Dx, 1+ T1Rx, 1+T2Rx, and 1+ T2Rx < T1Rx",
  type = "ALL",
  criteriaList = list(atLeast1T2DxCount, atLeast1T1RxCount,
                      T2RxBeforeT1RxCount)
)
#8) Path 5: 1+ T2Dx, 1+ T1Rx, 0 T2Rx and 2+ T2Dx   
Pathway5T2DMGroup <- createGroup(
  Name = "Pathway5",
  Description = "1+ T2Dx, 1+ T1Rx, 0 T2Rx and 2+ T2Dx",
  type = "ALL",
  criteriaList = list(atLeast1T2DxCount, atLeast1T1RxCount,
                      noT2RxCount, atLeast2T2DxCount)
)

#9) T2DM Case Group

T2DMCase <- createGroup(
  Name = "Case for T2DM using algorithm",
  type = "ANY",
  Groups = list(Pathway1T2DMGroup, Pathway2T2DMGroup, 
                Pathway3T2DMGroup, Pathway4T2DMGroup, 
                Pathway5T2DMGroup)
)


## Create Cohort Definition ----------------------------

#create Primary criteria that initially captures persons
#who have a T2DM diagnosis, a T2Rx, or an abnormal lab
PrimaryCriteria <- createPrimaryCriteria(
  Name = "PC for T2DM Case Phenotype",
  ComponentList = list(T2DxQuery,T2RxQuery,AbLabHbA1cQuery,
                       AbLabFastingGlucQuery,AbLabRandomGlucQuery),
  ObservationWindow = createObservationWindow(),
  Limit = "All")

#create additional Criteria
#further restrict the initial capture to people with no T1Dx
AdditionalCriteria <- createAdditionalCriteria(
  Name = "AC for T2DM Case Phenotype",
  Contents = NoT1DxGroup,
  Limit = "First"
)

#create Inclusion Rules
#keep T2DM cases if they meet 1 of the 5 pathways

InclusionRules <- createInclusionRules(
  Name = "IRs for T2DM Case Phenotype",
  Contents = list(T2DMCase),
  Limit = "First"
)

#create Censoring Criteria
#person exits cohort if there is a diagnosis of T1DM
CensoringCriteria <- createCensoringCriteria(
  Name = "Censor of T1DM cases",
  ComponentList = list(T1DxQuery)
)

#Create Cohort Definition
T2DMPhenotype <- createCohortDefinition(
  Name = "PheKB T2DM Definition",
  PrimaryCriteria = PrimaryCriteria,
  AdditionalCriteria = AdditionalCriteria,
  InclusionRules = InclusionRules,
  CensoringCriteria = CensoringCriteria
)

### compile circe
T2DMPhenotypeJson <- compileCohortDefinition(T2DMPhenotype)

#save inclusion rules
Capr::saveComponent(InclusionRules, 
                    saveName = "phekbT2dCase", 
                    savePath = "cohorts/components")
Capr::saveComponent(AdditionalCriteria, saveName = "noT1D_AC", savePath = "cohorts/components")

## Additional manipulations ---------------

#import json
sglt2Cohort <- Capr::readInCirce(jsonPath = "cohorts/json/sglt2.json",
                                 connectionDetails = connectionDetails,
                                 vocabularyDatabaseSchema = vocabularyDatabaseSchema)


#lookup drug 
glp1 <- Capr::getConceptIdDetails(
  conceptIds = c(793143, 40170911, 43013171, 44816332, 45774435, 1583722),
  connectionDetails = connectionDetails,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema) 
# Turn into CSE 
glp1CSE <- Capr::createConceptSetExpression(
  conceptSet = glp1,
  Name = "GLP1",
  includeDescendants = TRUE
)
#Create Drug Exposure Query

glp1Query <- Capr::createDrugExposure(
  conceptSetExpression = glp1CSE,
  attributeList = list(
    Capr::createAgeAttribute(Op = "gte", Value = 18),
    Capr::createFirstAttribute(),
    Capr::createOccurrenceStartDateAttribute(Op = "gt",
                                             Value = "2012-01-01")
  ))
#Create Primary Criteria 
ow <- Capr::createObservationWindow(PriorDays = 365, PostDays = 0)
pc <- Capr::createPrimaryCriteria(Name = "GLP1 Exposure",
                                  ComponentList = list(glp1Query),
                                  ObservationWindow = ow,
                                  Limit = "All")


glp1Cohort <- sglt2Cohort
glp1Cohort@PrimaryCriteria <- pc