User Tools

Site Tools


documentation:next_cdm:add_person_to_cost

Cost Table Changes (Add Person_id, Dates and normalize)

Proposals are now being tracked as github issues

link to github issue

Proposal Owner: Gowtham Rao, Chris Knoll, Klaus Bonadt

Discussion: http://forums.ohdsi.org/t/proposal-for-cost-table-adjustment/1842/12

Cost table description: http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:cost

Proposal overview:

  1. Add person_id
  2. Add billed_datetime and paid_datetime
  3. Change Cost table structure - to “normalized” version

1. Add Person_id to Cost table

  • Design justification: We have a design principle that all domain related data tables are person-centric where for each record the person_id and date are captured at a minimum. We deviate from this design principle in the COST table. Addition of person_id will make the SQL statement in database systems faster. A missing person_id prevents other lightweight non-database systems to process all transaction tables chunk wise. In certain database environments (specifically Massively Parallel Processing (MPP) that we see in redshift and MS PDW (and possibly HiveDB on Hadoop)) we want to be able to hash on a field of the table. By NOT having a person id in the cost table, we can't hash the data such that the data for a given person can be co-located on the same cluster. By HAVING this person_id field, we can define the table as hashed on person_id.
  • Discussion for and against: However, adding the person id and date will further de-normalize the OMOP CDM Schema. Local deployments of OMOP may add person_id on their own. This was countered by: but that will fail the standardization principle. The query optimizer may not leverage a hashing column if the column isn't applied in the query - standard OHDSI applications will not use person_id if it is not a standard OMOP CDM standard. While trying to calculate an average cost of something by person from the cost table, we have to go to the drug_exposure table, join back to cost (on the domain=“drug”) and the drug_exposure_id (neither of these are hashed). This will cause a sort of 'shuffle' move of data from the cost table (which can only be hashed on a cost_id) UNLESS we say where cost.person_id = drug_exposure.person_id. Cost-table is already denormalized with many cost entries (allowed amount, paid by payer, paid by patient etc). Alternative would be to redesign the cost table to cost_type and one cost_value instead of many COST entries. Addition of person_id may not impact the already denormalized table, but would rather enable us to process huge datasets without the need of spending cost expensive hardware and software resources for building up complex queries thru multiple inefficient joins.
  • Data integrity checks: To avoid the possibility of database integrity conflicts, e.g. the person_id passed through the event table might be different from the one in the COST table. We need to add it to the constraints (slow), a warning in the description or data quality tools like ACHILLES HEELS, which should watch that kind of thing.

2. Add incurred_date, incurred_datetime, billed_datetime, paid_datetime:

  • Add three new datetime fields in the cost table. billed_datetime and paid_datetime
  • Incurred_date or service_date (match event table) - This is in-line with the design principle.
  • Both fields are optional (Required = No), they are date time fields (Type = DateTime - following OMOP conventions)
  • Costs are associated with a visit_occurrence, procedure_occurrence, drug_occurrence, observation, device etc.
  • There are generally three types of dates associated with costs. Incurred date, Billed Date and Paid date.
  • Incurred date is the date of the service. They are captured in the respective visit, procedure etc.
  • Billed date and paid date are not captured in OMOP CDM. Use cases for these are listed below

Use cases:

  • Health economics and actuarial analysis use incurred date, billed date and paid date in the formulas
  • Incurred But not reported: https://en.wikipedia.org/wiki/Incurred_but_not_reported amount owed by an insurer to all valid claimants who have had a covered loss but have not yet reported it. Very important for claim reserves estimation (represent the money which should be held by the insurer so as to be able to meet all future claims arising from policies currently in force and policies written in the past.)
  • Completion factor trend analysis

Analytic questions:

  • Trend analysis for operational efficiency - claims adjudication rate, completion factor, IBNR, claims reserve estimation

Importance:

  • New use cases around financial and actuarial departments of organizations
  • This will expand the OHDSI/OMOP footprint

Consequence of doing it:

  • Adoption of Cost table may increase.
  • New use cases that serve the needs of financial entities in an organization will expand the adoption of the OMOP CDM.

Consequences of not doing it:

  • Query optimization is difficult.
  • Analyst has to write complex queries. Development of standardized tools may be delayed

3. Leverage cost_type_concept_id and remove cost type columns

  • Proposed structure of new table is below
  • The key idea here is that instead of making each cost-type a column (i.e. wide representation), lets convert to long representation. We will leverage cost_type_concept_id for this.
  • cost_type_concept_id will be used to identify standard cost types: total_charge, total_cost, total_paid, paid_by_payer, paid_by_patient, paid_patient_coinsurance, paid_patient_deductible, paid_by_primary, paid_ingredient_cost, paid_dispensing_fee, payer_plan_period_id, amount_allowed. This allows to generalize the cost table and we represent arbitrary number of cost types. We can also represent international and custom use cases.
  • Should cost_domain_id be an integer or string? Currently it is a string.
Field Required Type Description
cost_id Yes integer A unique identifier for each COST record.
person_id Yes integer A unique identifier for each person.
cost_event_id Yes integer A foreign key identifier to the event (e.g. Measurement, Procedure, Visit, Drug Exposure, etc) record for which cost data are recorded.
cost_domain_id Yes integer The concept id representing the domain of the cost event, from which the corresponding table can be inferred that contains the entity for which cost information is recorded.
cost_type_concept_id Yes integer A foreign key identifier to a concept in the CONCEPT table for the provenance or the source of the COST data: Calculated from insurance claim information, provider revenue, calculated from cost-to-charge ratio, reported from accounting database, etc.
currency_concept_id Yes integer A foreign key identifier to the concept representing the 3-letter code used to delineate international currencies, such as USD for US Dollar.
total_cost Yes float The actual financial cost amount
incurred_date Yes date The first date of service of the clinical event as in table capturing the information (e.g. date of visit, date of procedure, date of condition, date of drug etc).
incurred_datetime No datetime
billed_datetime No datetime The date and time a bill was generated for a service or encounter
paid_datetime No datetime The date and time payment was received for a service or encounter
revenue_code_concept_id No integer A foreign key referring to a Standard Concept ID in the Standardized Vocabularies for Revenue codes.
drg_concept_id No integer A foreign key referring to a Standard Concept ID in the Standardized Vocabularies for DRG codes.
revenue_code_source_value No string(50) The source code for the Revenue code as it appears in the source data, stored here for reference.
drg_source_value No string(50) The source code for the 3-digit DRG source code as it appears in the source data, stored here for reference.
documentation/next_cdm/add_person_to_cost.txt · Last modified: 2017/07/10 16:43 by clairblacketer