Datasets

MIMIC

Medical Information Mart for Intensive Care III

MIMIC is a large, freely available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The database includes demographics, vital sign measurements made at the bedside (about one data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (both in and out of hospital). MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development.

Introduction & Documentation: physionet.org/about/mimic/
Github repository: MIT-LCP/mimic-code
Data extraction tutorial: MIT-LCP/mimic-code/blob/master/tutorials/cohort-selection.ipynb
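
As a rough illustration of what cohort selection on this kind of data can look like, the sketch below builds a tiny in-memory SQLite database and selects each adult patient's first ICU stay lasting at least one day. The table and column names follow the MIMIC-III patients and icustays tables, but the rows, the use of SQLite, and the select_cohort helper are assumptions made for this example; it is not taken from the linked tutorial.

```python
import sqlite3


def build_demo_db():
    """Create a toy database with made-up rows (not real MIMIC data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (subject_id INTEGER PRIMARY KEY, gender TEXT, dob TEXT);
        CREATE TABLE icustays (
            icustay_id INTEGER PRIMARY KEY,
            subject_id INTEGER,
            intime TEXT,
            los REAL  -- length of stay in days
        );
        INSERT INTO patients VALUES
            (1, 'F', '1950-03-01'),
            (2, 'M', '1999-07-15'),
            (3, 'M', '1940-01-20');
        INSERT INTO icustays VALUES
            (10, 1, '2010-03-02 08:00:00', 2.5),
            (11, 2, '2011-07-14 12:00:00', 0.5),
            (12, 3, '2009-06-01 09:30:00', 4.0),
            (13, 3, '2009-09-10 10:00:00', 1.2);
        """
    )
    return conn


def select_cohort(conn, min_age=18, min_los=1.0):
    """Return (subject_id, icustay_id) for each patient's first ICU stay,
    keeping only adults whose stay lasted at least min_los days."""
    sql = """
        SELECT i.subject_id, i.icustay_id
        FROM icustays AS i
        JOIN patients AS p ON p.subject_id = i.subject_id
        WHERE i.intime = (
                SELECT MIN(s.intime) FROM icustays AS s
                WHERE s.subject_id = i.subject_id
              )
          AND (julianday(i.intime) - julianday(p.dob)) / 365.25 >= :min_age
          AND i.los >= :min_los
        ORDER BY i.subject_id
    """
    return conn.execute(sql, {"min_age": min_age, "min_los": min_los}).fetchall()


if __name__ == "__main__":
    print(select_cohort(build_demo_db()))
```

In this toy data, patient 2 is excluded for being under 18 at admission, and only the earlier of patient 3's two stays is kept.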

Data Storage

OMOP Format

What is the OMOP Common Data Model (CDM)?

The OMOP Common Data Model allows for the systematic analysis of disparate observational databases. The concept behind this approach is to transform data contained within those databases into a common format (data model) as well as a common representation (terminologies, vocabularies, coding schemes), and then perform systematic analyses using a library of standard analytic routines that have been written based on the common format.
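
The idea of a common format plus a common representation can be sketched in a few lines of Python. In the example below, two source layouts (an EMR-style row and a claims-style row) are converted into one record shape, and their source codes are looked up in a shared mapping table. The field names of the source rows, the helper functions, and the concept IDs 9001 and 9002 are hypothetical and chosen only for illustration; real OMOP concept IDs come from the standardized vocabularies.

```python
from dataclasses import dataclass

# Hypothetical mapping: (source vocabulary, source code) -> standard concept ID.
# The IDs are invented for this sketch and are NOT real OMOP concept IDs.
SOURCE_TO_STANDARD = {
    ("ICD9CM", "250.00"): 9001,
    ("ICD10CM", "E11.9"): 9001,
    ("ICD9CM", "401.9"): 9002,
}


@dataclass(frozen=True)
class ConditionRecord:
    """Common format shared by every source after conversion."""
    person_id: int
    condition_concept_id: int      # 0 means "no standard concept found"
    condition_source_value: str    # original code, kept for traceability


def _to_record(person_id, vocabulary, code):
    concept_id = SOURCE_TO_STANDARD.get((vocabulary, code), 0)
    return ConditionRecord(person_id, concept_id, code)


def from_emr(row):
    """Adapter for a hypothetical EMR export that names its coding system."""
    return _to_record(row["patient_id"], row["coding_system"], row["code"])


def from_claims(row):
    """Adapter for a hypothetical claims feed assumed to use ICD-9-CM only."""
    return _to_record(row["member_id"], "ICD9CM", row["dx_code"])


if __name__ == "__main__":
    print(from_emr({"patient_id": 1, "coding_system": "ICD10CM", "code": "E11.9"}))
    print(from_claims({"member_id": 7, "dx_code": "250.00"}))
```

Both rows end up with the same condition_concept_id even though they started in different layouts and coding systems, which is what lets a single analytic routine run over both.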

Why do we need a CDM?

Observational databases differ in both purpose and design. Electronic Medical Records (EMR) are aimed at supporting clinical practice at the point of care, while administrative claims data are built for insurance reimbursement processes. Each has been collected for a different purpose, resulting in different logical organizations and physical formats, and the terminologies used to describe medicinal products and clinical conditions vary from source to source.

The CDM can accommodate both administrative claims and electronic health records (EHR), allowing users to generate evidence from a wide variety of sources. It also supports collaborative research across data sources both within and outside the United States, while remaining manageable for data owners and useful for data users.

Why use the OMOP CDM?

The Observational Medical Outcomes Partnership (OMOP) CDM, now at version 5.0.1, addresses these differences. OMOP found that disparate coding systems can be harmonized, with minimal information loss, to a standardized vocabulary.

Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools. We at OHDSI are currently developing open-source tools for data quality and characterization, medical product safety surveillance, comparative effectiveness, quality of care, and patient-level predictive modeling. Such tools are also available from other sources, some of them commercial.
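
To make the benefit of standardized analytics concrete, here is a minimal sketch in which one function runs unchanged against two separate databases that share the CDM layout. The person and condition_occurrence tables are reduced to a few of their CDM columns, the data are invented, SQLite stands in for a real warehouse, and concept ID 9001 and the prevalence helper are hypothetical; this is not one of the OHDSI tools.

```python
import sqlite3

# Reduced CDM-style schema: only the columns this sketch needs.
CDM_SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
"""


def make_cdm_db(persons, conditions):
    """Build an in-memory database holding made-up CDM-shaped rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(CDM_SCHEMA)
    conn.executemany("INSERT INTO person VALUES (?, ?)", persons)
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)", conditions
    )
    return conn


def prevalence(conn, concept_id):
    """Share of persons with at least one occurrence of the given concept."""
    total = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    affected = conn.execute(
        "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
        "WHERE condition_concept_id = ?",
        (concept_id,),
    ).fetchone()[0]
    return affected / total if total else 0.0


# Two independent sources, both already converted to the same layout.
hospital_db = make_cdm_db(
    persons=[(1, 1950), (2, 1962), (3, 1975), (4, 1988)],
    conditions=[
        (1, 1, 9001, "2010-01-05"),
        (2, 1, 9001, "2011-02-10"),  # repeat occurrence, same person
        (3, 3, 9001, "2012-06-30"),
    ],
)
claims_db = make_cdm_db(
    persons=[(1, 1944), (2, 1951), (3, 1969), (4, 1980), (5, 1991)],
    conditions=[(1, 2, 9001, "2009-11-20")],
)

if __name__ == "__main__":
    for name, db in [("hospital", hospital_db), ("claims", claims_db)]:
        print(name, prevalence(db, 9001))
```

Because both sources expose the same tables and concept IDs, the analysis code does not need a per-source variant.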

Platform

Web interface

R/Python/Scala, Hadoop Spark, Hadoop Presto, Hadoop HDFS

Develop your ideas and models using the AP-HP collaborative Jupyter web interface. There is no need to spend time downloading the data; it is already available through this web interface.

By default, you will be able to use R, Python and/or Scala with Hadoop Spark to create your statistical models on this data.

Then, if your model produces strong results, you will be able to apply it to AP-HP’s data warehouse.