Data Extraction Overview

Clinical researchers often need data from Johns Hopkins data systems to conduct their studies. The ICTR Informatics Core (i2c) provides multiple services to assist researchers with their IRB-approved data needs.

Most often, researchers are seeking data from the Epic medical record system. Epic is used at the Johns Hopkins Hospital, Johns Hopkins Bayview Medical Center, Howard County General, Sibley Memorial, Suburban Hospital, and Johns Hopkins Community Physicians. In addition to manual chart abstraction, researchers have several options for getting data from Epic:

  • Use existing Epic reports, with the approval of the Epic Research Request Review Committee. Note that downloading data from Epic for these reports requires special permission.
  • Use Epic SlicerDicer to get a patient count for a cohort, or to get identifiable data for patients for whom you have personally provided care.
  • Use TriNetX to find patient counts for your cohort and to run analysis tools that examine the cohort's demographics, labs, medications, and diagnoses, as well as the effect that your inclusion and exclusion criteria have on narrowing the cohort. TriNetX is expected to be available in early 2019.
  • Ask a CCDA or BEAD Core data analyst to write a database query to extract data for your study. This is often needed when the other options are not feasible due to the complexity or size of the data query.

Johns Hopkins began deploying Epic in the spring of 2013 and completed the deployment in mid-2016. While some historical data were loaded into Epic (most notably lab data and text documents), most data collected before 2013 will need to be pulled from another system. The available legacy systems are EPR (JHH), Sunrise POE (JHH inpatient), and Meditech (Bayview). In addition, CaseMix is often used for diagnosis and procedure data. To learn what data are available for your study, please contact the CCDA for a free 2-hour consultation underwritten by the ICTR.

Patient data must always be stored securely. The best and easiest option is to store data on the SAFE desktop. Do not store patient data on your laptop or workstation. The following document lists the various data storage options and their associated risks:

http://intranet.insidehopkinsmedicine.org/data_trust/_docs/risk-matrix-for-health-data-and-analysis.pdf

Patient data, especially data collected prior to the Epic go-live in 2013, are often available only in text documents. In some cases the volume of text makes human review infeasible. In such cases, a programming technique known as Natural Language Processing (NLP) can be used to extract data from the text documents. For more information, visit this page.
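To give a feel for what extracting data from text can look like, here is a minimal sketch of one of the simplest approaches: a rule-based pattern match over note text. Everything in it is an assumption made for illustration (the choice of ejection fraction as the target value, the pattern, the function name, and the sample note); it is not the method used by the i2c, and production NLP work usually needs far more than a single pattern, including handling of negation, context, and validation against manual review.

```python
import re

# Hypothetical pattern for phrases such as "LVEF: 35%", "EF was 40 %",
# or "ejection fraction of 55%". Captures a one- or two-digit percentage.
EF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection fraction)\b[^0-9%]{0,20}?(\d{1,2})\s*%",
    re.IGNORECASE,
)


def extract_ejection_fractions(note_text):
    """Return every ejection-fraction percentage found in one note, as ints."""
    return [int(match.group(1)) for match in EF_PATTERN.finditer(note_text)]


# Made-up note text, not real patient data.
sample_note = "Echo today. LVEF: 35%. Prior EF was 40 %."
print(extract_ejection_fractions(sample_note))  # prints [35, 40]
```

A sketch like this is only useful on a sample that has also been reviewed by hand, so that missed and spurious matches can be counted before the pattern is applied to the full set of documents.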

Another important data extraction consideration is whether the data need to be de-identified and, if so, what constitutes full de-identification. While de-identified data reduce the risk of a data breach, it is often not feasible for a study team to conduct its research with fully de-identified data. For example, many researchers do not realize that dates are identifiable data. To learn more about data de-identification and limited data sets, visit this website.
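Because dates are easy to overlook, the following minimal sketch shows one way date fields might be generalized before a data set is shared: each full date is reduced to its year. The field names and the function are hypothetical, and the sketch covers dates only; whether the result counts as de-identified for a given study is a question for the IRB, not something this code can settle.

```python
from datetime import date


def generalize_dates_to_year(record, date_fields):
    """Return a copy of the record with each listed date field reduced to its year.

    The input record is left unchanged; fields that are missing or None are skipped.
    """
    generalized = dict(record)
    for field in date_fields:
        value = generalized.get(field)
        if value is not None:
            generalized[field] = value.year
    return generalized


# Made-up record with hypothetical field names, not real patient data.
record = {
    "study_id": 17,
    "admit_date": date(2012, 3, 14),
    "discharge_date": date(2012, 3, 19),
}
print(generalize_dates_to_year(record, ["admit_date", "discharge_date"]))
```

Note the trade-off this illustrates: once only the year remains, the five-day length of stay in the example can no longer be computed, which is one reason fully de-identified data are often not sufficient for a study.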

Should you need to share data with an entity outside of Johns Hopkins, please consult the following document:

http://intranet.insidehopkinsmedicine.org/data_trust/_docs/research-collaboration-guidance.pdf