Biomedical Informatics: Discovery and Impact

  • George Hripcsak, MD, MS
    Chair, Department of Biomedical Informatics
    Vivian Beaumont Allen Professor of Biomedical Informatics
    Director, Medical Informatics Services, NYP/Columbia

    Data mining, electronic health records, systems medicine

    Curriculum vitae
    Publications (PubMed)
  • Education
  • B.S. 1981 Haverford College,
    M.D. 1985 Columbia University,
    M.S. 2000 Columbia University
  • Research Interests
  • My research focuses on understanding and using the clinical information stored in the electronic health record. This theme has several components:

    1. Data mining and knowledge discovery. Machine learning and visualization are examples of techniques to uncover knowledge from vast clinical databases. My work focuses on testing and extending existing discovery methods to improve their performance on clinical databases. Important issues include training set size, data accuracy, data completeness, and representation (e.g., how to accommodate diagnostic data, which is nominal with many categories). Recent work includes the use of non-linear time series analysis to characterize the electronic health record.

    2. Natural language processing. In most institutions, the vast majority of the richly detailed clinical information is stored as narrative text, which is not generally amenable to automated analysis. Natural language processing can parse the narrative text, converting it to a structured and coded format.

    3. Evaluation methodology. The complexity of clinical data, the presence of inaccurate and missing values, and the large but heterogeneous collection of patients conspire to make it difficult to draw conclusions using traditional statistical methods. Bias that would not affect a traditional randomized trial can overwhelm the true effect in a retrospective study of the electronic medical record.

    4. Clinical demonstration. Demonstrating the usefulness of the above methods is critical to gather support and to focus new work in important areas. The methods can be applied to clinical research (largely hypothesis refinement) and clinical care (by generating timely advice and monitoring patient safety). Recent work has included syndromic surveillance and pharmacovigilance.

    In addition, I am studying next-generation electronic health records. Current technology supports individual clinician tasks, such as documenting and ordering, in a manner that is largely similar to that of traditional paper records. Improved understanding of workflow, information needs, cognition, and the science of collaboration can lead to improved systems that exploit human abilities, facilitate teams, and disseminate expertise.
  • Current Projects
    Discovering and applying knowledge in clinical databases
    Description: The long-term goal is to develop and apply methods to exploit electronic medical record data to support decisions and generate knowledge. We are currently developing an information-theoretic framework for characterizing the electronic health record, using the framework to study the EHR and sampling issues, and using the framework and traditional data mining to answer clinical and informatics questions.
    Sponsor: National Library of Medicine
    Dates: 2000-2013
    Next-generation electronic health records
    Description: Incorporate advanced informatics methods for human-computer interaction, collaboration, knowledge management, communication, and data representation to improve the health care encounter and health promotion.
    Dates: 1997-
    Systems medicine
    Description: Apply modeling and computational techniques to problems in health care, integrating knowledge and data across scales to better understand and predict processes and thereby improve care. Current work is focused on infection control, incorporating team processes and policies, the physiology of infections, and the molecular biology of pathogens.
    Dates: 2007-
  • Grants
    Discovering and applying knowledge in clinical databases
    Grant Number: R01 LM06910
    Description: Develop methods for and prove the feasibility of using knowledge discovery (data mining) to generate useful, automated interpretations from a clinical repository that includes data coded through natural language processing.
    Grant Agency: National Library of Medicine
    Dates: 2000-2013
    Center for Advanced Information Management
    Grant Number: C040123
    Description: The major goals of the CAT center on methods and technologies to process and analyze data, with particular emphasis on applications to electronic medical records, biomedical informatics, communications, networked systems, and the genetic basis of disease.
    Grant Agency: New York State Center for Advanced Technology
    Dates: 1994-2014
    Phenotypic pipeline for genome-wide association studies
    Grant Number:
    Grant Agency: Microsoft Research
    Dates: 2008-2009
    Training in Biomedical Informatics at Columbia University
    Grant Number: T15 LM007079
    Grant Agency: National Library of Medicine
    Dates: 1994-2012
    Biomedical informatics research
    Grant Number:
    Grant Agency: Smart Family Foundation Inc
    Dates: 2008-2009
  • Service
    Medical Informatics Services
    Description: Director of Medical Informatics Services for NewYork-Presbyterian Hospital/Columbia, collaborating on clinician documentation, health information exchange, and patient portals, and overseeing the clinical data warehouse, terminology, WebCIS, immunization, infection control, and physician outreach.
    Service Area: Clinical Informatics
    Service Recipient: NewYork-Presbyterian Hospital
    Dates: -
  • Courses
    W4501 Introduction to computer applications in health care and biomedicine (course director, 1993-1995)
    Research Elective in Medical Informatics (course director, 1995-1998)
    G4060 Evaluation Methods in Medical Informatics (course director, 1997-2004)
    G4001 Introduction to Medical Informatics (lecturer, 2001–present)
    G4003 Theory and Methods in Biomedical Informatics (lecturer, 2005–present)
  • Committees
    Co-chair—Meaningful Use Workgroup of the HIT Policy Committee of the Office of the National Coordinator of Health Information Technology, 2009–present

    University Senate, Columbia University, 2006–2010

    Information Technology Committee of the Board of Trustees of NewYork-Presbyterian Hospital, 2000–present

    Clinical Trials Office Strategic Advisory Committee, Columbia University, 2005–present

    Columbia Faculty Practice Organization Information Technology Committee, 2006–present

    CUMC Space Advisory Committee, Columbia University, 2007–present

    Doctoral Program Subcommittee on Biostatistics, Columbia University, 2008–present

    NewYork-Presbyterian Hospital Medical Board, 2007–2011

    NewYork-Presbyterian Hospital Trustee Committee on External Relations, 2008–present