Lung cancer (LC) has two major subtypes, non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC), and treatment pathways and outcomes can differ significantly between patients with these two subtypes. Any retrospective analysis of real-world data therefore requires the ability to distinguish between them. However, structured data from electronic health records (EHRs) capture diagnoses using ICD codes, which do not distinguish between these subtypes and effectively assign a single code to all lung cancers. Smita Agrawal, PhD, Senior Director of Product Development, explains the findings of research by a team of Concerto HealthAI data scientists on how a machine-learning-based method can identify NSCLC patients within a heterogeneous cohort of LC patients using de-identified, retrospective electronic medical record (EMR) data. This research was presented as a poster at ISPOR 2020.