Yanshan Wang with his two children.
I studied computer science at universities in China and South Korea, but it wasn’t until I came to the Mayo Clinic in Minnesota for my post-doc in 2015 that I discovered the next generation of data science was in medicine. Around that time, the volume of medical data in the United States was also growing exponentially as more hospitals adopted electronic health records. At the Mayo Clinic, I saw that this medical data was so large and so underutilized that I decided to focus my post-doc on the interdisciplinary field of data science and medicine. That was where I started to apply data science, natural language processing (NLP) and artificial intelligence (AI) techniques to the large volumes of data collected at the Mayo Clinic hospital.
One of my first projects at the Mayo Clinic used big data from cancer patients to extract their family history. None of this family history data was “structured data,” because doctors traditionally write down or dictate the information, and it ends up as “free text.” We developed NLP systems to extract and analyze the family history data in these cancer patients’ records. This way, even before a patient arrives for an appointment, the physician can run the NLP algorithm and have the family history on hand to assess cancer risk. Another project involved working with rheumatologists and their patients’ radiology reports, which were also usually in free text. The physicians needed to review each patient’s bone fracture data: its location (for example, a hand, an ankle or the spine), its severity and its cause, such as a fall or a car accident. We automated the process so the physicians could look for trends across the data from these rheumatoid arthritis patients.
A turning point in my work as a researcher was joining the faculty of the University of Pittsburgh Department of Health Information Management (HIM). I was a pure researcher before joining Pitt, but I had come to see a lack of education in health informatics and data science for the medical field, and I wanted to be part of the faculty in academia. In August 2021, I joined the health informatics (HI) program and started teaching students who are interested in HI. Most of my post-professional graduate students are health care workers (doctors, nurses, administrators) with no technical background in data science. One of the important distinctions between Pitt HI and other schools is that our program has a very strong research capacity. We collaborate with UPMC and can provide many practical data science projects drawn directly from our community in Western Pennsylvania. After taking our HI courses, our students are prepared to build machine learning models using real-world data and to solve interesting medical questions! It is very rewarding to me.
My current research focuses on AI, NLP, machine learning and deep learning methodologies and their applications in health care. I aim to leverage different dimensions of data, including electronic health records, to meet the needs of clinicians, researchers and patients. When analyzing the data, we collaborate with physicians and providers who can use the results to create intervention plans for their patients. Because we work with UPMC, we directly help the Pennsylvania community by working with data specific to that community. The results of our research go right back into the community.
I hope that the digital technologies I develop can improve patient care outcomes and advance health equity, which is a big part of our research. We use the data to understand the social determinants of health: a patient’s employment status, education level, economic status, housing, physical activity, diet, lifestyle and so on. This information is in the electronic health record, and we try to extract it and identify the factors driving health disparities in our health care systems. For example, if a patient has a disability and no access to a modified vehicle, they cannot get to a medical facility. How can we identify this problem so they can get the care they need?
Pittsburgh has an incredible health care and technology environment; only one or two other cities in the U.S. are comparable. There are many opportunities here for people to do research in the health care domain. In my lab, the University of Pittsburgh Clinical NLP and AI Innovation Laboratory, we conduct cutting-edge research in health informatics. We are always looking for self-motivated students to join our team. They will have access to a large volume of real-world data, including electronic health records and patient-generated health data, and they will use state-of-the-art AI and NLP technologies to solve challenging health care problems, with the goal of improving patient care, population health and health equity. The job market for this field is hot, and students can expect to find jobs at health care organizations, pharmaceutical companies, insurance companies, consulting companies and even technology companies like Amazon, Google and Microsoft that are expanding into health care. With HI, you can bridge the gap between digital technologies and clinical problems to offer solutions.
-- Written by:
Yanshan Wang, PhD, FAMIA
Vice Chair of Research and Assistant Professor, Department of Health Information Management