How is diabetes screened for in patients who don't have the usual risk factors? Until now it hasn't been, unless symptoms prompt a physician to test the patient. Yet type 2 diabetes affects nearly 10 percent of the population, and a quarter of those people don't know they have it, according to Ariana Anderson, PhD, assistant research professor and statistician in the Department of Psychiatry and Biobehavioral Sciences at the David Geffen School of Medicine at UCLA.
Dr. Anderson and her colleagues used "big data" and machine learning to develop a diabetes screening algorithm that draws on novel risk factors beyond the usual suspects like age and body mass index (BMI). They published their findings in the Journal of Biomedical Informatics on February 16.
Filling in the blanks
"I wanted to try to predict diabetes using the information already available in the health record," she says. A lot of work has already been done with electronic health records (EHRs) in efforts to predict the existence of certain diseases. A major flaw in past attempts, however, is only using research-quality medical records—selecting patients from a narrow pool that collectively included every lab value available—which isn't a realistic way of evaluating the population. The data she used was "clinical quality," meaning these were records typically found in a physician's practice.
There were holes in the data, with missing lab values and incomplete patient and family histories. "Basically I took the raw data that was what we'd see in a real practice," she explains, "and built models to predict who was likely to have diabetes."
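One common way to let a model work with records that have holes is to fill each missing lab value with the average of the observed values and add a flag noting that the value was absent. The sketch below illustrates that idea only; it is not taken from the paper, and the field names (`glucose`, `hdl`, `systolic_bp`) and the sample records are hypothetical.

```python
from statistics import mean

# Hypothetical lab fields; real EHR field names will differ.
LAB_FIELDS = ["glucose", "hdl", "systolic_bp"]


def column_means(records):
    """Average each lab field over the records where it was actually measured."""
    means = {}
    for field in LAB_FIELDS:
        observed = [r[field] for r in records if r.get(field) is not None]
        # Fall back to 0.0 if no record contains the field at all.
        means[field] = float(mean(observed)) if observed else 0.0
    return means


def encode(record, means):
    """Return value/missing-flag pairs so a model can see that a lab was absent."""
    features = []
    for field in LAB_FIELDS:
        value = record.get(field)
        missing = value is None
        features.append(means[field] if missing else float(value))
        features.append(1.0 if missing else 0.0)
    return features


# Made-up records: the second has no HDL result and no blood pressure entry.
records = [
    {"glucose": 90, "hdl": 50, "systolic_bp": 120},
    {"glucose": 110, "hdl": None},
]
means = column_means(records)
encoded = [encode(r, means) for r in records]
```

Keeping the missing flag alongside the filled-in value means the absence of a test can itself carry information, rather than being silently smoothed over.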
How is diabetes screened for, usually?
Diabetes screening most often relies on factors like age, hypertension, BMI, gender and smoking status, with a diagnostic laboratory test completing the process. Dr. Anderson compared the standard diabetes screening models against her records to see if there were "leftover things in the records" that would identify who might benefit from a diabetes lab test. Using ICD-9 diagnosis codes, she found additional factors, such as intestinal infections and a history of some sexually transmitted diseases.
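To make the idea of "leftover things in the records" concrete, here is a minimal sketch of how the standard factors might be extended with flags derived from ICD-9 codes. The code ranges follow ICD-9 chapter headings (001-009 for intestinal infectious diseases, 090-099 for syphilis and other venereal diseases); whether the study grouped codes this way is an assumption, and the patient record and field names are invented for illustration.

```python
# Standard screening factors named in the article (gender encoded as a 0/1 flag).
STANDARD_FACTORS = ["age", "hypertension", "bmi", "gender_male", "smoker"]

# Illustrative ICD-9 three-digit ranges (assumed grouping, not taken from the paper).
ICD9_GROUPS = {
    "intestinal_infection": (1, 9),   # categories 001-009
    "venereal_disease": (90, 99),     # categories 090-099
}


def icd9_category(code):
    """Return the three-digit category of a numeric ICD-9 code, else None."""
    head = code.split(".")[0]
    # V and E codes are not numeric and are skipped in this sketch.
    return int(head) if head.isdigit() else None


def build_features(patient):
    """Standard factors followed by one 0/1 flag per ICD-9 group."""
    features = [float(patient[name]) for name in STANDARD_FACTORS]
    categories = {icd9_category(c) for c in patient["icd9_codes"]}
    for low, high in ICD9_GROUPS.values():
        hit = any(c is not None and low <= c <= high for c in categories)
        features.append(1.0 if hit else 0.0)
    return features


# Hypothetical patient record.
patient = {
    "age": 52, "hypertension": 1, "bmi": 31.4, "gender_male": 0, "smoker": 0,
    "icd9_codes": ["008.45", "401.9", "V70.0"],
}
features = build_features(patient)
```

A screening model built on the longer feature list can then be compared with one built on the standard factors alone to see whether the extra codes improve its predictions.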
Other applications
Since the paper was published, Dr. Anderson has gotten calls from clinicians looking to apply the algorithm to their own practices using their patient records. The ultimate application of this research won't be for diabetes, but rather for rare diseases like lupus that don't commonly get screened, Dr. Anderson says. She envisions developing algorithms for different disease risk scores that clinicians can run against their medical records. The risk scores would pop up on the patient's profile, and the clinician could then decide whether to screen the patient. Such algorithms would help identify diseases before they become problematic.
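A risk score of this kind could be sketched as follows, assuming a simple logistic model. The weights, intercept and cutoff are made up for illustration and have no clinical meaning; a real tool would learn them from patient data and validate them before use.

```python
import math

# Hypothetical weights for illustration only; a real model would learn these from data.
WEIGHTS = {"age": 0.04, "bmi": 0.08, "hypertension": 0.7}
INTERCEPT = -6.0
THRESHOLD = 0.25  # assumed cutoff above which the chart shows a screening prompt


def risk_score(patient):
    """Logistic risk score between 0 and 1 from a weighted sum of patient factors."""
    z = INTERCEPT + sum(w * patient[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def flag_for_screening(patients):
    """Return (patient_id, score) pairs at or above the threshold, highest first."""
    scored = [(p["id"], risk_score(p)) for p in patients]
    flagged = [item for item in scored if item[1] >= THRESHOLD]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical patients.
patients = [
    {"id": "A", "age": 30, "bmi": 22.0, "hypertension": 0},
    {"id": "B", "age": 61, "bmi": 34.0, "hypertension": 1},
]
flagged = flag_for_screening(patients)
```

The function only surfaces a list of patients and scores; as in the scenario described above, the decision to order a test stays with the clinician.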
Says Dr. Anderson: "I'm interested in how we can use the data we already have to solve real world and health-oriented problems."
By Deborah Abrams Kaplan