Our client is one of Australia’s most successful medical research institutes. They have recently embarked upon the largest longitudinal cohort study of its kind in Australia, following 10,000 children from their time in the womb into early childhood, with the goal of better understanding when and why non-communicable diseases develop.
Running over a five-year period, the study involves more than 20,000 individuals once the family members of the 10,000 children are included. So far, the research team have gathered more than a million free-text questions and answers from participant surveys.
Given the size of the dataset, accurately matching participants’ everyday spoken language with precise medical terminology and disease names was a significant challenge.
The research institute brought in DataDivers to help develop an AI solution to support this translation of terminology.
The team trained a Large Language Model (LLM) on medical literature to help identify the relevant survey records. To bridge the gap between everyday spoken English and medical terms, our data scientists extracted topics from the survey data, normalised them (for example, by reducing them to their base meanings) and then associated them with equivalent medical terms. The team then fine-tuned the LLM on this newly created corpus, further improving its performance.
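To illustrate the idea of normalising everyday phrases and mapping them to medical equivalents, here is a minimal sketch. It is not DataDivers’ method, which relied on a fine-tuned LLM; it swaps in a plain dictionary lookup so the normalise-then-map step is easy to see. The table `LAY_TO_MEDICAL`, the functions `normalise` and `extract_terms`, and the example term pairs are all hypothetical and for illustration only.

```python
import re
from typing import List

# Hypothetical lay-to-medical lookup table, for illustration only.
# A real project would use a curated medical vocabulary, not a hand-written dict.
LAY_TO_MEDICAL = {
    "tummy ache": "abdominal pain",
    "high temperature": "fever",
    "runny nose": "rhinorrhoea",
    "itchy rash": "pruritic rash",
}


def normalise(text: str) -> str:
    """Lowercase the text, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def extract_terms(answer: str) -> List[str]:
    """Return the medical terms whose lay phrases appear as whole words in the answer."""
    cleaned = normalise(answer)
    found = set()
    for lay_phrase, medical_term in LAY_TO_MEDICAL.items():
        # Word boundaries stop "nose" matching inside a longer word.
        if re.search(r"\b" + re.escape(lay_phrase) + r"\b", cleaned):
            found.add(medical_term)
    return sorted(found)


if __name__ == "__main__":
    answer = "My son had a tummy ache and a high temperature."
    print(extract_terms(answer))  # ['abdominal pain', 'fever']
```

A fixed lookup like this only catches phrasings it already knows (for instance, it misses "runny noses"), which is the kind of gap a model trained on medical literature is better placed to close.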
By the end of the project, all of the survey information had been linked to the relevant medical terms and made accessible to the LLM as a knowledge base.
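One simple way to make tagged records searchable by medical term is an inverted index, sketched below under the assumption that each survey record has already been tagged with medical terms. The record IDs, tags and the names `TAGGED_RECORDS`, `build_index` and `lookup` are hypothetical; a production knowledge base for an LLM might instead sit on a search engine or vector store.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical survey records, each already tagged with medical terms.
TAGGED_RECORDS: Dict[str, List[str]] = {
    "R001": ["abdominal pain", "fever"],
    "R002": ["rhinorrhoea"],
    "R003": ["fever", "rhinorrhoea"],
}


def build_index(records: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert record -> terms into term -> set of record IDs."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            index[term].add(record_id)
    return dict(index)


def lookup(index: Dict[str, Set[str]], term: str) -> List[str]:
    """Return the sorted record IDs tagged with a medical term (empty if none)."""
    return sorted(index.get(term, set()))


if __name__ == "__main__":
    index = build_index(TAGGED_RECORDS)
    print(lookup(index, "fever"))  # ['R001', 'R003']
```

The records returned by a lookup could then be supplied to the model as context when answering a question about that condition.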
The solution enabled:
Data is the lifeblood of every transformation, so let's dive into organising, analysing, and visualising yours.
Reach out today to speak to one of the DataDivers team via the form or by emailing us at contact_us@datadivers.io.