Empowering ground-breaking medical research

Our client is one of Australia’s most successful medical research institutes. They have recently embarked upon the largest longitudinal cohort study of its kind in Australia, following 10,000 children from their time in the womb into early childhood, with the goal of better understanding when and why non-communicable diseases develop.

Challenge

Uncovering insights from vast survey datasets for medical research

Taking place over a five-year period, the research involves more than 20,000 individuals once the family units of the 10,000 children are included. So far, the research team have gathered more than a million free-text questions and answers from participant surveys.

Faced with such a vast dataset, it was a significant challenge to accurately match participants’ everyday spoken language with precise medical terminology and disease names.

Solution

A custom LLM to bridge the gap between everyday speech and medical terminology

The research institute brought in DataDivers to help them develop an AI solution to aid this translation of terminology.

The team trained a Large Language Model (LLM) on medical literature to help identify the relevant survey records. To bridge the gap between everyday spoken English and medical terms, our data scientists extracted topics from the survey data, normalised them (e.g., to their base meanings), and associated them with equivalent medical terms. The team then fine-tuned the LLM on this newly created corpus, further enhancing its performance.
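To make the normalise-then-associate step more concrete, here is a minimal Python sketch. It is an illustration only, not the pipeline DataDivers built: the `LAY_TO_MEDICAL` lookup, its three entries, and the function names are all hypothetical, and a plain dictionary stands in for the topic extraction and LLM-based matching described above.

```python
import re

# Hypothetical lookup from normalised everyday phrases to medical terms.
# A real project would draw on a standard disease ontology instead.
LAY_TO_MEDICAL = {
    "runny nose": "rhinorrhoea",
    "high blood pressure": "hypertension",
    "hay fever": "allergic rhinitis",
}


def normalise(text):
    """Lower-case the text, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def map_to_medical_terms(answer):
    """Return the medical terms whose everyday phrase appears in a survey answer."""
    # Pad with spaces so phrases only match on whole-word boundaries.
    cleaned = f" {normalise(answer)} "
    return sorted(
        term
        for phrase, term in LAY_TO_MEDICAL.items()
        if f" {phrase} " in cleaned
    )


if __name__ == "__main__":
    print(map_to_medical_terms("He's had a RUNNY nose, and hay-fever since spring!"))
```

Exact phrase matching like this misses misspellings and paraphrases, which is the gap a model fine-tuned on the normalised corpus is meant to close.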

By the end of this journey, the full survey information had been associated with the relevant terms and made accessible as a knowledge base for the LLM.

Value delivered

Efficient and accurate interpretation of data at scale

The solution enabled:

Enhanced automated disease phenotyping from research survey data.
Foundations for developing tools to enable efficient cohort profiling and data harmonisation across large, heterogeneous research datasets.
Efficient and accurate harmonisation of the growing volume of survey data with standardised disease ontologies.
Acceleration of raw research data accessibility to help uncover hidden insights.

Data Science, Advanced Analytics and Artificial Intelligence Solutions

Want to be data-driven, but not sure where to start with AI and data science?

Transform raw data into powerful insights that drive smarter decisions, fuel innovation, and accelerate growth. Our Data Scientists are here to help you dive deep into your data and make sense of it. We'll organise, analyse, and visualise it to help you leverage its power.

Learn more

Get in touch

Let's create positive change for your organisation

Data is the lifeblood of every transformation, so let's dive into organising, analysing, and visualising yours.

Reach out today to speak to one of the DataDivers team via the form or by emailing us at contact_us@datadivers.io.
