Health data science

Modules under this pillar will introduce you to a variety of techniques for analysing large datasets and train you to use them, allowing you to process the ever-growing bank of electronic health record data for both research and real-world applications.

Using a combination of your own and exemplar data, these training modules offer hands-on learning with the chance to apply your newly acquired skills. Modules will cover themes including health informatics, statistical programming, data science, modelling, natural language processing and more.

This training offers flexible, modular skill development pathways for researchers and all staff who work in health (clinical or non-clinical), at different career stages and skill levels and across sectors within health sciences.

Summary of Modules

Arrows indicate the suggested flow of learning. All modules can be taken as and when required.

Introduction to Python

Now Live

Introduction to AI and deep learning

Coming Soon

Machine Learning

Now Live

Introduction to R

Now Live

Prediction Modelling

Now Live

Natural Language Processing

Coming Soon

Introduction to R for Health Research

This module introduces you to the basics (and beyond) of the R statistical programming language and gets you started working with it in a health research context. You will benefit from having a background in basic statistical methods (e.g. probability distributions, hypothesis testing) and high school mathematics (e.g. basic calculus, basic linear algebra).

The main benefit of this course comes from working on the interactive quizzes and practicals. What you gain from the course is strongly correlated with the amount of effort you put into the practical work.

R is a great programming language with a huge international user base, and it is free to use in an educational setting.

Learning aims

By the end of this module, you will be able to use R to import and manipulate datasets, and be confident in performing programming operations such as using conditionals, loops, and functions, as well as conducting basic statistical analyses.

You will be able to demonstrate subject-specific knowledge and understanding, and have the ability to:

  • Develop your theoretical understanding of the basics of programming
  • Understand the use and usefulness of conditionals, iterations, and functions
  • Identify appropriate methods for solving particular data manipulation problems
  • Formulate sensible programming solutions for data sets, and demonstrate an ability to check the correctness of your manipulations
  • Become an independent user of R, in finding help online, and demonstrating how to select appropriate packages for your project

This course will assume that participants have:

  • No prior knowledge of R, although some experience in any programming language would be useful
  • Familiarity with basic statistical concepts
  • Familiarity with basic statistical concepts

By completing this module you will meet 10 of the capability statements in the Artificial Intelligence (AI) and Digital Healthcare Technologies Capability Framework. This framework outlines the skills and capabilities that will allow health and care professionals to work effectively in a digitally enhanced environment.

The content in this module covers skills and capabilities in the following domains:

  • 2.2 Remote consultation and monitoring (m)
  • 5.1.1 Data collection (b,i)
  • 5.1.3 Data Visualisation and Reporting (a,c,d,h)
  • 5.1.4 Data Processing and Analytics (j,m,n)

Prediction Modelling

This is a comprehensive introduction to the fundamentals of clinical prediction modelling using modern statistical modelling techniques for health research. It covers all steps of developing and assessing a prediction model. You will be introduced to the theory and practical implementation of cutting-edge predictive statistical and machine learning modelling techniques using the R statistical software.

The main areas covered are:

  • Data pre-processing: what you need to do to data before it can be used to train predictive models, for example handling missing data
  • Model training: principally using penalised generalised linear models and survival models, for regression and classification
  • Model validation: assessing the model’s performance

The course pre-requisites are:

  • Some background in basic statistical methods (e.g. probability distributions, hypothesis testing) and high school mathematics (e.g. basic calculus, basic linear algebra).
  • A working knowledge of the R programming language

This course is for you if you want to:

  • Have a good understanding of core clinical prediction concepts.
  • Be able to describe how modern statistical concepts, regression and machine learning methods can be applied to medical prediction problems.
  • Be familiar with the principles that play a role in internal validation such as over-fitting, optimism and shrinkage and understand key components of internal validation methods such as cross-validation and bootstrapping.
  • Be able to develop simple prediction models, assess their quality and validate them using R software.
  • Be able to critically assess the general applicability of a developed model to predict future outcomes.
  • Be equipped with a range of statistical and machine learning skills, which will enable you to take prominent roles in a wide spectrum of employment and research.

 

By completing this module you will meet 12 of the capability statements in the Artificial Intelligence (AI) and Digital Healthcare Technologies Capability Framework. This framework outlines the skills and capabilities that will allow health and care professionals to work effectively in a digitally enhanced environment.

The content in this module covers skills and capabilities in the following domains:

  • 3.1 Ethics (f)
  • 5.1.3 Data Visualisation and Reporting (e,f)
  • 5.1.4 Data Processing and Analytics (c,d,o)
  • 6.0 Artificial Intelligence (k)
  • 6.1 Machine Learning and Natural Language Processing (g,l)
  • 6.3 Evaluating AI systems (c,f,i)

Introduction to Python for Health Research

This module introduces you to the basics (and beyond) of the Python object-oriented programming language and gets you started working with it, such that by the end of the module you will be able to work independently in Python. 

Python is a great programming language with a huge international user base, and it is free to use in an educational setting.

Learning aims

By the end of this module you will be able to:

  • Use Python to import and manipulate your own datasets
  • Perform programming operations such as conditionals, loops and functions
  • Conduct basic statistical analyses on your own data sets using Python
  • Be an independent user of Python, being able to find help online, and demonstrate how to select appropriate packages for your projects
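
As a flavour of the operations named above, here is a minimal sketch of a conditional, a loop, a function and a basic summary statistic in Python. It is not taken from the course materials: the records are made up for illustration, and the function name `classify_bmi` and its coarse category labels are assumptions.

```python
from statistics import mean


def classify_bmi(bmi):
    """Return a coarse, illustrative BMI category using a conditional."""
    if bmi < 18.5:
        return "underweight"
    elif bmi < 25:
        return "healthy"
    else:
        return "above healthy range"


# Made-up records, for illustration only (not real patient data)
records = [
    {"id": 1, "weight_kg": 70.0, "height_m": 1.75},
    {"id": 2, "weight_kg": 50.0, "height_m": 1.70},
    {"id": 3, "weight_kg": 90.0, "height_m": 1.80},
]

bmis = []
for rec in records:  # loop over every record
    bmi = rec["weight_kg"] / rec["height_m"] ** 2  # BMI = weight / height squared
    rec["category"] = classify_bmi(bmi)  # call the function defined above
    bmis.append(bmi)

mean_bmi = mean(bmis)  # a basic statistical summary
print(f"Mean BMI: {mean_bmi:.1f}")
for rec in records:
    print(rec["id"], rec["category"])
```

In practice you would usually load data from a file with a package such as pandas rather than typing records in by hand; plain dictionaries are used here only to keep the sketch self-contained.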

This course will assume that participants have:

  • No prior knowledge of Python.
  • The time to complete the interactive quizzes and practicals.
  • Access to a computer with an internet connection.

By completing this module you will meet 10 of the capability statements in the Artificial Intelligence (AI) and Digital Healthcare Technologies Capability Framework. This framework outlines the skills and capabilities that will allow health and care professionals to work effectively in a digitally enhanced environment.

The content in this module covers skills and capabilities in the following domains:

  • 2.2 Remote consultation and monitoring (m)
  • 5.1.1 Data collection (b,i)
  • 5.1.3 Data Visualisation and Reporting (a,c,d,h)
  • 5.1.4 Data Processing and Analytics (j,m,n)

Machine Learning for Health Research

This course is a comprehensive introduction to the principles of Machine Learning, particularly supervised and unsupervised machine learning in the context of biomedical informatics.

Here you will learn to recognise the distinctive characteristics of classification, regression and clustering and their applicability to your research. The course will introduce a general framework for building Machine Learning solutions to a particular problem, focusing on robustness, generalizability and interpretability. A select number of algorithms will be presented in detail, ranging from tree-based classification and regression models to kernel-based methods and neural networks.

 

Learning aims 

  • Gain an in-depth understanding of diverse ML algorithms and their applications in health research.
  • Handle missing data, particularly in the context of Electronic Health Records.
  • Utilize and optimize tree-based and ensemble models for complex health data.
  • Implement neural networks and understand their learning processes.
  • Effectively manage the entire machine learning pipeline from data acquisition to model evaluation.
  • Achieve data imputation using sophisticated techniques.
  • Understand the ethics and challenges in applying ML in healthcare contexts.
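
To make the idea of data imputation concrete, the sketch below shows mean imputation, which is the simplest baseline and not one of the more sophisticated techniques the course refers to. The function name `impute_mean` and the example values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up systolic blood pressure readings with two missing values
systolic_bp = [120.0, None, 135.0, 128.0, None]
imputed = impute_mean(systolic_bp)
print(imputed)
```

Mean imputation ignores relationships between variables and understates variability, which is one reason more advanced approaches are typically preferred for real health record data.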

 

This course will assume that participants:

  • Have a foundational understanding of data and wish to explore machine learning as it applies to health research.
  • Work in a health-related sector, whether in a clinical, administrative, or academic role (for example as a doctor, health data analyst, or public health researcher).
  • Are looking to use machine learning to uncover insights, predict outcomes, or improve health care processes in their current or prospective position.
  • Have a working knowledge of Python.

Introduction to AI/Deep Learning

    This course aims to:

    • Develop an understanding of the underlying principles of artificial intelligence.
    • Identify and describe different intelligent agents and their operating environments.
    • Understand different problem-solving and planning strategies, with a particular focus on constraint satisfaction problems and multi-agent planning.
    • Justify and critique the use of different learning methods for medical and healthcare applications, in particular those used in temporal reasoning tasks.
    • Be familiar with the principles of reinforcement learning and other goal-oriented algorithms.
    • Be able to develop simple AI techniques, assess their quality and validate them using Python software.

    This workshop will assume that participants:

    • have a basic knowledge of the high-level programming language Python.

     

    Closed: we are not currently accepting applications for this module.

    Natural Language Processing

    The course provides an introduction to the nature of medical text, and to the technical and organisational challenges encountered when processing it. The course will be based around practical examples and widely used NLP tools.

    The course aims to provide an introduction to the major techniques of natural language processing, and to provide participants with the skills to create their own NLP applications, using both rule-based and statistical approaches. Methods will be introduced for extracting structured information from text, and for automatically classifying text, together with the selection of data for training and for evaluation.
    The course will provide practical instruction in the use of some widely used tools in NLP, including NLTK (a Python toolkit).
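
As a rough illustration of a rule-based approach to extracting structured information from text, the sketch below uses a single regular expression from the Python standard library rather than NLTK. The pattern, the function name `extract_doses` and the example note are all assumptions made for illustration; real clinical text would need far more robust rules.

```python
import re

# Hypothetical rule: a word followed by a number and the unit "mg"
DOSE_PATTERN = re.compile(r"\b([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*mg\b", re.IGNORECASE)


def extract_doses(text):
    """Return (drug name, dose in mg) pairs matched by the rule above."""
    return [
        (match.group(1).lower(), float(match.group(2)))
        for match in DOSE_PATTERN.finditer(text)
    ]


# Made-up free-text note, for illustration only
note = "Started metformin 500 mg twice daily; continue Aspirin 75mg."
doses = extract_doses(note)
print(doses)
```

A rule like this is easy to read and audit but brittle: it would also match any other word that happens to precede a dose, which is the kind of trade-off that motivates comparing rule-based and statistical approaches.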

    This workshop will assume that participants:

    • have a basic knowledge of the high-level programming language Python.

    Closed: we are not currently accepting applications for this module.