Rothamsted Research

Statistics and Bioinformatics Training

The Applied Statistics and Applied Bioinformatics Groups have developed a series of training courses to support the application of quantitative tools that add value to the institute's scientific research programme.

The programme of five main statistics courses (BSIG, DASE, ILRA, IMA, ARA) has been designed to be attended in order, with later courses building on the knowledge gained in the earlier courses, and using the GenStat statistical package to implement the analysis methods introduced in each course. Other courses have been designed to meet specific identified needs of our user community: training in specific software tools, or in methods associated with particular technologies. Throughout the courses, participants are encouraged to think about how the approaches can be applied in their own research.

The courses primarily aim to give PhD students and research staff a sufficient understanding of key statistical and bioinformatics concepts and methods to begin exploring and analysing their own data. The programme also aims to raise awareness of the broader range of statistical and bioinformatics tools that are available, and to encourage further interaction with the statistical and bioinformatics consultants. Such interaction helps to ensure that experiments are designed effectively and efficiently, and that the most appropriate and powerful methods of analysis are applied to extract the maximum information from all data collected.

The five main statistics courses are usually scheduled to run as a series over a 3-4 month period (the first series starting in late October, the second in mid-February). The other courses are currently run on demand.

The Applied Statistics and Applied Bioinformatics Groups are also currently engaged in a BBSRC project concerned with the development of some of the course materials as eLearning resources in Biomathematics and Bioinformatics.

List of Current Training Courses:

Basic Statistics and Introduction to GenStat (BSIG) (2 days)

Design and Analysis of Simple Experiments (DASE) (2 x 2 days)

Introduction to Linear Regression Analysis (ILRA) (2 days)

Advanced Regression Analysis (ARA) (2 days)

Introduction to Multivariate Analysis (IMA) (2 days)

Bioinformatics 101: Introduction to Rothamsted's scientific computing infrastructure (0.5 day)

Reproducible data analysis in Galaxy (1 day)

RNA-seq analysis using Galaxy (1 day)

Introduction to Integrative Omics with Ondex and QTLNetMiner (1 day)

Introduction to R (2 days)

Analytics in R (2 days)

Geek for a Week (5 days)

Course Descriptions 

Basic Statistics and Introduction to GenStat (BSIG)

This 2-day course is intended as a sound foundation for all our statistics courses and aims to provide a clear understanding of fundamental statistical principles, revising both numerical and graphical methods for summarising data, introducing some simple statistical tests based on commonly-assumed statistical distributions, and introducing the GenStat statistical package as a tool to implement the simple statistical summary methods and tests. The course is a mixture of presentations and software demonstrations, and practical sessions for participants to put the statistical approaches into practice.

Course contact: Andrew Mead

Design and Analysis of Simple Experiments (DASE)

This 4-day course, delivered in two 2-day parts (usually 2 weeks apart), aims to provide a clear understanding of the statistical principles for designing experiments, and the importance of following these principles to produce an effective and efficient experiment, and a clear appreciation of the statistical approaches (primarily analysis of variance) used to analyse data from designed experiments in the biological sciences. As well as lecture and practical sessions (using GenStat), the course also includes time for general open discussions and analysis of the participants' own data, and incorporates various ‘hands-on activities’.

Course contact: Andrew Mead

Introduction to Linear Regression Analysis (ILRA)

This 2-day course introduces linear regression techniques, including extensions from the simplest model, with one explanatory variable, to models with multiple explanatory variables, and to models with both explanatory variables and explanatory factors. It also considers transformations of both response and explanatory variables as a way to cope with scenarios where the assumptions associated with these linear regression techniques are not met. The course includes a mixture of presentations, software demonstrations and practical sessions, with the implementation of all regression modelling approaches illustrated using the GenStat statistical package.

Course contact: Andrew Mead

Advanced Regression Analysis (ARA)

This 2-day course introduces a range of more advanced regression modelling approaches, building on the methods introduced in ILRA, and demonstrating the links between regression modelling methods and the analysis of data from designed experiments (see DASE).  The course aims to provide a clear understanding of the application of regression techniques for data from designed experiments (introducing linear mixed models (LMMs)), approaches to non-linear curve fitting, and the extension from linear models to generalized linear models (GLMs) for non-Normal data, using a mixture of presentations, software demonstrations in GenStat, and practical sessions.

Course contact: Andrew Mead

Introduction to Multivariate Analysis (IMA)

This 2-day course aims to provide an introduction to multivariate analysis approaches and a clear understanding of the most commonly-used multivariate techniques - principal components analysis, canonical variates analysis, (hierarchical and non-hierarchical) cluster analysis and principal coordinates analysis.  Application of these methods is illustrated using the implementation of these techniques in the GenStat statistical package, with a strong focus on the interpretation of the output produced.

Course contact: Andrew Mead

Introduction to R

This course aims to provide an introduction to R, the statistical computing environment, using the RStudio interface to introduce the key components and syntax of R and to enable participants to produce simple scripts. The course then introduces the implementations in R of a range of standard statistical approaches (as introduced in the main programme of statistics courses).

Course contact: Andrew Mead

Reproducible data analysis in Galaxy

Galaxy is an open, web-based platform for data-intensive biomedical research. It offers an accessible, reproducible and transparent computational workbench for the biologist. This workbench is very useful for automating repeated analysis steps in the form of workflows. The workbench interface is simple and intuitive, supports collaboration, and integrates data and analysis tools in a single place. The course is a mixture of presentations, software demonstrations and hands-on sessions in which participants learn good bioinformatics practices in Next Generation Sequencing (NGS) analysis.

Course contact: Keywan Hassani-Pak

RNA-seq analysis using Galaxy

This course aims to link bioinformatics and statistical approaches with the biological context of RNA-seq technology, to provide participants with a clear understanding of the key issues involved in data collection, analysis and interpretation of results for experiments using this technology. Approaches are illustrated using Galaxy and the R statistical computing environment, considering a number of available packages for the different stages of the process.

Course contacts: Andrew Mead and Keywan Hassani-Pak

Introduction to Integrative Omics with Ondex and QTLNetMiner

This 1-day course introduces major biological databases such as Ensembl, UniProt, Gene Ontology, KEGG, Pfam, Expression Atlas and other omics resources. It will provide participants with the know-how and tools for efficient search and visualization of the data to be used for hypothesis generation and gene discovery. The principles of data integration and data mining will be demonstrated using Ondex and QTLNetMiner, two free and open-source software platforms developed at Rothamsted Research.

Course contact: Keywan Hassani-Pak

Bioinformatics 101: Introduction to Rothamsted's scientific computing infrastructure

This 1-day course aims to provide an introduction to Galaxy, Geneious and Linux, the main platforms used at Rothamsted Research to support data-intensive biological research. It will highlight key features and demonstrate the advantages of each platform. The course is a mixture of presentations, software demonstrations and hands-on sessions that help participants understand the strengths and weaknesses of each platform.

Course contact: Keywan Hassani-Pak

Analytics in R

This 2-day course aims to teach the implementation in R of statistical and bioinformatics techniques such as ANOVA, regression and linear models, hypothesis testing, analysis of RNA-seq data and analysis of SNP data.

Course contact: Andrew Mead

Geek for a Week

We help you to become a self-sufficient bioinformatics user by providing a desk in our office and our bioinformatics expertise for one week. As part of this collaborative approach, you will receive one-to-one support tailored to your needs and your data. While helping you to analyse your data, we will try to teach you best practices and point you to relevant literature. We want to encourage this as a way to initiate collaborations that might lead to increased involvement in future projects. We also benefit, as it helps us to understand the biology and to improve our bioinformatics tools and resources. In the long term, the trainees of today might even become the trainers of tomorrow.

Course contact: Keywan Hassani-Pak

Training Facilities at Rothamsted Research

Training is held in a dedicated computer training suite or in a conference centre meeting room with laptops and space for break-out discussion groups.   

Rothamsted Research also has a conference centre facility with auditoriums with capacity for 150 and 300 people and the technical ability to produce live webinars.
