Bioinformatic Methods II

University of Toronto via Coursera

Go to Course: https://www.coursera.org/learn/bioinformatics-methods-2

Introduction

### Course Review: Bioinformatic Methods II on Coursera

In the realm of modern biology, the ability to analyze vast amounts of genomic and proteomic data is increasingly important. Bioinformatics sits at the intersection of biology and computational science, providing essential methodologies for interpreting the complex datasets generated by technologies such as RNA-seq and microarrays. For those looking to deepen their understanding and skills in bioinformatics, Coursera's course **Bioinformatic Methods II** offers an in-depth exploration of these topics.

#### Course Overview

**Bioinformatic Methods II** is designed for individuals who already have some foundational knowledge of bioinformatics. The course focuses on analyzing large-scale biological data and employing existing bioinformatic resources (mainly web-based programs and databases) to answer biological questions. Notably, it is an excellent follow-up to its predecessor, **Bioinformatic Methods I**, as it builds on core concepts by diving deeper into specific topics like protein motifs, interactions, structure, and gene expression analysis.

#### Syllabus Breakdown

1. **Protein Motifs** - The course begins with the exploration of conserved regions within protein families. Understanding these motifs is crucial because they can shed light on the biological functions of sequences. Students will learn various methods to describe these motifs, from regular expressions to hidden Markov models (HMMs).
2. **Protein-Protein Interactions (PPIs)** - This module emphasizes the significance of PPIs in understanding protein functions. Students will examine different methods to determine PPIs and use databases and tools to explore the interaction partners of specific proteins like BRCA2. Additionally, Gene Ontology (GO) term enrichment analysis helps to clarify the biological roles of these interactions.
3. **Protein Structure** - This segment introduces students to the techniques for determining protein tertiary structures, which help to elucidate biological function. By exploring databases like the Protein Data Bank (PDB) and employing software like PyMOL, students will gain practical skills in structural bioinformatics.
4. **Gene Expression Analysis I & II** - These modules focus on RNA-seq data and its analysis, which is essential for understanding gene activity in various tissues. Students will work with real RNA-seq datasets using tools like BioConductor for data processing, differential expression analysis, and visualizations such as heat maps.
5. **Cis Regulatory Systems** - The final content module explores the role of cis-regulatory elements in gene expression. This involves analyzing promoter regions and predicting new regulatory elements, thus providing a deeper understanding of gene regulation mechanisms.
6. **Final Assignment** - Students synthesize their knowledge through a final review and an assignment that tests their understanding of the entire course material.

#### Course Evaluation

**Strengths:**

- **Comprehensive Curriculum:** The course tackles a range of essential topics in bioinformatics, from protein analysis to gene expression, ensuring a holistic understanding of the field.
- **Hands-On Experience:** The blend of theoretical knowledge and practical lab work allows students to apply what they learn in real-world contexts, reinforcing the concepts effectively.
- **Top-Notch Resources:** By using well-known databases and tools, learners can step into the bioinformatics landscape with familiarity and confidence.

**Weaknesses:**

- **Pace of Learning:** The course can be fast-paced, which may be challenging for those who are new to the field. Previous knowledge of biology and basic programming would be advantageous.
- **Technical Requirements:** Access to certain external tools or software might require additional setup, which could pose challenges for beginners.

#### Recommendation

**Bioinformatic Methods II** is highly recommended for anyone looking to strengthen their bioinformatics skills, especially those pursuing careers in molecular biology, genomics, and computational biology. The course is also valuable for researchers who need to analyze biological data efficiently. By the end of the course, students will not only have improved their analytical skills but will also understand the significance of bioinformatics in addressing complex biological questions.

In summary, this Coursera course offers a robust framework for engaging with contemporary bioinformatics challenges, making it an excellent investment for your professional development in this exciting and rapidly evolving field.

Syllabus

Protein Motifs

In this module we'll be exploring conserved regions within protein families. Such regions can help us understand the biology of a sequence, since they are likely important for biological function, and they can also be used to help ascribe function to sequences for which we can't identify any homologs in the databases. There are various ways of describing conserved regions, from simple regular expressions to profiles to profile hidden Markov models (HMMs).
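As a rough illustration of the regular-expression end of that spectrum, the sketch below translates a simplified PROSITE-style pattern into a Python regular expression and scans a sequence with it. The helper names (`prosite_to_regex`, `find_motifs`) and the test sequence are made up for this example, and only a subset of the PROSITE syntax is handled (`x`, `[..]`, `{..}`, and repeat counts); it is not the course's own tooling. The pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        # [ST]-style classes and single letters are already valid regex.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(core)
    return "".join(parts)


def find_motifs(pattern: str, sequence: str):
    """Return (1-based position, matched text) pairs, overlaps included."""
    # A lookahead lets finditer report overlapping occurrences.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical peptide, invented for this example.
sequence = "MKNLSAANPTGNGSW"
print(prosite_to_regex("N-{P}-[ST]-{P}"))       # N[^P][ST][^P]
print(find_motifs("N-{P}-[ST]-{P}", sequence))  # [(3, 'NLSA'), (12, 'NGSW')]
```

A regular expression only says whether a residue is allowed at a position; profiles and profile HMMs go further by weighting each residue at each position, which is why they tend to be more sensitive for divergent family members.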

Protein-Protein Interactions

In this module we'll be exploring protein-protein interactions (PPIs). Protein-protein interactions are important because proteins don't act in isolation, and an examination of the interaction partners of a given protein (determined in an unbiased, perhaps high-throughput way) can often tell us a lot about its biology. We'll talk about some different methods used to determine PPIs and go over their strengths and weaknesses. In the lab we'll use three different tools and two different databases to examine interaction partners of BRCA2, a protein that we examined in last module's lab. Finally, we'll touch on a "foundational" concept, Gene Ontology (GO) term enrichment analysis, to give us an overview of the proteins interacting with our example.
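GO term enrichment is commonly framed as a hypergeometric (one-sided Fisher) test: given how many genes in the whole genome carry a term, how surprising is the number of interactors that carry it? The sketch below shows that calculation using only the standard library. The function name and all of the counts are hypothetical, and the online tools used in the lab may use a different test or add multiple-testing correction, so treat this as a sketch of the idea rather than a description of any particular tool.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       sample: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    population: genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes in the list of interest (e.g. interaction partners)
    hits:       genes in that list carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, 200 of them annotated
# with a term, and 8 of 50 interaction partners carrying that term.
p = enrichment_p_value(population=20_000, annotated=200, sample=50, hits=8)
print(f"enrichment p-value: {p:.3g}")
```

Because an enrichment analysis tests many GO terms at once, a raw p-value like this one would normally be adjusted for multiple testing before being interpreted.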

Protein Structure

The determination of a protein's tertiary structure in three dimensions can tell us a lot about the biology of that protein. In this module's mini-lecture, we'll talk about some different methods used to determine a protein's tertiary structure and cover the main database for protein structure data, the PDB. In the lab we'll explore the PDB and an online tool for searching for structural (as opposed to sequence) similarity, VAST. We'll then use a nice piece of stand-alone software, PyMOL, to explore several protein structures in more detail.

Review: Protein Motifs, Protein-Protein Interactions, and Protein Structure

Gene Expression Analysis I

When and where genes are expressed (active) in tissues or cells is one of the main determinants of what makes that tissue or cell the way it is, both in terms of morphology and in terms of response to external stimuli. Several different methods exist for generating gene expression levels for all of the genes in the genome in tissues or even at cell-type-specific resolution. In this class we'll be processing and then examining some gene expression data generated using RNA-seq. We'll explore one of the main databases for RNA-seq expression data, the Sequence Read Archive (SRA), and then use an open-source suite of programs in R called BioConductor to process the raw reads from 4 RNA-seq data sets, to summarize their expression levels, to select significantly differentially expressed genes, and finally to visualize these as a heat map.
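To make the "summarize expression levels, then compare" steps concrete, here is a deliberately simplified sketch that scales raw read counts to counts per million (CPM) and computes a log2 fold change between two samples. The gene names and counts are invented, and this is not what the BioConductor packages do internally: real differential expression analysis relies on replicates and a statistical model, which this sketch omits.

```python
import math


def counts_per_million(counts: dict) -> dict:
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(control: dict, treated: dict,
                     pseudocount: float = 1.0) -> dict:
    """log2(treated / control) on CPM values.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2((cpm_treated[gene] + pseudocount)
                        / (cpm_control[gene] + pseudocount))
        for gene in control
    }


# Hypothetical read counts for three genes in two samples.
control = {"geneA": 150, "geneB": 4200, "geneC": 650}
treated = {"geneA": 900, "geneB": 3900, "geneC": 200}

for gene, lfc in log2_fold_change(control, treated).items():
    print(f"{gene}: log2 fold change = {lfc:+.2f}")
```

A heat map of differentially expressed genes is essentially a colour-coded matrix of values like these (or of normalized expression levels), with genes as rows and samples as columns.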

Gene Expression Analysis II

When and where genes are expressed (active) in tissues or cells is one of the main determinants of what makes that tissue or cell the way it is, both in terms of morphology and in terms of response to external stimuli. Several different methods exist for generating gene expression levels for all of the genes in the genome in tissues or even at cell-type-specific resolution. In this class we'll be hierarchically clustering our significantly differentially expressed genes from last time using BioConductor and the built-in function of an online tool called Expression Browser. Then we'll be using another online tool that uses a similarity metric, the Pearson correlation coefficient, to identify genes responding in a similar manner to our gene of interest, in this case AP3. We'll use a second tool, ATTED-II, to corroborate our gene list. We'll also be exploring some online databases of gene expression and an online tool for doing a Gene Ontology enrichment analysis.
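The Pearson correlation coefficient mentioned above can be sketched in a few lines. The example below ranks genes by how closely their expression profiles track a bait gene. AP3 is the gene of interest named in the module, but the other gene names, the expression values, and the function names are invented for illustration, and the online tools' actual implementations are not shown here.

```python
from math import sqrt


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(bait: str, profiles: dict):
    """Rank every other gene by its correlation with the bait gene."""
    scores = {
        gene: pearson(profiles[bait], values)
        for gene, values in profiles.items()
        if gene != bait
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression levels across four samples.
profiles = {
    "AP3":   [1, 2, 3, 4],
    "geneA": [2, 4, 6, 8],   # rises with AP3
    "geneB": [4, 3, 2, 1],   # falls as AP3 rises
    "geneC": [1, 3, 2, 4],   # loosely follows AP3
}

for gene, r in rank_coexpressed("AP3", profiles):
    print(f"{gene}: r = {r:+.2f}")
```

Note that the coefficient compares the shape of two profiles rather than their magnitude: `geneA` is expressed at twice the level of AP3 in every sample yet still scores a perfect correlation.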

Cis Regulatory Systems

When and where genes are expressed in tissues or cells is one of the main determinants of what makes that tissue or cell the way it is, both in terms of morphology and in terms of response to external stimuli. Gene expression is controlled in part by the presence of short sequences in the promoters (and other parts) of genes, called cis-elements, which permit transcription factors and other regulatory proteins to bind and direct the patterns of expression in certain tissues or cells or in response to environmental stimuli. We'll explore a couple of sets of promoters of genes that are coexpressed with AP3 from Arabidopsis, and with INSULIN from human, for the presence of known cis-elements, and we'll also try to predict some new ones using a couple of different methods.
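Searching promoters for a known cis-element can be sketched as a consensus scan on both DNA strands. The example below expands an IUPAC consensus into a regular expression and reports hits with forward-strand coordinates. The W-box consensus `TTGACY` is used purely as an illustration (it is not one of the elements named in the course description), and the promoter sequence and function names are made up.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    return sequence.translate(COMPLEMENT)[::-1]


def scan_promoter(sequence: str, consensus: str):
    """Return sorted (start, strand, site) hits on both strands.

    Start positions are 1-based and always refer to the forward strand.
    """
    sequence = sequence.upper()
    # The lookahead allows overlapping sites to be reported.
    regex = re.compile("(?=(" + "".join(IUPAC[b] for b in consensus) + "))")
    hits = [(m.start() + 1, "+", m.group(1)) for m in regex.finditer(sequence)]
    for m in regex.finditer(reverse_complement(sequence)):
        # Map the reverse-strand coordinate back onto the forward strand.
        start = len(sequence) - (m.start() + len(m.group(1))) + 1
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Hypothetical promoter fragment, invented for this example.
promoter = "AATTGACCGGGGTCAAT"
print(scan_promoter(promoter, "TTGACY"))
# [(3, '+', 'TTGACC'), (11, '-', 'TTGACC')]
```

Short consensus sequences like this occur by chance fairly often, so a palindromic element would be reported once per strand at the same location, and any hit is better read as a candidate site than as evidence of binding.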

Review: Gene Expression Analysis and Cis Regulatory Systems + Final Assignment

Overview

Large-scale biology projects such as the sequencing of the human genome and gene expression surveys using RNA-seq, microarrays and other technologies have created a wealth of data for biologists. However, the challenge facing scientists is analyzing and even accessing these data to extract useful information pertaining to the system being studied. This course focuses on employing existing bioinformatic resources – mainly web-based programs and databases – to access the wealth of data to answer q

Reviews

Hi Nicholas, thank you so much for giving a lot of information. Bioinformatic Methods II was a little difficult, but I understood it after repeating the lab discussions. Thanks a lot.

It was a very, very useful course; the heatmap plot generation and the R language lab exercises were especially great!

Very informative course, and it is very well explained. Good work.

I really appreciate this series of courses. I want to thank Prof. Provart and his colleagues for their great job preparing and presenting the series. Thanks a lot!

Gives the student real-world exposure to the tools used to study proteins, gene regulation, etc. Instructor is involved and friendly. Highly recommended for someone who is interested in contemporary