Enrichment Analysis

Introduction

Enrichment analysis is a method for identifying taxa or functions that are over-represented in a large set of proteins. It uses statistical tests to find the groups of taxa and/or functions that are significantly enriched.

About the app

Our application has three major functions: function enrichment, taxon enrichment, and function vs. taxon correlation. In the function analysis module, the protein list is assigned to function terms (COG, NOG, COG category, NOG category, KEGG, or GO), or NOG/COG categories are assigned to NOG/COG terms. Enrichment is then measured by the hypergeometric distribution p-value. Taxon enrichment analysis works in a similar way. Note that the hypergeometric p-value is based on the hits matched to the database, and that the FASTA database used for the metaproteomics search has to be one of the following: the human IGC or the mouse gene catalog.
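As an illustration of the hypergeometric p-value mentioned above, here is a minimal Python sketch. It is not the app's own code (the app is built on R packages), and the function name, the parameter names, and the choice of background are assumptions: the background is taken to be all database proteins, and the sample is the set of listed proteins that matched the database.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    population: proteins in the background (assumed: the reference database)
    annotated:  background proteins carrying the term of interest
    sample:     listed proteins that matched the database
    hits:       listed proteins carrying the term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probabilities of seeing `hits` or more annotated proteins.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(max(hits, 0), max_hits + 1)
    )
    return tail / total


# Toy numbers: 10 background proteins, 3 carry the term,
# 4 proteins in the list, 2 of them carry the term.
p = hypergeom_enrichment_pvalue(population=10, annotated=3, sample=4, hits=2)
print(p)  # 70/210, i.e. one third
```

A small p-value means that the term appears in the list more often than expected from random sampling of the background.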

This application uses a series of open source R packages, including gplots, d3heatmap, corrplot, colourpicker, htmlwidgets, shinydashboard, shiny, DT, networkD3, circlize and many more.

Tutorial

Step 1. Data preparation

There are two ways to prepare your data. One is to upload your data table as a spreadsheet, as in the example below.

The other is to prepare a list of detected features for each sample directly. These lists are then copy-pasted into the Shiny app.
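If you build the per-sample lists with a script, a small clean-up step can help before pasting. The sketch below assumes one feature identifier per line, which is a guess about the expected layout rather than something this page specifies; the function name and the identifiers are hypothetical.

```python
def parse_feature_list(pasted_text):
    """Return the unique, non-empty feature identifiers in input order.

    Assumes one identifier per line (hypothetical layout).
    """
    seen = set()
    features = []
    for line in pasted_text.splitlines():
        feature = line.strip()
        if feature and feature not in seen:
            seen.add(feature)
            features.append(feature)
    return features


# Placeholder identifiers, not real database entries.
raw = "protein_A\nprotein_B\n\n  protein_A  \nprotein_C\n"
print("\n".join(parse_feature_list(raw)))
```

Removing blank lines and duplicates up front keeps each protein from being counted twice.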

Step 2. Upload your data

Go to https://shiny.imetalab.ca/metaproteomics_enrichment/ and upload your data.

Workflow 1 accepts the spreadsheet; remember to choose the correct formats. After checking your data matrix, click the “Go to Enrichment Analysis Settings Page” button to continue.

Workflow 2 is for copy-pasted lists. Paste the list of features for each sample; if you have more than one list, click the “Add more list” button to get a blank area for each additional list. After the lists are pasted, click the button at the lower right to continue.

Step 3. Analysis setting

The next step asks you to choose the analysis that you’d like to perform. Options for setting up that analysis then appear.

In general, each type of analysis asks you to specify the data source (human or mouse microbiome) and set the p-value cutoff. For functional analysis, you can choose one of the functional annotations: COG, NOG, KEGG, or GO. Click the button at the lower right to continue.

Step 4. Analyses

1) Functional enrichment

For function analysis, the proteins listed for each sample (samples as columns; COG/NOG lists also work for high-level analysis) are assigned to functional categories. The hypergeometric p-value is calculated and corrected (by FDR) within each dataset. Users can then filter further by p-value. Functions with at least one qualified match that meets the p-value cutoff (across all samples) are kept in the final list and used for all downstream visualization. The original function assignment and the filtered data can both be exported afterwards.
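The correction and filtering steps described above can be sketched as follows. Two assumptions are made here: the FDR correction is taken to be the Benjamini-Hochberg procedure (the page only says "FDR"), and "within each dataset" is read as "within each sample". The function names and the table layout are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_terms(pvalue_table, cutoff=0.05):
    """Correct p-values per sample and keep terms passing in any sample.

    pvalue_table: {sample: {term: raw p-value}} (hypothetical layout)
    """
    adjusted = {}
    for sample, term_p in pvalue_table.items():
        terms = list(term_p)
        adj = benjamini_hochberg([term_p[t] for t in terms])
        adjusted[sample] = dict(zip(terms, adj))
    kept = sorted(
        {t for per_sample in adjusted.values()
         for t, p in per_sample.items() if p <= cutoff}
    )
    return adjusted, kept


table = {
    "S1": {"COG0001": 0.001, "COG0002": 0.8},
    "S2": {"COG0001": 0.5, "COG0002": 0.9},
}
adjusted, kept = filter_terms(table, cutoff=0.05)
print(kept)  # ['COG0001']
```

A term that passes the cutoff in a single sample is kept for every sample, so the downstream plots share the same set of rows.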

You can switch tabs to visualize the functional analysis results in the forms of pie-profile, heatmap-profile and enrichment bar chart.

2) Taxon enrichment

For taxon analysis, the proteins listed for each sample (samples as columns) are assigned to taxon nodes at all levels. The hypergeometric p-value is calculated and corrected (by FDR) within each dataset. Users can then filter further by p-value. Taxa with at least one qualified match that meets the p-value cutoff (across all samples) are kept in the final list and used for further visualization. The original taxon assignment and the filtered data can both be exported afterwards.
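Assigning proteins to taxon nodes at all levels can be pictured as counting each protein once at every rank of its lineage. The sketch below is only an illustration: the rank names, the lineage table, and the function name are hypothetical and do not reflect the app's internal data structures.

```python
from collections import Counter

# Hypothetical rank order; the ranks used by the app may differ.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def count_taxon_nodes(protein_lineages, proteins):
    """Count how many listed proteins map to each (rank, taxon) node.

    protein_lineages: {protein: tuple of taxon names ordered as RANKS};
    a lineage may stop early when lower ranks are unresolved.
    """
    counts = Counter()
    for protein in proteins:
        lineage = protein_lineages.get(protein)
        if lineage is None:
            continue  # protein not matched in the database
        for rank, name in zip(RANKS, lineage):
            if name:
                counts[(rank, name)] += 1
    return counts


lineages = {
    "P1": ("Bacteria", "Firmicutes", "Clostridia"),
    "P2": ("Bacteria", "Bacteroidetes"),
}
counts = count_taxon_nodes(lineages, ["P1", "P2", "P3"])
print(counts[("kingdom", "Bacteria")])  # 2
```

The per-node counts are the hit counts that would feed the hypergeometric test, one test per taxon node.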

You can switch tabs to visualize the taxon analysis results in the forms of pie-profile, heatmap-profile, enrichment bar chart, and taxonomic composition.

3) Function and taxon correlation

For function and taxon interaction/correlation analysis, the proteins listed for each sample (samples as columns) are assigned to both function categories and taxon nodes. The hypergeometric p-value is calculated and corrected (by FDR) within each dataset, for both the function and the taxon data table. Users can then filter further by p-value. Entries with at least one qualified match (either function or taxon) across all samples are kept in the final list. The taxon and function lists are then compared, and the overlapping proteins are kept in a data matrix, which is the basis of further visualizations. The original assignment and the filtered data can both be exported afterwards.
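One way to picture the overlap step is as a function-by-taxon count matrix built from the proteins that have both annotations. The sketch below assumes each protein has a single function term and a single taxon at the chosen level. All names are hypothetical, and this is not the app's own implementation.

```python
from collections import defaultdict


def function_taxon_matrix(protein_to_function, protein_to_taxon,
                          kept_functions, kept_taxa):
    """Count proteins shared by each (function, taxon) pair.

    Only proteins with both annotations are considered, and only
    functions and taxa that survived the p-value filtering are counted.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    shared = protein_to_function.keys() & protein_to_taxon.keys()
    for protein in shared:
        function = protein_to_function[protein]
        taxon = protein_to_taxon[protein]
        if function in kept_functions and taxon in kept_taxa:
            matrix[function][taxon] += 1
    # Convert to plain dicts for easier export or plotting.
    return {f: dict(row) for f, row in matrix.items()}


p2f = {"P1": "COG0001", "P2": "COG0001", "P3": "COG0002"}
p2t = {"P1": "Firmicutes", "P2": "Bacteroidetes", "P4": "Firmicutes"}
m = function_taxon_matrix(p2f, p2t,
                          kept_functions={"COG0001", "COG0002"},
                          kept_taxa={"Firmicutes", "Bacteroidetes"})
print(m)
```

A matrix of this shape is the kind of input that heatmaps, chord diagrams, and Sankey diagrams typically take.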

This may take several minutes if you uploaded a large list. After the search and calculation are done, you can choose a sample (or combine all samples) for result visualization. You can also specify the taxonomic level for the plot. Then click “Go analysis” to continue. You can switch tabs to visualize the results in the forms of pie-profile, heatmap-profile, and Circos and Sankey plots of the correlation.

