This document describes how to understand the bioinformatics report generated by Aladdin BSBolt pipeline. Most of the plots are taken from the sample report. The sample report was generated using data from Zymo Research. The plots in your report might look a little different.
The bioinformatics report is generated using MultiQC
. There are general instructions on how to use a MultiQC report on MultiQC website. The report itself also includes a link to a instructional video at the top of the report. In general, the report has a navigation bar to the left, which allows you to quickly navigate to one of many sections in the report. On the right side, there is a toolbox that allows to customize the appearance of your report and export figures and/or data. Most sections of the report are interactive. The plots will show you the sample name and values when you mouse over them.
The general statistics table gives an overview of some important stats of your samples. For example, how many reads were in each sample, percentage of GC content, etc. These stats are collected from different sections of the report to give you a snapshot. This is usually the quickest way for you to evaluate how your bisulfite sequencing experiment went.
Other information you can get from this tables are (from left to right):
FastQC gives general quality metrics about your reads.
This plot show the total number of reads, broken down into unique and duplicate. Tabs provide both counts and percentages view. Duplicate read counts are an estimate only.
This plot provides information about the mean quality score distribution across your reads.
This histogram shows number of reads against average quality scores for examining if a subset of reads has poor quality.
The heatmap shows the base composition (%A/C/G/T) of each sample. Clicking on a sample track will show the data for that sample as a line plot.
The line plot shows average GC content of reads. Normal random library typically have a roughly normal distribution of GC content.
This graph shows the percentage of base calls at each position for which an N was called. If a sequencer is unable to make a base call with sufficient confidence then it will normally substitute an N rather than a conventional base call.
The plots show the total amount of overrepresented sequences found in each library.
The plot shows a cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. Once a sequence has been seen in a read it is counted as being present right through to the end of the read so the percentages you see will only increase as the read length goes on. Note that only samples with ≥ 0.1% adapter contamination are shown.
This heatmap summarises all of FastQC content into a single heatmap for a quick overview. Note that not all FastQC sections have plots in MultiQC reports, but all status checks are shown in this heatmap.