RNA sequencing results report

发布时间:2023-1-24 | 杂志分类:其他
免费制作
更多内容

RNA sequencing results report

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.44445799111213141515161818181920202021212122222323232425252626262727Table of ContentsResults1 Disclaimer2 Checklist: Getting Started3 Data Statistics4 sRNA Annotation5 sRNA Prediction6 sRNA Expression7 miRNA Target gene Prediction8 Screening Differentially Expressed piRNA9 Screening Differentially Expressed siRNA10 Screening Differentially Expressed miRNA11 DESs Hierarchical Clustering12 DESs Target Prediction13 Gene Ontology Enrichment An... [收起]
[展开]
RNA sequencing results report
粉丝: {{bookData.followerCount}}
文本内容
第1页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

HOMtnbaS(BGISEQ-500

Small RNA)

2021/4/27

@2021 BGI All Rights Reserved

第2页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

4

4

4

4

5

7

9

9

11

12

13

14

15

15

16

18

18

18

19

20

20

20

21

21

21

22

22

23

23

23

24

25

25

26

26

26

27

27

Table of Contents

Results

1 Disclaimer

2 Checklist: Getting Started

3 Data Statistics

4 sRNA Annotation

5 sRNA Prediction

6 sRNA Expression

7 miRNA Target gene Prediction

8 Screening Differentially Expressed piRNA

9 Screening Differentially Expressed siRNA

10 Screening Differentially Expressed miRNA

11 DESs Hierarchical Clustering

12 DESs Target Prediction

13 Gene Ontology Enrichment Analysis Of DESs Target

14 Pathway Enrichment Analysis Of DESs Target

Methods

1 Experiment Pipeline Steps for Small RNA Sequencing

2 Bioinformatics Pipeline

3 Data Filtering

4 Reads Mapping

5 sRNA Classification

6 sRNA Prediction

7 sRNA Expression

8 Target Gene Prediction

9 Screening DESs (Possion Distribution)

10 Hierarchical Clustering Analysis

11 Gene Ontology Enrichment Analysis

12 Pathway Enrichment Analysis

Help

1 Report Introduction

2 FASTQ Format

3 BAM Format

4 File Format Of Genome Statistics

5 File Format Of sRNA Catalog

6 File Format Of Unknown sRNA

7 File Format Of Novel miRNA

8 File Format Of Novel piRNA

9 File Format Of Novel siRNA

2/31

第3页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

27

27

28

28

29

29

30

31

10 File Format Of sRNA Expression

11 File Format Of Target Prediction

12 DESs screening Format (Possion Distribution)

13 Cluster Format

14 GO Enrichment Analysis Result

15 Pathway Enrichment Analysis Result

FAQs

References

3/31

第4页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Results

1 Disclaimer

This is a BGI-tech Standard Small RNA Project report. It may contain privileged and/or

confidential information and is intended solely for the specific customer. If you have

received this report in error, please notify the sender and delete this report immediately.

You are hereby notified that any use, retention, disclosure, dissemination, copying, or

takingany otheractioninrelianceoncontents ofthis reportis prohibited.

2 Checklist: Getting Started

1. When you receive the package and hard drive, please check it is intact and not

damagedintransit.

2. Assuming there isn't an issue with the packaging or hard drive; proceed to check your

data.

3.Ensureyouback upyourdatabeforeprocessingit.

3 Data Statistics

In the project F21FTSCCKF1658_HOMtnbaS, we sequenced animal using BGISEQ500 technology

[1][2].The number of detected small non-coding rna (sncRNA) for each

samplewereshowninTable1.

Table 1 Summary of detected sncrna for each sample (Download)

Sample

name

Known miRNA

count

Novel miRNA

count

Known piRNA

count

Novel piRNA

count

Known siRNA

count

Novel siRNA

count

K10mDPCs 754 60 161 630 0 0

NCmDPCs 871 108 115 509 0 0

Before data analysis can be carried out, low-quality tags need to be removed (for more

information on this see Data Filtering in the Method page.)Table2 summarizes the

sequencing data for each sample and the distribution of base quality on clean tags is

presented inFigure1. Length distribution analysis allows us to see the composition of a

small RNA (sRNA) sample(seeFigure2).In general, the length of small RNA is found to

bebetween18ntand30nt.

Table 2 Summary of sequencing data for each sample (Download)

Sample

name

Sequence

type

Raw tag

count

Low quallity

tag count

Invalid

adapter tag

count

PolyA

tag

count

Short valid

length tag

Clean tag

count

Q20 of

clean tag

(%)

Percentage

of clean

tag(%)

K10mDPCs SE50 36,452,956 245,719 1,690,523 2,358 1,306,395 33,207,961 98.60 91.10

NCmDPCs SE50 37,253,264 240,828 588,178 430 1,323,304 35,100,524 98.80 94.22

Percentage(%)=Clean tag count/Raw tag count

4/31

第5页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Figure 1 Distribution of base quality on clean tag.

In Figure 1, the X axis is the base position along a tag. The Y axis is the base quality value. Each dot in

Figure 1 represents the total number of bases with a specific quality value of the corresponding base along

a tag. A darker dot color indicates a higher base number. If the percentage of the bases with low quality (<

20) is very high, then the sequencing quality of this tag is low. If one sample name appears several times,

then that means it consists of more than one sequencing tag.

Figure 2 Length distribution of sRNA.

The X axis shows the length of sRNA and the Y axis shows the percentage of the number of sRNA with a

specific length.

Table 3 K10mDPCs.Length_distribution.txt (Download)

Table 4 NCmDPCs.Length_distribution.txt (Download)

Table 5 Allsamples.Length_distribution.txt (Download)

4 sRNA Annotation

After filtering, clean tags were mapped to sRNA database such as miRBase

[3], Rfam[4],

siRNA,piRNA, snoRNAandetc.Table6lists seperate mapping rate foreach sample and

Figure3 shows the distribution of tags.The proportion of all kinds of sRNA is shown in

Confirm Show All

K10mDPCs

NCmDPCs

Confirm Show All

K10mDPCs

NCmDPCs

5/31

第6页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Figure4

The files listed in the tables (see links below) which are suffixed with catalog.xls are

results of sRNA annotation for each sample (see File Format Of sRNA Catalog in help

page).

Table 6 Alignment statistics of tags align to reference genome (Download)

Sample name Total tag Mapped tag Percentage(%)

K10mDPCs 33,207,961 28,929,564 87.12

NCmDPCs 35,100,524 33,134,337 94.40

Figure 3 Genome distribution of tags.

The X axis shows the relative position in the chromosome, and the Y axis shows the number of tags. The

color red indicates tag count, whereas green indicates tag catalogs.

Summarize all alignments and annotation before, the proportion of all kinds of small rna

inFigure4.

Figure 4 Catalog of sRNA.

Figure 4 shows the proportion of different types of sRNA. To make sure each sRNA corresponds to only

Confirm Show All

K10mDPCs

NCmDPCs

Confirm Show All

K10mDPCs

NCmDPCs

6/31

第7页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

one annotation, we follow this priority rule: miRNA > piRNA > snoRNA > Rfam > other sRNA.

The following listed file suffixed withcatalog.xlsare results of small rna annotation for

eachsample(seeFileFormat Of sRNACataloginhelppage).

Table 7 K10mDPCs.catalog.xls (Download)

Table 8 NCmDPCs.catalog.xls (Download)

Summarize all kinds of konwn or concerved miRNA. The mature sequences and hairpin

sequences of miRNAs of the species listed in file known_miRNA.xls when miRbase

database have known miRNAs of this species, the mature sequences of other species

and predicted hairpin sequences of this species if miRbase database do not have known

miRNAs ofthisespecies.

Table 9 known_miRNA.xls (Download)

Figure 5 The first base distribution. <

X axis is the length of miRNA.

5 sRNA Prediction

After sRNA annotation, those unknown tags (seeFile Format Of Unknown sRNAin help

page) will be used to predict novel sRNA based on their architectural feature, the

corresponding tools seesRNAPredictionin method page. Novel miRNA was predicted,

the result format seeFile Format Of Novel miRNA in help page. Novel piRNA was

predicted, the result format seeFile Format Of Novel piRNAin help page. Novel siRNA

was predicted, the result format seeFile Format Of Novel siRNAin help page. The details

ofpredictedsRNAs as belows:

Table 10 unknown.xls (Download)

Table 11 predicted_miRNA.xls (Download)

Table 12 predicted_piRNA.xls (Download)

Table 13 predicted_siRNA.xls (Download)

An example for the stem loop structure of precursors of predicted miRNA was shown in

Figure6,thestructureofeachcandidatemiRNA was storedinprecursor.tar.gz. 7/31

第8页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Figure 6 The stem loop structure of precursors of predicted miRNA. <

Figure 5 shows the secondary structure of candidate miRNA based on maximum free energy (MFE)

structure.

Figure 7 The first base distribution of predicted miRNAs. <

Statistics the first base of all kinds of predicted miRNAs. X axis means the length of miRNA, the numbers in

figure means the kinds number of predicted miRNAs

Figure 8 The first base distribution of predicted piRNAs. <

Statistics the first base of all kinds of predicted piRNAs. X axis means the length of piRNA, the numbers in

figure means the kinds number of predicted piRNAs

8/31

第9页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Figure 9 The first base distribution of predicted siRNAs. <

Statistics the first base of all kinds of predicted siRNAs. X axis means the length of siRNA, the numbers in

figure means the kinds number of predicted siRNAs

6 sRNA Expression

BaseontheUMI species numbers of sequencingreads, thesmallrnaexpressionlevel is

calculated by using UMI species numbers (seesRNA Expressionin method page), and

the result format seeFile Format Of sRNA Expressionin help page. The expression of

eachsamplewerelistedas below.

Table 14 Allsamples.exp.xls (Download)

Table 15 K10mDPCs.expr.xls (Download)

Table 16 NCmDPCs.expr.xls (Download)

Figure 10 Correlation analysis of all samples. <

Both X and Y axis represent each sample. Coloring indicate Pearson correlation (high: blue; low: white).

7 miRNA Target gene Prediction

9/31

第10页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

BGI uses multiple types of software to find the target gene of miRNA. You can find out

moreinformationonTarget GenePredictionintheMethodpage.

BGI chooses the intersection or union of the target gene as the final result. The results

areshownbelow inTable17and Figure11.

To see the full results, downloadTable18from the web link below.For more information

on the results format please seeFile Format Of Target Prediction in the Additional

Informationpage.

Table 17 Target statistics (Download)

Type miRNA count Target count RNAhybrid target TargetScan target miRanda

Filter 2,039 107,049 107,049 14,002 107,049

Union 2,040 107,186 107,182 14,086 107,056

Figure 11 Venn statistics of different target gene prediction software. <

Venn statistics of different target gene prediction software.

Table 18 Target union result (Download)

The next step involves setting up appropriate filter conditions such as MFE and then

furtheranalyzingtheintersectiontargets.Thefilteredresult canbeseeninFigure12.

To see the full results, downloadTable19from the web link below. For more information

on the results format please seeFile Format Of Target Prediction on the Additional

Informationpage.

10/31

第11页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Figure 12 Venn statistics of different target gene prediction software based on appropriate filter

conditions. <

Venn statistics of different target gene prediction software.

Table 19 Target filter result (Download)

8 Screening Differentially Expressed piRNA

Differentially Expressed SRNAs(DESs) screening is aimed to find differentially

expressed small rna between samples and do the further analysis. We use ExpDiff (see

Screening DESs \\(Possion Distribution\\)inmethodpage)methodtodothis analysis.The

DESs ineachpairwiseas showninFigure13.

Figure 13 Differentially expressed piRNA. <

The X axis represents the DESs pair, and the Y axis represents the number of screened DESs, with green

indicating down-regulation and red indicating up-regulation. 11/31

第12页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

All expressed piRNA of each pairwise were stored in*.diffexp.xlslisted as below. The

result format please see ExpDiff (seeDESs screening Format \\(Possion Distribution\\)in

helppage)

Table 20 NCmDPCs-vs-K10mDPCs_ExpDiff.diffexp.xls (Download)

Screened DESs were stored*.diffexpfilter.xlslisted as below. The result format please

seeExpDiff(seeDESs screeningFormat \\(Possion Distribution\\)inhelppage)

Table 21 NCmDPCs-vs-K10mDPCs_ExpDiff.diffexpfilter.xls (Download)

Figure 14 Distribution of differentially expressed piRNA. <

In Figure, the X axis is the log2(Fold change) of DES, the Y axis is the -log10(FDR) of DES, with green

points indicating down-regulation (log2(Fold change)<=-1 and FDR <=0.001) and red points indicating upregulation (log2(Fold change)>=1 and FDR <=0.001).

9 Screening Differentially Expressed siRNA

Differentially Expressed SRNAs(DESs) screening is aimed to find differentially

expressed small rna between samples and do the further analysis. We use ExpDiff (see

Screening DESs \\(Possion Distribution\\)inmethodpage)methodtodothis analysis.The

DESs ineachpairwiseas showninFigure16.

12/31

第13页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Figure 16 Differentially expressed siRNA. <

The X axis represents the DESs pair, and the Y axis represents the number of screened DESs, with green

indicating down-regulation and red indicating up-regulation.

All expressed siRNA of each pairwise were stored in*.diffexp.xlslisted as below. The

result format please see ExpDiff (seeDESs screening Format \\(Possion Distribution\\)in

helppage)

Table 22 NCmDPCs-vs-K10mDPCs_ExpDiff.diffexp.xls (Download)

Screened DESs were stored*.diffexpfilter.xlslisted as below. The result format please

seeExpDiff(seeDESs screeningFormat \\(Possion Distribution\\)inhelppage)

Table 23 NCmDPCs-vs-K10mDPCs_ExpDiff.diffexpfilter.xls (Download)

10 Screening Differentially Expressed miRNA

Differentially Expressed SRNAs(DESs) screening is aimed to find differentially

expressed small rna between samples and do the further analysis. We use ExpDiff (see

Screening DESs \\(Possion Distribution\\)inmethodpage)methodtodothis analysis.The

DESs ineachpairwiseas showninFigure19.

Figure 19 Differentially expressed miRNA. < 13/31

第14页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

The X axis represents the DESs pair, and the Y axis represents the number of screened DESs, with green

indicating down-regulation and red indicating up-regulation.

All expressed miRNA of each pairwise were stored in*.diffexp.xlslisted as below. The

result format please see ExpDiff (seeDESs screening Format \\(Possion Distribution\\)in

helppage)

Table 24 NCmDPCs-vs-K10mDPCs_ExpDiff.diffexp.xls (Download)

Screened DESs were stored*.diffexpfilter.xlslisted as below. The result format please

seeExpDiff(seeDESs screeningFormat \\(Possion Distribution\\)inhelppage)

Table 25 NCmDPCs-vs-K10mDPCs_ExpDiff.diffexpfilter.xls (Download)

Figure 20 Distribution of differentially expressed miRNA. <

In Figure, the X axis is the log2(Fold change) of DES, the Y axis is the -log10(FDR) of DES, with green

points indicating down-regulation (log2(Fold change)<=-1 and FDR <=0.001) and red points indicating upregulation (log2(Fold change)>=1 and FDR <=0.001).

11 DESs Hierarchical Clustering

BGI performs hierarchical clustering for differentially expressed miRNA using function

pheatmap. For more than two groups, hierarchical clustering of the intersection is

performed, followed by union DESs.Figure22 shows the results of DESs Hierarchical

Clustering.

14/31

第15页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Figure 22 Hierarchical clustering. <

The X axis represents each pair of differences, and the Y axis represents Differently Expressed Genes

(DEGs). The colors indicate the fold change, with red showing up-regulation, and blue showing downregulation.

To see the full results, download tables from the web link below. For more information on

theresults formatpleaseseeCluster Formatinhelppage.

Table 26 Clustering list of NCmDPCs-vs-K10mDPCs_ExpDiff.union (Download)

12 DESs Target Prediction

BGI uses multiple types of software to find the target gene of DESs.You can find out

moreinformationaboutthis ontheTarget GenePredictioninmethodpage

Toseethefullresults,downloadthetables fromtheweblink below.

Table 27 NCmDPCs-vs-K10mDPCs_ExpDiff.target.xls (Download)

Formoreinformationontheresults format pleaseseeFile Format Of Target Predictionin

thehelppage.

13 Gene Ontology Enrichment Analysis Of DESs Target

GeneOntology (GO) enrichment analysis is performed for screened DESs target

genes (for more information seeGene Ontology Enrichment Analysis on the Method

page).You can see the fullresults by visiting theGO Enrichment Analysis Resultpage in

Help.

We use WEGO software

[5]to perform GO function classification to show the distribution

of gene function (seeFigure23). The GO enrichment analysis result is shown as

Figure24. You can find out more details about GO enrichment analysis by visiting

GOView

15/31

第16页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Figure 23 GO functional classification. <

The X axis shows the number of DEGs (their square root value), and the Y axis shows GO terms. All GO

terms are grouped in to three ontologies: blue indicates biological process, brown indicates cellular

components and red indicates molecular function.

Figure 24 GO functional enrichment.

BGI uses the Directed Acyclic Graph (DAG) to visualize GO enrichment results. Each node shows the

name of the GO term and P-value. The darker the red color the lower the P-value, which indicates more

significant enrichment.

14 Pathway Enrichment Analysis Of DESs Target

Apathway enrichment analysis ofDESs target can help us to identify how genes interact

andperformroles inbiological functions.

BGI performs pathway enrichment analysis of DESs target genes using the KEGG

database

[6]. We generate a report for target genes in each pair of DESs. For more

Confirm Show All

NCmDPCs-vsK10mDPCs_ExpDiff.C

NCmDPCs-vsK10mDPCs_ExpDiff.F

NCmDPCs-vsK10mDPCs_ExpDiff.P

16/31

第17页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

information on this, see the sectionPathway Enrichment Analysisin Method page, and

thesectionPathwayEnrichment Analysis ResultinHelppage.

In addition,BGI generates a scatter plot of theKEGG enrichmentresults (seeFigure25),

andabarplot of theKEGGterms (seeFigure26).You can download the fullresults forthe

pathways usingthis link:

NCmDPCs-vs-K10mDPCs_ExpDiff

Figure 25 Statistics of pathway enrichment in each pair. <

The top 20 enriched pathway terms are displayed as a scatterplot in Figure25. The Rich Factor is the ratio of

DESs target gene numbers annotated in this pathway term to all gene numbers annotated in this pathway

term. The greater the Rich Factor the greater the degree of enrichment. The Q-value is the corrected Pvalue, and ranges from 0~1; the lower the Q-value the greater the level of enrichment.

Figure 26 KEGG classification for each pair. <

The X axis shows the number of DESs target genes, and the Y axis shows the second KEGG pathway

terms. The first pathway terms are indicated using different colors. And second pathway terms are sub- 17/31

第18页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

groups of the first pathway terms and are grouped together on the X axis on the right side.

Methods

1 Experiment Pipeline Steps for Small RNA Sequencing

1)Small RNAenrichmentandpurification.

2) 3' end adaptor ligation: Ligate the 5-adenylated and 3-blocked adaptor to the 3' end of

thesmall RNAfragment.

3)AddUniquemolecularidentifiers (UMI)labeledPrimer.

4)5'endadaptorligation.

5)First strandsynthesiswithUniquemolecularidentifiers (UMI)labeledPrimer.

6)Secondstrandsynthesis.

7)Fragment selection.

8)Library quantitativeandpoolingcyclization.

9)LibraryQC.

10)SequencingonDNBSEQ.

Theexperimentpipelineis describedinFigure1.

Figure 1 Experimental workflow of standard DNBSEQ UMI small RNA library

2 Bioinformatics Pipeline

18/31

第19页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Figure 2 Bioinformatics analysis pipeline for Small-RNA project

BioinformaticsAnalysisPipelineforSmallRNA Sequencing

1.Eliminatethelow-quality reads,adaptors andother contaminants toget cleanreads

2.Summarize the length distribution of the clean tags, common and specific sequences

betweensamples

3.Annotatethecleantags intodifferent categories

4.Predictthenovel miRNA

5.Functionannotationof knownmiRNAs

3 Data Filtering

The impurities of raw data are: 5' primer contaminants, no-insert tags, oversized

insertion tags, low quality tags, poly A tags and small tags, tags without 3' primer.

Generally, an adaptor contaminant is caused by the sample quality, the adaptor or

sample concentration. The higher the adapter proportion is, the more the contaminant

there is. Low quality tags also need to be filtered; low quality tags are those which have

more than four bases and whose quality is less than ten, or those which have more than

six bases andhaveaquality less thanthirteen.19/31

第20页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

All these contaminant tags are removed and the length distribution of clean tags is then

summarized. The length distribution analysis is helpful to find the compositions of the

sample. Normally, the length distribution of small RNA is between 18nt and 30nt. For

example,miRNAis normally 21ntor22nt, siRNAis 24nt,andpiRNAis 30nt.

Thedataisprocessedusingthefollowingsteps:

1)Removelow quality tags

2)Removetagswith5'primer contaminants

3)Removetagswithout3'primer

4)Removetagswithoutinsertion

5)RemovetagswithpolyA

6)Removetagsshorterthan18nt

7)Summarizethelengthdistributionofthecleantags

After filtering, the remaining tags are called 'clean tags' and stored in FASTQ format

[7]

(more information aboutFASTQ Format can be found on the Additional Information

page).

4 Reads Mapping

In general, the higherthe alignmentratio, the closerthe genetic relationship between the

sample and the reference species. The lower rate may be due to low similarity with the

referencegenomeorduetocontaminants.

Bowtie2

[8] is used to map clean reads to the reference genome and to other sRNA

databases. Please note that for Rfam we use cmsearch

[9].Their default alignment

parameters areas follows:

Bowtie2 : -q -L 16 --phred64 -p 6

cmsearch: --cpu 6 --noali

To learn more about Bowtie2 parameters, please refer to the Options section on

http://computing.bio.cam.ac.uk/local/doc/bowtie2.html#.

5 sRNA Classification

Intheannotationinformationof differentRNAs, somesmall RNAtags may bemappedto

more than one category. To make sure every unique small RNA is mapped to only one

category,wefollow this priority rule:

MiRbase> pirnabank> snoRNA(human/plant)> Rfam> other sRNA

6 sRNA Prediction

We use miRDeep2

[10](for animals) and miRA[11](for plants) to predict novel miRNA by

exploringthecharacteristic hairpinstructureofmiRNAprecursor. 20/31

第21页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Piano

[12] is used to predict piRNAs. This algorithm is based on the Support Vector

Machine (SVM) algorithm and transposon interaction information.The SVM classifier

canbeusedinawiderangeof species includinghuman,mouse,rat,fruitfly andinsects.

Small interfering RNA (siRNA[13])is a 22-24nt double-strand RNA, each strand of which

is 2nt longer than the other. Due to this structural feature, we align tags to each other to

find sRNAs meeting these criteria. These resultant tags are most likely to be siRNA

candidates.

7 sRNA Expression

Unique molecular identifier (UMI)

[14], 8-10nt seqences. It was connected to cDNA

molecules to marking each molecule in the original sample at the early stage of library

construction. It was used to reduce the quantitative bias introduced by PCR

amplification, and is conducive to obtaining enough readings for testing. We calculate

thespecies numbers toaccuratequantitativeof sRNAs.

8 Target Gene Prediction

In order to find more accurate targets multiple types of software are used. Generally, we

use RNAhybrid

[15],miRanda

[16] or TargetScan

[17] to predict animal targets, and use

psRobot

[18],TAPIR[20]orTargetFinder

[19]topredict plant targets.Thedefault parameters

areas follows:

miRanda: -en -20 -strict

RNAhybrid: -b 100 -c -f 2,8 -m 100000 -v 3 -u 3 -e -20 -p 1 -s 3utr_human

psRobot: -gl 17 -p 8 -gn 1

TargetFinder: -c 4

TAPIR: --score 5 --mfe_ratio 0.6

9 Screening DESs (Possion Distribution)

Using the findings of the Genome Res paper entitled the significance of digital gene

expression profiles

[21], BGI have developed an algorithm to identify differentially

expressedgenes betweentwosamples.

If x is defined as the number of reads from small RNA A, x yields to the Poisson

distribution:

In orderto calculate the expression level of the genes we use a different equation. In this

equation,thenumberoftotal cleanreads mappedtothereferencegenomeof sample1is

N1

, and the number of total clean reads mapped to the reference genome of sample 2 is

N2

; small RNA A holds x tags in sample 1 and y tags in sample 2. The probability of small

RNAAexpressedequally betweentwosamples canbecalculatedas follows:

21/31

第22页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

In the equation above, the P-value of the differential gene expression test is corrected

using the Bonferroni method

[22]. DES analysis is then performed on the sample;

however, DES analysis creates a problem as it generates thousands of hypothesis

simultaneously (only if gene x is differentially expressed between the two groups),

correction for false positive (type I errors) and false negative (type II) errors are

performedusingtheFalseDiscoveryRate(FDR )method

[23].

Inthenext step, it is assumedthatwehavepickedoutR differentially expressedgenes in

whichSgenes show differential expression,andtheotherVgenes arefalsepositives.

The error ratio Q = V / R. The customer sets a cutoff value for Q (e.g. BGI sets a default

cutoff of 5%), FDR is preset to a value less than 0.05. To judge the significance of gene

expression difference, FDR ≤ 0.001 and the absolute value of Log2Ratio ≥ 1' are set as

thedefaultthreshold.

More stringent criteria, with smaller FDR and bigger fold-change value, can be used to

identifyDEGs.

Next, we make multiple hypothesis tests for the P-value of the differential gene

expression test

[22] and determine the P-value field by controlling the FDR result.

Conditions aresetinadvancesothatthe FDR result cannotexceed0.05.

At the same time, we calculated the gene expression level (FPKM value) in order to

calculatethegeneexpressiondifferencebetweendifferent samples.

The smallerthe FDR value, the greaterthe difference multiple, which indicate there are

significantdifferences inexpression.

In BGI’s experience, differentially expressed genes were defaulted as genes with FDR

≤ 0.001andmultiples ofmorethan2-fold.

10 Hierarchical Clustering Analysis

BGI performs hierarchical clustering for differentially expressed miRNA using function

pheatmap. For more than two groups, hierarchical clustering of the intersection is

performed,followedby unionDESs.

11 Gene Ontology Enrichment Analysis

GeneOntology (GO) is an international standard gene functional classification

system. It offers a dynamically updated and controlled vocabulary, as well as a defined

concept to comprehensively describe properties of genes and their products. GO has

three ontologies: molecular function, cellular component and biological process. The

22/31

第23页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

basic unitofGOisGO-term,andeveryGO-terminGObelongs toatypeofontology.

GO enrichment analysis finds all GO-terms that are significantly enriched in a list of

DESs target genes, it also finds the genes that correspond to specific biological

functions.To perform this analysis,BGI first maps all genes toGO-terms in the database

(http://www.geneontology.org/), which calculates the gene numbers for every term. The

hypergeometric test is then used to find significantly enriched GO-terms in the input

gene list. This test is based on 'GO::TermFinder'

(http://www.yeastgenome.org/help/analyze/go-term-finder).

BGI has developed an algorithm to perform this analysis, and the method used is

describedas follows:

Intheequationabove,N is thenumberofall geneswithGOannotation;nis thenumberof

DESs target genes in N; M is the number of all genes that are annotated to a specific GO

term;mis thenumberofDESs targetgenes inM.

The P-value is corrected by using the Bonferroni method

[22], a corrected P-value ≤ 0.05

is taken as a threshold. GO terms fulfilling this condition are defined as significantly

enrichedGOterms.

12 Pathway Enrichment Analysis

Pathway-based analysis helps to further understand DESs target genes biological

functions. KEGG[24](the major public pathway-related database) is used to perform

pathway enrichment analysis. This analysis identifies significantly enriched metabolic

pathways or signal transduction pathways in DESs target genes when compared with

thewholegenomebackground.

The calculating formula is the same as that in GO analysis. Here N is the number of all

genes with KEGG annotation; n is the number of DESs target genes in N; M is the

number of all genes that are annotated to a specific pathway; m is the number of DESs

target genes in M. The P-value is corrected by using the Bonferroni method

[22], a

corrected P-value <= 0.05 is taken as a threshold. KEGG terms fulfilling this condition

aredefinedas significantly enrichedKEGGterms.

Help

1 Report Introduction

This concludingreport contains eight sections :

Results:Includes project results in a variety of display forms including text, tables and

pictures.

Methods:Included the project workflow, experimental processes, bioinformatics

analysis. 23/31

第24页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Tables:Includes statistics tables displayingresults.

Figures:Includes statistics figures oftheresults.

Files:This where projectresults are stored in downloadable formats.Please referto`The

Guide`tolearnhow todownloadfiles.

Reference:This section show small RNA project analysis methods and references

articles.

Additional Information:Includes aguidetohelpyouininterprettheresults.

FAQ:This section answers some common questions, such as those relating to the data

interpretationprocess andsampleselectionanddelivery.

Thevideoforresultinterpretionas follows:

report: https://pan.genomics.cn/ucdisk/s/73UZjq

data: https://pan.genomics.cn/ucdisk/s/2QjYBb

2 FASTQ Format

The original image data is transferred into sequence data via basecalling , which is

defined as raw data or raw reads and saved as an FASTQ file. The FASTQ files are

theoriginal data provided by BGI to the users, and include detailed read sequences and

read quality information. In each FASTQ file, each read is described by four lines, listed

as follows:

@A80GVTABXX:4:1:2587:1979#ACAGTGAT/1

NTTTGATATGTGTGAGGACGTCTGCAGCGTCACCTTTATCGGCCATGGT

+

BMMTKZXUUUdddddddddddddddddddddddddddadddddd^WYYU

The first and third lines are sequence names generated by the sequence analyzer; the

second line is the sequence; the fourth line is the sequencingquality value, in

whicheach letter corresponds to the bases in line 2; the base quality is equal to theASCII

value of the characterin line 4 minus 64, e.g. theASCII value of c is 99, therefore its base

quality valueis 35.

Table1 demonstrates the relationship between sequencingerror and

sequencingquality . Specifically, if the sequencingerror rate is denoted as E and

base quality value is denoted as SQ, the relationship is described using the following

formulas:

T a b l e 1 Relationship between sequencing error rate and sequencing quality

24/31

第25页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

value (Download)

Sequencing Error Rate(%) Sequencing Quality Value Character

1.00 20 T

0.10 30 ^

0.01 40 h

3 BAM Format

Mapping results are stored in a BAM file, which is the binary equivalent of a SAM file.

SAMtools software can perform format conversion between BAM and SAM, and also

support more complex tasks like variant calling and alignment viewing as well as sorting,

indexinganddataextraction.Youcangetmoredetails aboutSAMtools here

http://samtools.sourceforge.net/samtools.shtml#5.

If required by clients, BAM files of the genome mapping result can be provided. BGI also

recommends using IGV tool (Integrative Genomics Viewer) to visualize BAM files. IGV

supports the loading of multiple samples in order to compare in the same scale, and can

view the distribution of reads on the Exon, Intron, UTR, and Intergenic regions;which

makes it very convenientandintuitive.

You can read an introduction to IGV that in the project result, and more detailed

information about IGV is available on this website

http://www.broadinstitute.org/software/igv/UserGuide.Figure2 shows an example using

Windows operatingsystem.

Figure 2 A screenshot of IGV interface. <

This example loads two samples into IGV tool in window operation system.

4 File Format Of Genome Statistics

Thegenomestatistics result of eachsamplearestoredinatab-separatedtext filenamed

`samplename`followedby .genome_stat.xls.

Table2describes thefileformatforgenomestatistics. 25/31

第26页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Table 2 Format description of genome statistics (Download)

Field Description

Flag Chomosome name

Count Mapped tags number

Percentage Percentage of mapped tags

Alias Alias for chomosome, which was used in figure (*.genome_stat.png)

5 File Format Of sRNA Catalog

sRNA catalog result of each sample are stored in a tab-separated text file named

`samplename`followedby .catalog.xls.

Table3describes thefileformatfor sRNAcatalog.

Table 3 Format description of sRNA catalog (Download)

Field Description

sRNA id sRNA name

Count Mapped tags number

Type sRNA type

Description Description for sRNA

6 File Format Of Unknown sRNA

UnknownsRNAresultis storedintab-seperatedtextfilenamedunknown.xls.

Table4describes thefileformatforunknownsRNA.

Table 4 Format description of Unknown sRNA (Download)

Field Description

Tag Tag name

Chr Chromosome name

Start Start position

End End position

Strand Strand

* Count for each sample

Sequence Tag sequence

7 File Format Of Novel miRNA

Novel miRNA result of each sample is stored in a tab-separated text file named

predicted_miRNA.xls.

Table5describes thefileformatforNovel miRNA.

Table 5 Format description of Novel miRNA (Download)

Field Description

miRNA id miRNA name

Chromosome Chomosome name

Strand Strand of miRNA

Sequence(mature) Mature sequence of miRNA

Sequence(star) Star sequence of miRNA

26/31

第27页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

Start Start position

End End position

Sequence(precursor) Precursor sequence of miRNA

Tag info Mapped tag infomation

8 File Format Of Novel piRNA

Novel siRNA result of each sample is stored in a tab-separated text file named

predicted_piRNA.xls.

Table6describes thefileformatforNovel piRNA.

Table 6 Format description of Novel piRNA (Download)

Field Description

piRNA id miRNA name

Chromosome Chomosome name

Repeat family Repeat family

Strand Strand of miRNA

Start Start position

End End position

Sequence piRNA sequence

Tag info Mapped tag infomation

9 File Format Of Novel siRNA

Novel siRNA result of each sample is stored in a tab-separated text file named

predicted_siRNA.xls.

Table7describes thefileformatforNovel siRNA.

Table 7 Format description of Novel siRNA (Download)

Field Description

siRNA id siRNA name

Guide sequence Candidate siRNA sequence

Passenger sequence Passenger sequence

Tag info Mapped tag infomation

10 File Format Of sRNA Expression

sRNA Expression result is stored in a tab-separated text file named`sample name`

followedby .expr.xls.

Table8describes thefileformatfor sRNAexpression.

Table 8 Format description of sRNA expression (Download)

Field Description

siRNA id miRNA name

Count Mapped tag number, the value in parentheses is the total number

Type Small rna type

Description Description for small rna

UMI The real expression

11 File Format Of Target Prediction

27/31

第28页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

The Target Prediction result for each sample is stored in a tab-separated text file named

target.filter.xls .

Table9describes thefileformatforTargetPrediction.

Table 9 Format description of Target Prediction (Download)

Field Description

miRNA id miRNA name

Target id Target name

TargetScan site type Type of TargetScan

TargetScan NA meaningless

RNAhybrid MFE MFE

RNAhybrid pvalue Predicted p-value

miRanda MFE MFE of miRanda

miRanda score Score value of miRanda

psRobot score Score value of psRobot

psRobot NA meaningless

TargetFinder score Score value of TargetFinder

TargetFinder NA meaningless

12 DESs screening Format (Possion Distribution)

The differentially expressed sRNA result, which was screened using the Poisson

Distribution method in each control-treatment pair, is stored in atab-separated text file

named`pair`followedby _ExpDiff.diffexpfilter.xls.

Table10describes thefileformatforDESs screeningformat(PoissonDistribution).

Table 10 Format description of DESs screening result file (Download)

Field Description

miRNA id miRNA name

Count(A) Tag number of sample A

Count(B) Tag number of sample B

TPM(A) TPM value of sample A

TPM(B) TPM value of sample B

log2 Ratio(B/A) Log2 ratio

Up-Down-Regulation(B/A) Up(down) regulated

P-value P value

FDR FDR

13 Cluster Format

The Cluster result is stored in a tab-separated text file named `pair` followed by

inter/union.xls

Table11describes thefileformatforCluster.

Table 11 Format description of cluster result file (Download)

Field Description

miRNA miRNA name

compare Log2 ratio28/31

第29页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

14 GO Enrichment Analysis Result

Go Enrichment analysis results are supplied as GOView.html files. You can open these

files usingyourweb-browser.

On yourresults file, the left navigation displays three types of GO terms for each controltreatmentpair(C: cellular component,P:biological process&F:molecularfunction).

OneexampleofthethreeenrichedGOterms results is showninFigure3.

Figure 3 Significantly enriched GO terms. <

Column 1 shows the GO term name, Column 2 the ratio of DESs target genes enriched to this GO term,

Column 3 the ratio of genes enriched to this GO term and Column4 is the Corrected P-value which

indicates the degree of enrichment. The smaller the Corrected P-value, the more significantly enriched the

DESs target gene.

Theresults listhas beensortedbyCorrectedP-value.

Click the term name `intermediate filament` inFigure3to find out more about genes that

enriched to it.More information about intermediate filament is available on

http://amigo.geneontology.org/amigo.

The full list of genes that enriched to this GO term can be seen by clicking `view genes`in

Figure3.Anexampleofwhat youwill seeis showninFigure4.

Figure 4 Gene list related to GO terms. <

As shown in Figure4, 48 genes are enriched to the GO term ` intermediate filament `.

15 Pathway Enrichment Analysis Result

The pathway enrichment result and the enriched KEGG pathways will be listed as

showninFigure5.

Figure 5 Pathway enrichment analysis of DESs target. <

In Figure 5, Column 3 is the ratio of DEGs enriched to the pathway (Column 2), Column 4 is the ratio of

genes enriched to this pathway.P-value and Q-value are both values that indicate the degree of

enrichment; Q-value is created by correcting the P-value by use of the FDR method. The smaller the Pvalue or Q-value, the more significantly DEGs is enriched to this pathway.The results list has been sorted

by Q-value.The last column is the pathway ID which corresponds to the pathway name. 29/31

第30页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

For example, to view the gene ID list of `Glycosaminoglycan biosynthesis`, click the

pathway name `Glycosaminoglycan biosynthesis` shown inFigure5.This will give you

thegeneIDs thatenrichedtoit(as showninFigure6).

Figure 6 Gene list related to pathway. <

There are 23 DEGs enriched to the pathway ` Glycosaminoglycan biosynthesis `.

Also, the pathway links inFigure6allows you to see detailed pathway information in the

KEGGdatabase.Forexample, clickingthehyperlink on

` Glycosaminoglycan biosynthesis ` inFigure6will show you detailed information on the

KEGGpathway forGlycosaminoglycanbiosynthesis (as showninFigure7).

Figure 7 An example of KEGG pathway of `Glycosaminoglycan biosynthesis`. <

Up-regulated genes are marked with red borders and down-regulated genes with green borders. Nonchange genes are marked with black borders. Related DEGs appear ifyou hover your mouse where the

border with the red or green meet. Clicking on a gene name will redirect you to the KEGGwebsite.

FAQs

Can the research be carried out on species without a reference genome?

It is better to provide the genome reference, or another closely related species genome, or EST sequence. The exon/intron,

repeat information and gene encode sequence are also necessary.

What is the identified method of miRNA?

Any of these methods can be used: PCR-Stem-loop quantitative real-time PCR, Quantitative PCR (qPCR), qRT-PCR, as well

as other methods.

Is it feasible to perform a joint analysis between small RNA analysis and other RNA product analysis?

Yes, not only can it be combined with the analysis of mRNA results, but it also can be combined with circular RNA analysis.

How do I read '.fq' files ?

if you are using Linux or Unix environment, '.fq' files are opened by typing the command 'less'. For Windows environment, you

need to unpack the files first, then use software UltraEdit to view the '.fq' files.

In the Gene Expression Difference Analysis, for example A-vs-B, what is meant by up-regulated and down-regulated?

30/31

第31页

深圳华⼤基因股份有限公司 400-706-6615 ©2017BGIAllRightsReserved.

A-vs-B means sample A is the control and sample B is the case. In the corresponding files, A-vs-B.diffexp.xls and A-vsB.diffexpfilter.xls, if a gene is up-regulated it means the expression of this gene is up-regulated in sample B compared to

sample A. Likewise the same applies for down-regulation.

In pathway enrichment analysis (Figure 13), why are the number of genes not equal to the coloured borders?

In Figure 13, each border represents one kind of enzyme, and the function of an enzyme involves the participation of several

genes, so one border may relate to multiple genes.

References

[1] Wang Z., et al. (2009). RNA-Seq: a revolutionary tool for Transcriptomics. Nature Reviews Genetics, 10(1): 57-63.

[2] Mortazavi, A., et al. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods, 5(7):621-8.

[3] Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. NAR 2014 42:D68-D73

[4] Rfam 12.0: updates to the RNA families database. Eric P. Nawrocki, Sarah W. Burge, et al. Nucleic Acids Research (2014)

[5] Ye, J., et al. (2006). WEGO: a web tool for plotting GO annotations. Nucleic Acids Res, 34(Web Server issue): W293-7.

[6] Kanehisa, M., et al. (2008). KEGG for linking genomes to life and the environment. Nucleic Acids Res, 36 (Database issue): D480-4.

[7] Cock P., et al.(2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids

Research, 38(6): 1767-1771.

[8] Langmead B, et al. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3): 25-

34.

[9] E. P. Nawrocki and S. R. Eddy, Infernal 1.1: 100-fold faster RNA homology searches , Bioinformatics 29:2933-2935 (2013).

[10] Friedlander, M.R., Chen, W., et al. 'Discovering microRNAs from deep sequencing data using miRDeep', Nature Biotechnology, 26, 407-415

(2008)

[11] Evers M, Huttner M, et al. miRA: adaptable novel miRNA identification in plants using small RNA sequencing data. BMC Bioinformatics. 2015

Nov 5; 16:370

[12] Kai Wang, Chun Liang, et al. Prediction of piRNAs using transposon interaction and a support vector machine. BMC Bioinformatics201415:419

[13] Jagla B, Aulner N, Kelly PD, Song D, Volchuk A, et al. (2005) Sequence characteristics of functional siRNAs. RNA 11: 864-872

[14] Kivioja T, V盲h盲rautio A, Karlsson K, et al. Counting absolute numbers of molecules using unique molecular identifiers[J]. Nature Methods,

2012, 9(1):72-74.

[15] Rehmsmeier, Marc and Krueger, Jan RNAhybrid: microRNA target prediction easy, fast and flexible, Nucleic Acids Res., 2006

[16] John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS., PLoS Biol. 2005 Jul;3(7):e264.

[17] Agarwal V, Bell GW, Nam J, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. eLife, 4:e05005, (2015)

[18] Wu HJ, Ma YK, Chen T, Wang M, Wang XJ. (2012) PsRobot: a web-based plant small RNA meta-analysis toolbox. Nucleic Acids Res.

DOI:10.1093/nar/gks554.

[19] Fahlgren N, Carrington JC (2010) miRNA Target Prediction in Plants. Methods in molecular biology (Clifton, NJ) 592.

[20] Bonnet, E., He, Y., Billiau, K., Van de Peer, Y. (2010) TAPIR, a web server for the prediction of plant microRNA targets, including target

mimics. Bioinformatics 26(12):1566-8.

[21] Audic, S. and J. M. Claverie. (1997). The significance of digital gene expression profiles. Genome Res, 10: 986-95.

[22] Abdi, H.The bonferroni and Sidak corrections for multiple comparisons. In N.J. Salkind ( ed.). Encyclopedia of Measurement and Statistics.

[23] Benjamini, Y. and D. Yekutieli. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics,

29: 1165-1188.

[24] Kanehisa, M., et al. (2008). KEGG for linking genomes to life and the environment. Nucleic Acids Res, 36 (Database issue): D480-4.

2021 Copyright BGI All Rights Reserved 粤ICP备 12059600

Technical Support E-mail:info@bgitechsolutions.com

Website: www.bgitechsolutions.com

31/31

百万用户使用云展网进行电子翻页书的制作,只要您有文档,即可一键上传,自动生成链接和二维码(独立电子书),支持分享到微信和网站!
收藏
转发
下载
免费制作
其他案例
更多案例
免费制作
x
{{item.desc}}
下载
{{item.title}}
{{toast}}