Background genes for enrichment analysis
When computing enrichment of GO categories, it makes sense to narrow down the set of genes that the signature is analyzed against (see the computation footnote for how the background is used). GOnet offers three background options:
- All GO annotated genes. All genes annotated with GO terms are used as the background. This option is a simple default, but the results may not be specific enough.
- Custom gene list. Submit a plain text file containing the background genes, one gene per line. The same ID types are accepted as for the submitted signature input list.
- Predefined backgrounds. One approach to defining a background is to use the genes that are significantly expressed in the cell type the gene signature originates from. GOnet attempts to implement this step for the user by offering a selection of predefined backgrounds. In this case an additional selection option allows choosing a cell type or tissue to restrict the background to. For example, if the option (DICE-DB) TH1 is selected, all genes with expression higher than 1 TPM in Th1 cells are used as the background. If the option Any DICE-DB celltype is selected, the background consists of genes with expression higher than 1 TPM in at least one of the cell types present in DICE-DB (which are major blood cell types).
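The expression threshold used for the predefined backgrounds can be illustrated with a minimal Python sketch. This is not GOnet's implementation: the expression table, gene names, cell type labels and function names below are hypothetical and only show the "higher than 1 TPM" rule for a single cell type and for any cell type.

```python
# Hypothetical expression table: gene -> {cell type -> expression in TPM}.
# The values are made up for illustration and are not taken from DICE-DB.
EXPRESSION_TPM = {
    "GENE_A": {"TH1": 12.5, "B_CELL": 0.2},
    "GENE_B": {"TH1": 0.4, "B_CELL": 3.1},
    "GENE_C": {"TH1": 0.0, "B_CELL": 0.9},
}


def background_for_cell_type(expression, cell_type, threshold=1.0):
    """Return genes whose expression in `cell_type` is strictly above `threshold` TPM."""
    return {
        gene
        for gene, by_cell in expression.items()
        if by_cell.get(cell_type, 0.0) > threshold
    }


def background_any_cell_type(expression, threshold=1.0):
    """Return genes expressed above `threshold` TPM in at least one cell type."""
    return {
        gene
        for gene, by_cell in expression.items()
        if any(tpm > threshold for tpm in by_cell.values())
    }


# Background restricted to one cell type, and background over all cell types.
th1_background = background_for_cell_type(EXPRESSION_TPM, "TH1")
any_background = background_any_cell_type(EXPRESSION_TPM)
```

In this toy table only GENE_A passes the threshold in TH1, while GENE_A and GENE_B pass it in at least one cell type. GENE_C never exceeds 1 TPM, so it is excluded from both backgrounds.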
Computing GO enrichment p-values
Computation of enrichment p-values follows the procedure in the Python goenrich package. For every GO term considered, a p-value is computed with Fisher's exact test. For each term, the null hypothesis states that genes annotated with the GO term are not overrepresented in the input list compared to the background. The contingency table considered is:
GO annotation status | Entries in background and in input list | Entries in background but not in input list | Total |
Annotated with GO term | x | n-x | n |
Not annotated with GO term | N-x | M-N-(n-x) | M-n |
Total | N | M-N | M |
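The p-value is computed as the survival function of the hypergeometric distribution with shape parameters (M, n, N), evaluated at x. Here M is the number of genes in the background, n is the number of background genes annotated with the GO term, N is the number of input-list genes present in the background, and x is the number of those input-list genes annotated with the GO term.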
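As a rough illustration of this computation, the sketch below evaluates the hypergeometric survival function using only the Python standard library. It is not the goenrich or GOnet code. The function name and the example counts are hypothetical, and the parameter order (M, n, N) is assumed to follow the convention of scipy.stats.hypergeom, where the survival function at x is P(X > x).

```python
from math import comb


def hypergeom_sf(x, M, n, N):
    """Survival function P(X > x) of a hypergeometric distribution.

    M: number of genes in the background
    n: number of background genes annotated with the GO term
    N: number of input-list genes present in the background
    x: number of input-list genes annotated with the GO term
    """
    total = comb(M, N)  # all ways to draw N genes from the background
    upper = min(n, N)   # largest possible overlap
    # Sum the probability mass for overlaps strictly greater than x.
    tail = sum(comb(n, k) * comb(M - n, N - k) for k in range(x + 1, upper + 1))
    return tail / total


# Made-up counts for illustration only.
p_value = hypergeom_sf(x=8, M=10000, n=100, N=200)
```

Because the survival function uses a strict inequality, the probability of observing x or more annotated genes would be obtained by evaluating the same function at x - 1.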