xstats.enrichment

enrichment is a Python library, part of the xstats toolkit, which you can use to perform two types of enrichment analysis:

- You can compare how enriched a subset of objects is in some annotation when compared with a general population. E.g., how significant is it to find 12 women in a group of 20 people, knowing that half of the world's population is female? The significance is evaluated using Fisher's exact test.
- You can evaluate how enriched the top of a ranked list of objects is in some annotation. There is no need to apply a cut-off to decide what the top of the list is; the significance of this enrichment is evaluated using methods from [Eden2007a] and [Eden2007b].
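To make the first type of analysis concrete, here is a minimal, standalone sketch of the three tails of Fisher's exact test, computed from the hypergeometric distribution. It is not the library's implementation: the function names `hypergeom_pmf` and `fisher_tails` are hypothetical, the code targets Python 3.8+ (for `math.comb`), and it assumes the usual convention in which both one-sided tails include the observed count.

```python
from math import comb

def hypergeom_pmf(x, n, K, N):
    """Probability of finding exactly x annotated objects in a subset of
    n objects drawn from a population of N, of which K are annotated."""
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def fisher_tails(k, n, K, N):
    """Left-, right- and two-tailed p-values for observing k annotated
    objects (hypothetical helper, not part of xstats.enrichment)."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    pmf = [hypergeom_pmf(x, n, K, N) for x in range(lo, hi + 1)]
    observed = pmf[k - lo]
    left = sum(pmf[:k - lo + 1])   # P(X <= k)
    right = sum(pmf[k - lo:])      # P(X >= k)
    # two-tailed: every outcome at most as likely as the observed one
    # (small tolerance to absorb floating-point noise)
    two = sum(p for p in pmf if p <= observed * (1 + 1e-7))
    return min(left, 1.0), min(right, 1.0), min(two, 1.0)

# 10 annotated objects in a subset of 500, with 120 annotated objects
# among the 1800 of the general population
left, right, two = fisher_tails(10, 500, 120, 1800)
print("left-tailed:", left)
print("right-tailed:", right)
print("two-tailed:", two)
```

Because both one-sided tails include the observed count, they sum to slightly more than 1. A very small left-tailed value next to a right-tailed value close to 1 indicates depletion; the reverse indicates enrichment.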

A typical use of such a library is in bioinformatics, to perform gene set enrichment analysis. Given a set of genes for which a property (such as the expression level) is measured, enrichment can evaluate how enriched the subset of genes with an expression level above a threshold is in some functional annotations. It can also evaluate how enriched the top of a list of genes, ranked by decreasing expression level, is in some functional annotations.
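The ranked-list case can be illustrated with the minimum hypergeometric (mHG) statistic described in [Eden2007a]: for every possible cut-off n, compute the hypergeometric tail probability of the number of annotated objects seen in the top n, and keep the smallest value. The sketch below is an illustration only, written for Python 3.8+. The gene names, expression values and function names are made up, and it returns the raw statistic, not the corrected p-value that the methods in [Eden2007a] derive from it.

```python
from math import comb

def hypergeom_tail(b, n, B, N):
    """P(X >= b) when the top n objects are drawn from N objects,
    of which B carry the annotation."""
    hits = sum(comb(B, i) * comb(N - B, n - i)
               for i in range(b, min(n, B) + 1))
    return hits / comb(N, n)

def mhg_statistic(occurrences):
    """Smallest hypergeometric tail over all cut-offs, and the cut-off
    (pivot) at which it is reached. Hypothetical helper."""
    N, B = len(occurrences), sum(occurrences)
    best, pivot, b = 1.0, 0, 0
    for n, hit in enumerate(occurrences, start=1):
        b += bool(hit)  # annotated objects among the top n
        tail = hypergeom_tail(b, n, B, N)
        if tail < best:
            best, pivot = tail, n
    return best, pivot

# Made-up data: 100 genes; the 10 most expressed ones are annotated,
# the next 10 are not, and the rest alternate.
flags = [True] * 10 + [False] * 10 + [True, False] * 40
genes = [("gene%03d" % i, 100.0 - i, flag) for i, flag in enumerate(flags)]

# Rank by decreasing expression level, then build the occurrence vector
ranked = sorted(genes, key=lambda gene: gene[1], reverse=True)
occurrences = [flag for _, _, flag in ranked]

statistic, pivot = mhg_statistic(occurrences)
print("mHG statistic:", statistic)
print("pivot:", pivot)  # 10: the enriched block is the top 10 genes
```

Because the minimum is taken over every cut-off, the raw statistic is optimistic. This is why a dedicated p-value computation, such as the one in the cited references, is needed before drawing conclusions.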

[Eden2007a]

Eden E, Lipson D, Yogev S and Yakhini Z. Motif discovery in ranked lists of DNA sequences. PLoS Computational Biology, 2007 Mar 23;3(3):e39

[Eden2007b]

Eden E. Discovering motifs in ranked lists of DNA sequences. Research thesis, 2007 Jan

Contact Aurelien Mazurie <[email protected]>
Keywords Python, Enrichment analysis, Statistics, Bioinformatics, Fisher's exact test, GOrilla, mHG

Getting started

Download the latest version of the library from http://github.com/ajmazurie/xstats.enrichment/downloads
Unzip the downloaded file, and cd into the resulting directory
Run python setup.py install. Alternatively, you can package the library by typing python setup.py bdist, which creates a file dist/xstats.enrichment-xxx.tar.gz, where 'xxx' stands for the version number and the name of your platform. You can then install the library with easy_install dist/xstats.enrichment-xxx.tar.gz (see the setuptools documentation)

From then on, you only have to import xstats.enrichment to use the library:

import xstats.enrichment

# Analysis 1: how significant is it to have 10 objects out of 500
# that share a given annotation, knowing that 120 out of the 1800
# objects in the general population have this annotation?
l, r, t = xstats.enrichment.evaluate_subset(10, 500, 120, 1800)

# the left-tailed probability is the probability of having fewer
# than 10 objects out of 500 with this annotation:
print "left-tailed:", l # 5.25e-8

# the right-tailed probability is the probability of having more
# than 10 objects out of 500 with this annotation:
print "right-tailed:", r # 0.99

# the two-tailed probability is the probability of observing 10
# objects out of 500 with this annotation, plus the probabilities
# of observing even less likely proportions:
print "two-tailed:", t # 7.78e-8

# the two-tailed p-value shows that finding 10 objects with this
# annotation is unexpected (significant at an error rate of 5%).
# To know in which direction the finding is unexpected, look at
# the left- and right-tailed p-values. In this example the
# left-tailed p-value is very low, while the right-tailed p-value
# is almost 1. It means that finding 10 objects is unexpectedly
# low with regard to the general population.

# conversely, finding 100 objects in the selection of 500 with
# this property is unexpectedly high:
l, r, t = xstats.enrichment.evaluate_subset(100, 500, 120, 1800)

# the right-tailed p-value is very low, while the left-tailed is 1
print "left- and right-tailed:", l, r # 1, 1.32e-39


# Analysis 2: in a ranked list of 1000 objects, half of which
# share a given annotation, how significant is it to find 20
# objects with this annotation at the top of the list?

# we build an occurrence vector which, for each object in the list,
# contains either True or False to represent whether this object has
# the annotation considered.

# we start with a homogeneous distribution:
occurrences = [True, False] * 500

# as expected, there is no significant enrichment at the top of the list:
print xstats.enrichment.evaluate_list(occurrences) # 0.99

# we now build a second occurrence vector, in which the first 20 objects
# all have the annotation:
occurrences = [True] * 20 + [False] * 20 + [True, False] * 480

# this time we find a significant enrichment at the top of the list,
# which in this case is determined to be the first 20 entries:
p_value, pivot = xstats.enrichment.evaluate_list(occurrences, with_pivot = True)

print "p-value:", p_value # 3.67e-5
print "pivot:", pivot # 20
