How does it work?


The idea behind LIPEA is to identify altered pathways, as provided by the KEGG database, using exclusively lipid compounds. The approach used for this task is Over-Representation Analysis (ORA). ORA starts from a list of annotated lipids (e.g. a lipid set related to a signature), then uses the Fisher exact test to check whether the lipids annotated to a given label (pathway) are over-represented in that list compared to the whole universe of lipids (the background). The background can be the “predefined” one for a specific organism (that is, LIPEA takes all the compounds from the pathways related to the selected organism) or a custom list given by the user.


The algorithm used to implement ORA has the following steps.

Step 1

Set an organism, collect the lipid list and the background.

Step 2

Select a pathway to start with.

Step 3

Tally the following 4 numbers: m, N, k, and n, where m is the total number of lipids in the pathway, N is the total number of lipids in the background, k is the number of lipids in the intersection between the lipid list and the pathway, and n is the total number of lipids in the list.
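The tally in this step can be sketched in Python with set operations. This is only an illustration, not LIPEA's actual code: the function name `tally_counts` and the placeholder identifiers `L1` to `L10` are hypothetical, and restricting the lipid list and the pathway to the background before counting is an assumption made here so that the four numbers stay consistent with each other.

```python
def tally_counts(lipid_list, pathway_lipids, background):
    """Return (m, N, k, n) for one pathway.

    Assumption: lipids that are not in the background are ignored,
    so that every count refers to the same universe of lipids.
    """
    universe = set(background)
    lipids = set(lipid_list) & universe
    pathway = set(pathway_lipids) & universe

    N = len(universe)          # total number of lipids in the background
    m = len(pathway)           # lipids in the pathway
    n = len(lipids)            # lipids in the input list
    k = len(lipids & pathway)  # lipids shared by the list and the pathway
    return m, N, k, n


# Hypothetical toy data: 10 background lipids, 4 of them in the pathway.
background = {f"L{i}" for i in range(1, 11)}
pathway = {"L1", "L2", "L3", "L4"}
lipid_list = ["L1", "L2", "L9"]

m, N, k, n = tally_counts(lipid_list, pathway, background)
print(m, N, k, n)
```

With this toy input the pathway holds 4 of the 10 background lipids, and 2 of the 3 listed lipids fall inside it.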

Step 4

Perform a Fisher exact test with the 4 numbers obtained in the previous step, as follows:

$$ f(k;N,m,n) = \frac{\binom{m}{k} \binom{N - m}{n - k}}{\binom{N}{n}} $$

The value f is the probability, under the hypergeometric distribution, of observing exactly k pathway lipids in the list by chance. The p-value associated with each pathway is the probability of observing k or more such lipids, obtained with the following formula:

$$ p = \sum_{l = k}^n f(l;N,m,n) $$
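The two formulas can be written down directly in Python as a minimal sketch. The function names `hypergeom_pmf` and `ora_p_value` are hypothetical, and the sketch uses only `math.comb` from the standard library; it is not necessarily how LIPEA computes the value.

```python
from math import comb


def hypergeom_pmf(k, N, m, n):
    """f(k; N, m, n): probability of exactly k pathway lipids in the list."""
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible outcomes contribute a probability of 0.
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def ora_p_value(k, N, m, n):
    """One-sided p-value: sum of f(l; N, m, n) for l = k, ..., n."""
    return sum(hypergeom_pmf(l, N, m, n) for l in range(k, n + 1))


# Toy example: N = 10 background lipids, m = 4 in the pathway,
# n = 3 in the list, k = 2 in the intersection.
p = ora_p_value(k=2, N=10, m=4, n=3)
print(p)
```

For this toy input, f(2) = 6 · 6 / 120 = 0.3 and f(3) = 4 · 1 / 120 ≈ 0.033, so p = 1/3.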

Step 5

Go to step 2 for another pathway of interest, until all are tested.

Step 6

Correct the p-values for multiple testing with the Benjamini or Bonferroni-Holm correction.
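As an illustration of this last step, the sketch below adjusts a list of per-pathway p-values. It assumes that "Benjamini" refers to the Benjamini-Hochberg false discovery rate procedure and that "Bonferroni-Holm" refers to Holm's step-down procedure; the text above does not specify the variants, so both function names and the example p-values are hypothetical.

```python
def holm(p_values):
    """Holm step-down adjusted p-values (assumed 'Bonferroni-Holm')."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, adj)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed 'Benjamini')."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p downwards
        i = order[rank]
        adj = p_values[i] * m / (rank + 1)
        running_min = min(running_min, adj)  # keep adjusted values monotone
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four pathways.
raw = [0.01, 0.04, 0.03, 0.005]
holm_adj = holm(raw)
bh_adj = benjamini_hochberg(raw)
print(holm_adj)
print(bh_adj)
```

Both functions return the adjusted values in the same order as the input, so each one can be matched back to its pathway.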