Methods
The idea behind LIPEA is to identify specific altered pathways, as provided by the KEGG database, using exclusively lipid compounds. The approach used for this task is Over-Representation Analysis (ORA). ORA starts from a list of annotated lipids (e.g. a lipid set related to a signature), then uses the Fisher exact test to check whether the lipids annotated to a given label (pathway) are over-represented in that list compared to the whole universe of lipids (the background). The background can be the “predefined” one for a specific organism (that is, LIPEA takes all the compounds from the pathways related to the selected organism) or a custom list given by the user.
Procedure
The steps of the algorithm used to implement the ORA are the following.
Step 1
Set an organism, collect the lipid list and the background.
Step 2
Select a pathway to start with.
Step 3
Tally the following four numbers: m, N, k, and n, where m is the total number of lipids in the pathway, N is the total number of lipids in the background, k is the number of lipids in the intersection between the lipid list and the pathway, and n is the total number of lipids in the list.
Step 4
Perform a Fisher exact test with the four numbers obtained in the previous step, as follows:
$$ f(k;N,m,n) = \frac{\binom{m}{k} \binom{N - m}{n - k}}{\binom{N}{n}} $$
The f value is the probability, under the hypergeometric distribution, of finding exactly k lipids of the pathway in the list by chance. The p-value associated with each pathway is then obtained by summing f over all outcomes at least as extreme as the observed one:
$$ p = \sum_{l = k}^n f(l;N,m,n) $$
Step 5
Go to step 2 for another pathway of interest, until all are tested.
Step 6
Correct the p-values for multiple testing with the Benjamini or Bonferroni-Holm correction.
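The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not LIPEA's actual implementation: the function names, the toy lipid identifiers, and the two toy pathways are all hypothetical, and it assumes that "Benjamini" refers to the Benjamini-Hochberg procedure. The lipid list and each pathway are restricted to the background before counting, which is also an assumption.

```python
from math import comb

def hypergeom_pmf(k, N, m, n):
    """f(k; N, m, n): probability of exactly k pathway lipids in a list of n."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

def ora_p_value(k, N, m, n):
    """Upper-tail p-value: sum of f(l; N, m, n) for l = k..n.

    Terms with l > m are zero, so the sum stops at min(n, m).
    """
    return sum(hypergeom_pmf(l, N, m, n) for l in range(k, min(n, m) + 1))

def run_ora(lipid_list, background, pathways):
    """Steps 1 to 5: tally m, N, k, n per pathway and compute raw p-values."""
    background = set(background)
    lipids = set(lipid_list) & background       # assumption: ignore lipids outside the background
    N, n = len(background), len(lipids)
    raw = {}
    for name, members in pathways.items():
        members = set(members) & background
        m = len(members)
        k = len(lipids & members)
        raw[name] = ora_p_value(k, N, m, n)
    return raw

def holm(pvals):
    """Bonferroni-Holm step-down adjusted p-values, in the input order."""
    M = len(pvals)
    order = sorted(range(M), key=lambda i: pvals[i])
    adjusted = [0.0] * M
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (M - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values, in the input order."""
    M = len(pvals)
    order = sorted(range(M), key=lambda i: pvals[i])
    adjusted = [0.0] * M
    running_min = 1.0
    for rank in range(M - 1, -1, -1):           # walk from the largest p-value down
        i = order[rank]
        running_min = min(running_min, pvals[i] * M / (rank + 1))
        adjusted[i] = running_min
    return adjusted

# Toy data (hypothetical identifiers, not real KEGG entries).
background = [f"L{i}" for i in range(1, 11)]            # N = 10
pathways = {
    "A": ["L1", "L2", "L3", "L4"],                      # m = 4
    "B": ["L5", "L6", "L7", "L8", "L9"],                # m = 5
}
lipid_list = ["L1", "L2", "L3", "L5"]                   # n = 4

raw = run_ora(lipid_list, background, pathways)         # Steps 1 to 5
names = list(raw)
holm_adj = dict(zip(names, holm([raw[x] for x in names])))               # Step 6
bh_adj = dict(zip(names, benjamini_hochberg([raw[x] for x in names])))   # Step 6
```

In this toy case pathway A has k = 3 of its m = 4 lipids in the list, so its raw p-value is (24 + 1) / 210 = 25/210, roughly 0.119. Exact integer binomials from `math.comb` keep the sketch dependency-free; for large backgrounds a statistics library's hypergeometric survival function would be the more usual choice.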