Using AnalyzerPro in the Metabolomics Workflow
Figure 1. High level workflow diagram
Setting Up a Processing Method
The processing method is where you define your peak detection parameters. The method is not complete until you have created an RI calculation file (if required) and a library to match against, but you need to process a data file or two before you can set these up. Follow the instructions below to process a file, then follow Calculate a Retention Index (using a method on each data file) and Automatically Build Target Component Library to complete your untargeted metabolomics workflow method.
To create a new method, select Method | New; to edit an existing method, select Method | Edit.
Define processing constraints such as the Minimum Masses required to define a peak, the Mass Range to process, the retention time (RT) window to process and any masses to reject (Reject Mass(es)).
Define the detection parameters. This includes Area Threshold, Height Threshold, Width Threshold, Signal to Noise, Scan Windows, chromatographic Resolution and Gaussian Smoothing.
If you have accurate mass data, check the Accurate Mass Processing check box, enter the actual Precision (ppm) of your data and select the Linearity of that precision. You also need to set the precision in your preferences so that displayed spectra report m/z to the required number of decimal places.
Select the Enable target component searching? check box. Any one of the available target component libraries can be selected from the list box. Found components will be searched against the selected library and the results will be available in the Target Component Report. Select the Enable library searching? check box if you want to enable this function. Found components will be searched against the selected library or libraries such as NIST.
Select the type, if any, of retention index processing that is required from the drop-down list. Retention indices are covered in more detail in the following section.
To save these settings to the method file select Method | Save or Method | Save As. Unless these changes are saved to a method file, you may lose them.
Calculate a Retention Index
Retention indices, such as Kovats indices for organic compounds, express a compound's retention relative to a series of reference compounds; in the case of Kovats, n-alkanes. For example, n-dodecane (C12) has a retention index of 1200 and n-pentadecane (C15) has a retention index of 1500. Compounds eluting between C12 and C15 therefore have retention indices between 1200 and 1500. Retention indices are useful because, for a given stationary phase, they are largely independent of the temperature program and of column properties such as length and diameter, so compounds analysed under different conditions can be matched to a library using RI rather than retention time.
There are three ways in AnalyzerPro to calculate an RI ladder:
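As an illustration of the arithmetic only, the sketch below interpolates a retention index linearly between the two ladder compounds that bracket a peak (the form commonly used for temperature-programmed GC). This is an assumption about the calculation, not a description of how AnalyzerPro computes RI internally; the function name and the ladder retention times are hypothetical.

```python
from bisect import bisect_right

def linear_retention_index(rt, ladder):
    """Interpolate a retention index from a ladder of (retention_time, ri) pairs.

    ladder must hold at least two entries sorted by retention time,
    e.g. [(10.0, 1200), (13.0, 1500)].
    """
    times = [t for t, _ in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the RI ladder")
    # Index of the ladder compound eluting at or just before rt.
    i = min(bisect_right(times, rt) - 1, len(ladder) - 2)
    (t_lo, ri_lo), (t_hi, ri_hi) = ladder[i], ladder[i + 1]
    return ri_lo + (ri_hi - ri_lo) * (rt - t_lo) / (t_hi - t_lo)

# Hypothetical retention times (minutes) for C12, C13 and C15.
ladder = [(10.0, 1200), (11.0, 1300), (13.0, 1500)]
print(linear_retention_index(10.5, ladder))  # 1250.0, halfway between C12 and C13
```

A peak that elutes outside the ladder raises an error rather than being extrapolated, which is why the ladder should span the whole retention time range of interest.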
2. Using a method and a single data file and
3. Using a method on each data file.
In metabolomics experiments where retention time shifts occur, especially LC-MS and large-scale GC-MS (where column maintenance is performed during the study), calculating a retention index for each file using a method is the best option.
Choose an appropriate data file to build your RI method (.rim file). This can be any file from the experiment acquisition which contains the retention index compounds. A mid-run pooled QC sample would be a good choice if available. Process the file using appropriate peak detection parameters and for speed, turn off library searching.
Once the file is processed, find your retention index compounds and record their retention times. For ease, extract masses of interest (right click the chromatogram and choose Options | Trace Type | Mass Chromatogram), e.g. for n-alkanes extract masses 57, 71, 85 etc. Then select Home | Results | Retention Index to create your method (.rim file).
Delete components you do not wish to use for RI calculation. Using retention time, identify your RI compounds and give them the appropriate name and retention index, e.g. C12 or n-Dodecane; RI 1200. You will need to choose a retention time window based on your data. For large metabolomics studies, there is the possibility of RT shift/drift even for GC-MS. Try to quantify the RT shift before choosing the window. You will also need to include a relative intensity window. By selecting a relative intensity of 50%, only those ions which have a relative intensity of 50% or greater will be used for matching. If there is any co-elution or difficulty deconvoluting RI components, a relative intensity of 50% excludes low-abundance 'impurity' ions from the matching process, allowing a better match with each file.
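The effect of the two windows can be sketched as follows. This is a minimal illustration, assuming that relative intensity is measured against the base peak of the spectrum and that the retention time window is a plus-or-minus tolerance around the expected time; the function names and the example spectrum are hypothetical and do not come from AnalyzerPro.

```python
def filter_by_relative_intensity(spectrum, threshold_percent=50.0):
    """Keep ions whose intensity is at least threshold_percent of the base peak.

    spectrum: dict mapping m/z -> intensity.
    """
    base_peak = max(spectrum.values())
    cutoff = base_peak * threshold_percent / 100.0
    return {mz: i for mz, i in spectrum.items() if i >= cutoff}

def within_rt_window(observed_rt, expected_rt, window):
    """True if the observed RT is within +/- window of the expected RT."""
    return abs(observed_rt - expected_rt) <= window

# Hypothetical n-alkane spectrum with a co-eluting impurity ion at m/z 73.
spectrum = {57: 1000.0, 71: 700.0, 85: 520.0, 73: 120.0}
print(sorted(filter_by_relative_intensity(spectrum)))  # [57, 71, 85]
print(within_rt_window(10.12, 10.0, 0.2))              # True
```

With a 50% threshold the impurity ion at m/z 73 is dropped before matching, so it cannot lower the match score for the RI compound.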
It’s a good idea to re-analyze the sample or a couple of samples to make sure that your RIs are correct. Since you may like to use a method with RI to match spectra rather than RT, it’s important to get it right or you won’t get any library hits. To do this, view the summary report by selecting Report | Summary.
Now you can edit your processing method to include your retention index calculation method.
Automatically Build Target Component Library
Your data set can be completely defined by the component library generated during the automated library building stage. Following deconvolution, each component found can be assigned an identity. This identity does not have to be absolute and subsequent analysis can be performed on identified components or by treating each component as an unknown. Components can be identified automatically by matching to public libraries or from a specific in-house library of authentic standards. Absolute identity of all components is not required at this stage but where a component of interest is found, that component should be investigated further to confirm its identity.
Libraries are best built using QC samples but they can be built using your entire data set, too. If you have used internal standards in your analysis, you can find them in the file you processed to create your .rim file. You can:
1. Create library from a single file
2. Automatically build library from a sequence of files
You may like to find your internal standards in your data and add these to a new library first. You can then append the new components from 1) your single file or 2) your sequence to this library. If you have QC files, this step is not necessary because you will need to re-process anyway. If you don't have QC files and want to process your sequence at the same time as you build your library, find your internal standard first so that you can normalize immediately.
When you are ready to build your library, go to Home | Sequence | Analyze. Specify the name and output location of your sequence and then check the ‘automatically build target library’ button.
Processing a Sequence
Figure 2. Processing a sequence in AnalyzerPro
View Sequence Results
Large metabolomics data sets can be modelled using principal component analysis to visualize which metabolites [components] contribute most to the variance observed. It also allows the identification of outliers that are likely to contribute to the skewing of the variance of a particular component. PCA points you in the direction needed for interpretation, but is only one part of the data analysis workflow. Fold-changes and p-values can be calculated in AnalyzerPro to aid in determining key metabolites of interest that may help to answer the biological question.
Once your sequence has finished processing, click MatrixAnalyzer on the results section of the Home tab. Here you will be able to view your PCA (if desired; with or without log transformation), results and normalized results. You can edit the class information post processing to generate a PCA (Home | Sequence | Class Information). Press the options button (left button on the tool bar) to select which information to include in your matrix and which data transformation to use to generate your PCA.
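To make the idea of a log-transformed PCA concrete, the sketch below computes the first principal component of a small area matrix using only the Python standard library. It is an illustration under stated assumptions (log10 transformation, mean centering, no scaling, power iteration for the leading eigenvector, non-constant and strictly positive areas); how MatrixAnalyzer computes its PCA is not described here, and the function name and data are hypothetical.

```python
import math

def first_principal_component(matrix, log_transform=True, iterations=200):
    """Return (loadings, scores, explained_fraction) for PC1.

    matrix: rows are samples, columns are component areas.
    Areas must be > 0 when log_transform is True.
    """
    data = [[math.log10(v) if log_transform else v for v in row] for row in matrix]
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vec = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[j][j] for j in range(p))
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centered]
    return vec, scores, eigenvalue / total_variance

# Hypothetical areas: three samples (rows) by two components (columns).
areas = [[10.0, 10.0], [100.0, 100.0], [1000.0, 1000.0]]
loadings, scores, explained = first_principal_component(areas)
print(round(explained, 3))
```

The scores place each sample along PC1, the loadings show how strongly each component contributes to it, and the explained fraction corresponds to the explained variance reported for a PC.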
As you scroll through components, you can inspect RT shift in your data and any main class effects for further interpretation.
As an example, the PCA below represents 12 derivatization batches for a mixture of metabolite standards. The PCA shows that batch 4 is clearly distinct from the other batches but, with a total explained variance of only 9.5% for PCs 1 and 2, the batches appear fairly similar overall.
Figure 3. PCA Scores from results with log transformation
If we investigate the loadings plot, components on the left will have a higher relative concentration in batches 11 and 12 and components on the right will be higher in batch 4.
Figure 4. PCA Loadings from results with log transformation
By clicking on points in the loadings plot, the bar graph, which is color coded by class, will update to show the relative concentration of the component in each sample. Below we can see that component ‘QC9_030_22’ is missing from batch 4 which is contributing to the separation of batch 4 from the rest of the QC samples.
Figure 5. Areas for component
Using the same data, re-classifying the samples by run order and removing (by selecting the row trash can button) features which occur in fewer than 50% of the samples in the data set, we can see a trend based on where in the analysis the sample was run. By using the scores plot and investigating the components, it can be seen that components such as the tentatively identified 2-piperidinecarboxylic acid have a higher relative concentration by the end of the derivatization batch.
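The 50% occurrence filter can be expressed as a simple rule. The sketch below assumes that a feature counts as present in a sample when its area is greater than zero; the function name, the matrix layout and the example areas are hypothetical.

```python
def drop_sparse_features(matrix, min_fraction=0.5):
    """Keep features detected (area > 0) in at least min_fraction of samples.

    matrix: dict mapping feature name -> list of areas, one per sample.
    """
    kept = {}
    for name, areas in matrix.items():
        detected = sum(1 for a in areas if a > 0)
        if detected / len(areas) >= min_fraction:
            kept[name] = areas
    return kept

# Hypothetical areas for three features across four samples.
matrix = {
    "feature_A": [1200.0, 0.0, 0.0, 0.0],       # present in 25% of samples
    "feature_B": [800.0, 950.0, 0.0, 0.0],      # present in 50% of samples
    "feature_C": [400.0, 420.0, 390.0, 410.0],  # present in 100% of samples
}
print(list(drop_sparse_features(matrix)))  # ['feature_B', 'feature_C']
```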
Figure 6. PCA Scores
Figure 7. Areas for component 2-Piperidinecarboxylic acid, 1-(trimethylsilyl)-, trimethylsilyl ester
If you defined classes in your sequence, fold changes can be calculated and displayed in the MatrixAnalyzer Results Viewer. The fold change value is the ratio of the average peak area response of one class to that of another and is a useful measure of how a compound changes as a result of an experimental condition. The MatrixAnalyzer results can be sorted according to fold change.
By specifying your classes, you will also be able to view each component by experimental group plotted as the mean ± standard deviation. Group means are color coded consistent with the PCA plot (according to class). This representation allows clear visualization of a component's relative concentration, further aiding the user's interpretation of results.
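As a worked example of the ratio described above, the sketch below divides the mean area of one class by the mean area of another. The class names and areas are hypothetical, and the log2 value is shown only because it is a common way to put increases and decreases on a symmetric scale; it is not stated here that AnalyzerPro reports it.

```python
import math
from statistics import mean

def fold_change(areas_a, areas_b):
    """Ratio of the mean peak area in class A to the mean peak area in class B."""
    return mean(areas_a) / mean(areas_b)

# Hypothetical peak areas for one component in two classes.
treated = [200.0, 400.0]
control = [100.0, 200.0]
fc = fold_change(treated, control)
print(fc)             # 2.0
print(math.log2(fc))  # 1.0
```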
Figure 8. Average areas per class
You can calculate a p-value using, if appropriate, a one-way ANOVA. The p-value, used in hypothesis testing, is the probability of observing differences between group means at least as large as those measured if the null hypothesis (no difference between the groups) were true. A small p-value (conventionally less than 0.05) is taken as evidence against the null hypothesis and indicates a statistically significant difference between at least two of the group means. As with fold change, results can be sorted by p-value.
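The statistic behind a one-way ANOVA can be sketched in a few lines. The example below computes the F statistic and its degrees of freedom with the standard library only; it stops short of the p-value, which requires the F distribution (for example `scipy.stats.f_oneway`, if SciPy is available). The function name and the group areas are hypothetical, and this is not a description of AnalyzerPro's own implementation.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for two or more groups of areas."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical areas for one component in three classes.
f_stat, df_b, df_w = one_way_anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0])
print(round(f_stat, 3), df_b, df_w)
```

A large F means the spread between class means is large relative to the spread within classes, which corresponds to a small p-value.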
AnalyzerPro will also generate a volcano plot to visualize all components plotted against their fold change and p-values. This representation of data allows the user to determine those components with the highest fold changes and lowest p-values (i.e. the most important results) with ease.
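A volcano plot is conventionally drawn with log2 fold change on the x axis and -log10 of the p-value on the y axis. The sketch below computes those coordinates and flags components that pass both cut-offs; the axis convention, the thresholds (a two-fold change and p < 0.05) and the component names are assumptions for illustration, not AnalyzerPro settings.

```python
import math

def volcano_point(fold_change, p_value):
    """Return (log2 fold change, -log10 p-value) for one component."""
    return math.log2(fold_change), -math.log10(p_value)

def is_of_interest(fold_change, p_value, min_abs_log2_fc=1.0, max_p=0.05):
    """True if the component changes at least two-fold with p below max_p."""
    x, _ = volcano_point(fold_change, p_value)
    return abs(x) >= min_abs_log2_fc and p_value < max_p

# Hypothetical results: component name -> (fold change, p-value).
results = {
    "component_1": (4.0, 0.01),   # large change, significant
    "component_2": (1.1, 0.001),  # significant but a small change
    "component_3": (0.25, 0.2),   # large change but not significant
}
selected = [name for name, (fc, p) in results.items() if is_of_interest(fc, p)]
print(selected)  # ['component_1']
```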