How to submit metabolomic data to MetaboLights

To manage the exponential rise in biological data and the associated metadata, it has become necessary to deposit raw data in dedicated public repositories. While this requirement has worked well for genomic and transcriptomic studies, it has found less uptake in studies involving metabolite and metabolomic analysis: the requirement is not fully applied, nor is it enforced by journals1, which makes it difficult to retrieve metabolomic raw data for further meta-analysis and comparative studies.

Notwithstanding the complexity of harmonising disparate sampling strategies, metabolite extraction protocols and chromatographic and spectroscopic techniques, as well as data measurement, integration and validation, and metabolite annotation, there is an urgent need to make these biological data and information available to the entire plant science community2.

MetaboLights3 is an open-access database for metabolomic experiments, linked to their raw data and associated metadata4,5. It is part of the ELIXIR infrastructure and is hosted by the European Bioinformatics Institute (EMBL-EBI). MetaboLights is the recommended metabolomic repository for a number of leading journals and publishers in the field, such as Metabolomics, Metabolites, Frontiers, BioMed Central, Scientific Data, PLOS Biology, EMBO Press, Data, F1000Research and OpenResearch.

The availability of both data and metadata is of utmost importance for adhering to the FAIR principles6, which aim to make data, experiments and results Findable, Accessible, Interoperable and Reusable. MetaboLights facilitates data submission through the ISA Commons, a growing community that uses the ISA metadata tracking framework to support standards-compliant collection, curation, management and reuse of datasets in an increasingly diverse set of life science domains.

There are always exceptions to the rule, and these guidelines do not claim to be exhaustive. Our intention is to provide a guided example of how to submit metabolomic data and metadata, with particular emphasis on LC-MS and GC-MS techniques, to serve as a tutorial for students, researchers and newcomers to the field.

Creating an account on MetaboLights is straightforward and free, and requires only a few details (name, email address, affiliation, country and password), with the option to link the profile to the researcher’s ORCID iD.

Once the account has been created, submitting a study requires the raw data produced by the analytical instrument, preferably in an open (non-proprietary) format, and the metadata supporting the study.

Select “Submit study” to create a new study (or edit an existing study), and the Guided Submission Portal will lead the user through the study creation step by step.

Studies must pass validation before they can be submitted. Validation errors are highlighted in the information bar at the top of the study. Details of the errors are available in the study's Validations tab, and some examples are given in the validation section below.

A study is not published or made publicly available until all of the following conditions have been met:

  1. the preliminary online validation has been passed;
  2. the submitter has promoted the study from “submitted” to “in curation”;
  3. after a minimum of 4–8 weeks, the curation team has approved the study, provided everything is correct;
  4. the release date has been reached.

The word “study” refers to the entire experiment, which can be further subdivided into different assays if multiple analytical techniques were used, such as LC and GC. Besides the title, which is freely chosen by the submitter, one of the most important fields is the study ID (study unique identifier): an alphanumeric code always composed of the letters MTBLS followed by a number, which is assigned automatically when a new study is created. The study ID cannot be modified, and it is needed when referencing the study in manuscripts or elsewhere, together with the corresponding URL, e.g. www.ebi.ac.uk/metabolights/MTBLS000.
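As a quick illustration of the ID format described above, the Python sketch below checks whether a string looks like a study ID and builds the corresponding URL. It assumes the ID is exactly the prefix MTBLS followed by one or more digits, and it adds an https:// scheme to the URL pattern given above; the function names are hypothetical and not part of any MetaboLights tool.

```python
import re

# Assumption: a study ID is the literal prefix "MTBLS" followed by one or more
# digits, as described in the text; no fixed digit count is enforced here.
STUDY_ID_PATTERN = re.compile(r"MTBLS\d+")


def is_study_id(text: str) -> bool:
    """Return True if `text` is exactly a MetaboLights-style study ID."""
    return STUDY_ID_PATTERN.fullmatch(text) is not None


def study_url(study_id: str) -> str:
    """Build the study URL following the www.ebi.ac.uk/metabolights/MTBLS000 pattern."""
    if not is_study_id(study_id):
        raise ValueError(f"not a valid study ID: {study_id!r}")
    return f"https://www.ebi.ac.uk/metabolights/{study_id}"


print(study_url("MTBLS000"))  # https://www.ebi.ac.uk/metabolights/MTBLS000
```

A check like this is handy when collecting accession numbers for a manuscript's data availability statement, where a mistyped prefix is easy to miss.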

This section reports general descriptive information: the title of the study, the author(s), an abstract giving a brief overview of the study and a list of keywords for indexing. This information can be copied from the associated publication, if already available, which can be linked via the DOI of the article. All of this information is stored in the file i_MTBLSxxx.txt, together with the protocols described below.

The factors considered in the experimental design are displayed at the beginning of the submission and are added to the sample table, with a new column created for each factor. Adding more factors increases the metadata associated with the study and therefore its future reusability: a rich set of metadata is key to reusability, including beyond the scope of the original study.
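To make the "one column per factor" rule concrete, here is a minimal Python sketch that writes a tab-separated sample table. It assumes the ISA-Tab convention of naming factor columns Factor Value[<factor>]; the base columns are borrowed from the validation messages quoted later in this guide, and the example values are illustrative rather than taken from a real study. Tables generated by MetaboLights contain more columns than shown here.

```python
import csv
import io

# Base columns taken from the validation messages quoted later in this guide;
# a real MetaboLights sample file contains more columns than these.
BASE_COLUMNS = ["Sample Name", "Characteristics[Organism]", "Characteristics[Organism part]"]


def sample_table(samples, factors):
    """Return a tab-separated sample table with one extra column per study factor.

    samples: list of dicts keyed by the base column names and the factor names.
    factors: factor names, e.g. ["Treatment", "Days after anthesis"].
    """
    # ISA-Tab convention (assumed here): factors appear as "Factor Value[<name>]".
    header = BASE_COLUMNS + [f"Factor Value[{name}]" for name in factors]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for sample in samples:
        writer.writerow([sample[c] for c in BASE_COLUMNS] + [sample[f] for f in factors])
    return out.getvalue()


# Illustrative values only.
rows = [
    {"Sample Name": "berry_ctrl_1", "Characteristics[Organism]": "Vitis vinifera",
     "Characteristics[Organism part]": "berry", "Treatment": "control",
     "Days after anthesis": "60"},
]
print(sample_table(rows, ["Treatment", "Days after anthesis"]))
```

Each additional factor simply widens every row by one cell, which is why richer experimental designs translate directly into richer, more reusable metadata.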

Protocols are essential for reproducibility and should provide a detailed description of the steps taken in the study. When an assay is selected, the protocol section is populated with several titled sections to which the user can add the specific study details. Protocols are free text, but a list of minimum requirements for each protocol section is presented in the tabs below, together with some examples.

In the Sample collection protocol, it is important to describe the sample origin (source, organism, species, intraspecific name, organism part), any relevant treatments and time points, as well as the collection and storage procedures.

The Extraction protocol should describe any extraction or preparation method applied to the samples before analysis. Please also include information on any control samples prepared for the assay, e.g. pooled samples, standards, quality control samples and solvent blanks.

Examples: 

  1. About 1 g of each sample was then weighed into 15‐ml amber vials, and 50 µl of the internal standard (IS) o‐coumaric acid (2 mg/ml MeOH), 1.2 ml of H2O/CH3OH (1 : 2) and 0.8 ml of CHCl3 were then added. The extraction mixture was vortexed for 1 min, shaken for 15 min at room temperature using an orbital shaker (Grant‐Bio Rotator PTR‐60) and centrifuged at 4 °C and 1000 g for 10 min. The upper aqueous methanolic phase was transferred into a 5‐ml flask, and extraction was repeated by adding another 1.2 ml of H2O/CH3OH (1 : 2) with shaking and centrifugation as before. The two supernatants were collected in the same 5‐ml flask and the flask was brought up to 5 ml using Milli‐Q water then filtered into LC‐MS certificated vials through a 0.2‐µm PTFE filter (Millipore, Italy) and analyzed. This procedure was repeated three times for each biological sample in order to obtain three technical replicates.18 
  2. A sample of 100 mg frozen tissue powder was transferred to a 2 ml Eppendorf tube, and metabolites were extracted in 1 ml methanol/chloroform/water extraction solution (2.5/1/1 v/v/v). The mixture was then vortexed for 1 min and centrifuged for 5 min at 10,000 RPM (Sigma, Germany) at 4 °C, and the supernatant was decanted into new tubes. The supernatant was mixed with 400 µl of chloroform and 400 µl of MilliQ water and then centrifuged for 5 min at 10,000 RPM at 4 °C. The upper water/methanol phase was filtered through a 0.22 µm filter (Millipore) and transferred to MS vials for LC-MS analysis.19
  3. For the free (non-glycosylated) VOCs, on the day of analysis, four grams of frozen grape powder were weighed out in a 20 mL SPME dark-glass vial. Three grams of NaCl, 15 mg of citric acid, 15 mg of ascorbic acid, 50 μL of sodium azide, and 7 mL of milliQ water were added to the sample. Fifty μL of a solution containing five internal standards, d10-4-methyl-3-penten-2-one (1 g/L), d11-ethyl hexanoate (1 g/L), d16-octanal (1 g/L), d8-acetophenone (1 g/L), d7-benzyl alcohol (1 g/L), was added to each sample.20

For reproducibility, the Chromatography protocol should provide details of the instrument and column used, the mobile phase and gradient, and settings such as temperatures, flow rate and injection volume.

Examples: 

  1. A Waters Acquity UPLC controlled by MassLynx 4.1 was used. The column was a reversed-phase (RP) ACQUITY UPLC HSS T3 column, 1.8 µm, 2.1 × 150 mm (Waters); the column manager was set at 40 °C; the mobile phase flow rate was 0.28 ml/min, and the eluents were water (A) and methanol (B), both with 0.1% formic acid. The multistep linear gradient was as follows: 0–1 min, 100% A isocratic; 1–3 min, 100–90% A; 3–18 min, 90–60% A; 18–21 min, 60–0% A; 21–25.5 min, 0% A isocratic; 25.5–25.6 min, 0–100% A; 25.6–28 min, 100% A isocratic. The injection volume was 5 µl and the samples were kept at 4 °C throughout the analysis. QC sample injections were used for the initial equilibration of the LC-MS system (5 injections) and as controls at regular intervals (one QC injection every 6 sample injections) during the sequence. The samples were analyzed in randomized order.22
  2. GC analysis was performed using a Trace GC Ultra gas chromatograph coupled with a TSQ Quantum Tandem mass spectrometer, upgraded to the XLS configuration. A DuraBrite IRIS ion source with a pre-filter was installed to improve the performance of the spectrometer. The system was equipped with a Triplus autosampler (Thermo Electron Corporation, Waltham, MA). The injection volume was 1 µL, post-injection dwell time 4 s, tray temperature 10 °C. GC separation was performed on a 30 m VF-WAXms capillary column with an internal diameter of 0.25 mm and a film thickness of 0.25 µm (Varian, Inc., USA). Temperature programme: 40 °C held for 2 min after injection, 10 °C/min up to 50 °C, 1.4 °C/min up to 60 °C, hold for 2 min, 1.6 °C/min up to 70 °C, hold for 1 min, 2.2 °C/min up to 100 °C, hold for 0.5 min, 3.1 °C/min up to 140 °C, 4.4 °C/min up to 200 °C, 12 °C/min up to 250 °C, hold for 6 min. Injection parameters were: splitless injection, splitless time 0.8 min, inlet temperature 250 °C; the carrier gas was helium 5.5, with programmed flow: 0.8 mL/min held for 62.50 min, then 0.8 mL/min up to 1.2 mL/min in 0.5 min, held for 7 min.23
  3. Briefly, chromatography was performed in a 1290 Agilent UPLC equipped with an RP C30 3-μm column (250 × 2.1 mm i.d.) coupled to a 20 × 4.6-mm C30 guard column (YMC Inc., Wilmington, NC, USA). A flow rate of 0.21 ml/min and an injection volume of 3 μl were adopted. The mobile phases consisted of methanol (A) and methyl tert-butyl ether (B), both containing 5 % of a mixture of water/methanol (20/80 by volume) and 0.2 % (w/v) ammonium acetate. The gradient elution consisted of 100 % A isocratically for 6 min, a step to 82.5 % A at 7 min, maintained isocratically for 5 min, followed by a linear gradient to 32.5 % A by 30 min, and these conditions were maintained for 14 min. A conditioning phase (48–60 min) was then used to return the column to the initial concentration of A. The DAD signal was acquired from 200 to 600 nm (step 1.2 nm), with a slit width of 1 nm, at a frequency of 2.5 Hz.24
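The multistep gradient in example 1 is a piecewise-linear programme, so the eluent composition at any time can be recovered by linear interpolation between its breakpoints. The Python sketch below encodes that gradient (times in min, % of eluent A, water with 0.1% formic acid) and assumes %B = 100 − %A; it is a reading aid for checking a reported gradient, not instrument software, and the function name is hypothetical.

```python
# Breakpoints of the gradient in example 1: (time in min, % eluent A).
GRADIENT = [
    (0.0, 100.0), (1.0, 100.0), (3.0, 90.0), (18.0, 60.0),
    (21.0, 0.0), (25.5, 0.0), (25.6, 100.0), (28.0, 100.0),
]


def percent_a(t, gradient=GRADIENT):
    """Return % eluent A at time t (min), interpolating linearly between breakpoints."""
    if not gradient[0][0] <= t <= gradient[-1][0]:
        raise ValueError(f"t={t} min is outside the programmed gradient")
    for (t0, a0), (t1, a1) in zip(gradient, gradient[1:]):
        if t <= t1:  # first segment whose end is at or after t
            return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
    return gradient[-1][1]  # only reached for a single-point gradient


for t in (0.5, 2.0, 10.5, 20.0, 27.0):
    a = percent_a(t)
    print(f"{t:5.1f} min: {a:5.1f}% A, {100 - a:5.1f}% B")
```

Writing the gradient as breakpoints, as in example 1, is what makes it unambiguous: every segment is fully defined by its start and end composition.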

In the Mass spectrometry protocol, provide details of the instrument used (manufacturer and model), ion source, ionisation mode (positive/negative), m/z range, and specific parameters such as temperatures, voltages, flow rates and scan rates.

Instrumental performance and method validation

Since in metabolomics the analysed metabolites are not defined in advance, method validation is rather difficult. However, minimum reporting of instrumental performance parameters is encouraged. Describe the nature of, and the method(s) used to ensure, instrumental sensitivity, selectivity, linearity, stability, resolution and mass accuracy. The distribution of QC samples in a PCA score plot is a good indicator of analytical stability.
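One way to run the PCA check mentioned above is to project all injections onto the first two principal components and compare how tightly the QC injections cluster relative to the study samples. The sketch below uses NumPy's SVD on mean-centred data; the toy matrix is invented for illustration, and real data would normally be log-transformed and scaled first, a step omitted here. The helper names are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project mean-centred data onto its first principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def mean_spread(scores):
    """Mean Euclidean distance of score points from their own centroid."""
    return float(np.linalg.norm(scores - scores.mean(axis=0), axis=1).mean())


# Toy matrix (rows = injections, columns = feature intensities); not real data.
# QC rows mimic repeated injections of one pooled sample.
qc = np.array([[10.0, 5.0, 3.0], [10.1, 5.0, 3.1], [9.9, 5.1, 2.9], [10.0, 4.9, 3.0]])
samples = np.array([[14.0, 2.0, 6.0], [6.0, 8.0, 1.0], [12.0, 7.0, 4.5], [8.0, 3.0, 0.5]])

scores = pca_scores(np.vstack([qc, samples]))
qc_spread = mean_spread(scores[:4])
sample_spread = mean_spread(scores[4:])
print(f"QC spread {qc_spread:.3f} vs sample spread {sample_spread:.3f}")
```

In a stable run the QC spread is a small fraction of the biological spread; QC points drifting apart along the injection order would point to an instrumental problem rather than a biological effect.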

Examples: 

  1. GC analysis was performed using a Trace GC Ultra gas chromatograph coupled with a TSQ Quantum Tandem mass spectrometer, upgraded to the XLS configuration. The mass spectrometer was operated in electron ionisation (EI) mode at 70 eV. The filament current was 50 μA. The temperature of the transfer line was 220 °C and argon (99.9998% purity) was used as the collision gas with a collision cell pressure of 1.2 mTorr. Dwell time was 0.03 s up to 12 min and 0.05 s thereafter. The mass spectrometer was tuned and calibrated using FC-43 (perfluorotributylamine, PFTBA). Data acquisition and analyses were performed using the Xcalibur Workstation software supplied by the manufacturer.23
  2. Mass spectrometry detection was performed on a Waters Xevo TQMS (Milford, MA, USA) instrument equipped with an electrospray (ESI) source. Capillary voltage was 3.5 kV in positive mode and −2.5 kV in negative mode; the source was kept at 150 °C; desolvation temperature was 500 °C; cone gas flow, 50 L/h; and desolvation gas flow, 800 L/h. Unit resolution was applied to each quadrupole. Flow injections of each individual metabolite were used to optimize the MRM conditions. For the majority of the metabolites, this was done automatically by the Waters IntelliStart software, whereas for some compounds the optimal cone voltages and collision energies were identified during collision-induced dissociation (CID) experiments and set manually. A dwell time of at least 25 ms was applied to each MRM transition.25
  3. The UHPLC system was coupled directly to an API 5500 triple-quadrupole mass spectrometer (Applied Biosystems/MDS Sciex, Toronto, Canada) equipped with an electrospray source. Analyst™ software version 1.6.1 (Applera Corporation, Norwalk, CT, USA) was used for instrument control and data acquisition. The transitions and spectrometric parameters were optimized individually for each standard by direct infusion of their solutions (10 µg mL−1) in water/ACN (40:60 v/v) with NH4COOH 10 mM and HCOOH 0.1% into the mass spectrometer at a flow rate of 10 μL min−1. The two most abundant fragments, used as quantifier and qualifier, were identified for each compound. Declustering potential (DP) and entrance potential (EP) were optimized for each precursor ion, and collision energy (CE) and collision cell exit potential (CXP) for each product ion. Table 3 shows the compound-specific instrumental parameters used in the analytical method. The presence of our metabolite of interest was confirmed using the q/Q ratio. The spray voltage was set at 5500 V for positive mode and −4500 V for negative mode. The source temperature was set at 250 °C, and the nebulizer gas (Gas 1) and heater gas (Gas 2) at 40 and 20 psi, respectively (1 psi = 6894.76 Pa). UHP nitrogen (99.999%) was used as both curtain and collision gas (CAD) at 20 and 9 psi, respectively.26
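Example 3 confirms the presence of a metabolite with the q/Q ratio, i.e. the ratio between the peak areas of the qualifier (q) and quantifier (Q) transitions. A minimal Python sketch of that check follows; the 20% relative tolerance is a hypothetical default rather than a value from the cited study, so substitute the acceptance criterion your laboratory or applicable guideline specifies. The peak areas are invented.

```python
def q_over_q(qualifier_area: float, quantifier_area: float) -> float:
    """Ratio of the qualifier (q) to the quantifier (Q) transition peak area."""
    if quantifier_area <= 0:
        raise ValueError("quantifier peak area must be positive")
    return qualifier_area / quantifier_area


def ratio_confirms(sample_ratio: float, standard_ratio: float, rel_tol: float = 0.20) -> bool:
    """True if the sample q/Q ratio is within rel_tol (relative) of the standard's ratio.

    rel_tol=0.20 is a hypothetical default, not taken from the text.
    """
    return abs(sample_ratio - standard_ratio) <= rel_tol * standard_ratio


# Hypothetical peak areas.
standard = q_over_q(4000, 10000)  # ratio measured on a reference standard
print(ratio_confirms(q_over_q(1900, 5000), standard))  # True: 0.38 vs 0.40
print(ratio_confirms(q_over_q(1000, 5000), standard))  # False: 0.20 vs 0.40
```

Reporting both the transitions and the tolerance used makes this confirmation step reproducible by others.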

In the Data transformation protocol, provide details of the methods/pipelines and software used to transform the raw data.

In the Metabolite identification protocol, provide details of the methods/pipelines, reference databases and software used to identify features and/or annotate metabolites.

Examples: 

  1. VOCs were identified by comparing the retention times of individual peaks with those of their reference standards, and by matching the mass spectra against the NIST library. The ratio of each VOC area to the d16-octanal internal standard area was used to reduce technical variability among extractions and chromatographic runs, and VOC amounts were expressed as μg/kg of berry of d16-octanal equivalents.20
  2. The acquired spectra were directly converted to NetCDF files using Databridge software (Waters). Peak picking, alignment and principal component analysis (PCA) were performed using the automated data analysis pipeline MetaDB, developed at our institution. This package supports the execution of experiments compatible with the concept of interoperable bioscience data, including the production of a validated experimental data set with the relevant metadata in ISA-Tab format. The MetaDB workflow consists of six steps: (1) upload of metadata in ISA-Tab format; (2) preparation of the MS acquisition sequence, including sample randomisation; (3) upload of raw and derived spectral data files; (4) data processing for feature alignment and detection with metaMS; (5) visualisation of data for quality assessment; (6) preparation of data for upload to public repositories (MetaboLights).30
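The semi-quantification in example 1 can be written as C = (A_VOC / A_IS) × m_IS / m_sample, which assumes a response factor of 1 between each VOC and d16-octanal. The Python sketch below applies it with the amounts from extraction example 3 of the same study (50 μL of a 1 g/L d16-octanal solution, i.e. 50 μg, added to 4 g of grape powder); the peak areas are invented for illustration and the function name is hypothetical.

```python
def is_equivalents(area_voc: float, area_is: float,
                   is_amount_ug: float, sample_mass_kg: float) -> float:
    """VOC amount in µg/kg of sample, expressed as internal-standard equivalents.

    Assumes the VOC responds like the internal standard (response factor = 1).
    """
    if area_is <= 0 or sample_mass_kg <= 0:
        raise ValueError("internal-standard area and sample mass must be positive")
    return (area_voc / area_is) * is_amount_ug / sample_mass_kg


# 50 µL of a 1 g/L (= 1 µg/µL) d16-octanal solution, added to 4 g of grape powder.
IS_AMOUNT_UG = 50 * 1.0
SAMPLE_MASS_KG = 4 / 1000

# Hypothetical peak areas for one VOC and for d16-octanal in the same run.
print(is_equivalents(2.0e5, 1.0e6, IS_AMOUNT_UG, SAMPLE_MASS_KG))  # about 2500 µg/kg
```

Because the result is in internal-standard equivalents, values are comparable across samples but are not absolute concentrations unless response factors are determined.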

The next three sections, “Samples”, “Assays” and “Metabolites”, are dedicated to the metadata and results of the experiment. These tables can be filled in online or, once all additional columns have been added, downloaded, edited in Excel and re-uploaded in their final form. It is important not to alter the basic structure of the tables: existing columns must not be removed or modified, and file names and extensions must be kept as they are.
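Because an edited table should keep its structure intact, it can help to compare the header row of the downloaded file with that of the edited file before re-uploading. The Python sketch below does this positionally, since ISA-Tab tables can legitimately repeat column names and a set comparison would miss changes; the function names and example tables are hypothetical. A spreadsheet that re-saves the file with a different delimiter would also show up here as a header mismatch.

```python
import csv
import io
from itertools import zip_longest


def header(text: str) -> list:
    """Return the first row of a tab-separated table."""
    return next(csv.reader(io.StringIO(text), delimiter="\t"))


def changed_columns(original: str, edited: str) -> list:
    """List (position, original header, edited header) for every column that differs.

    An empty list means no column was removed, renamed, added or reordered.
    """
    pairs = zip_longest(header(original), header(edited))  # pads the shorter row with None
    return [(i, before, after) for i, (before, after) in enumerate(pairs) if before != after]


original = "Sample Name\tProtocol REF\tFactor Value[Treatment]\nS1\tExtraction\tcontrol\n"
edited_ok = "Sample Name\tProtocol REF\tFactor Value[Treatment]\nS1\tExtraction\tdrought\n"
edited_bad = "Sample Name\tFactor Value[Treatment]\nS1\tdrought\n"
print(changed_columns(original, edited_ok))  # []
print(changed_columns(original, edited_bad))
```

Only cell values should change between the downloaded and the re-uploaded file; any entry in the returned list signals a structural edit to undo.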

The sample information file (s_MTBLSxxx.txt) should provide all relevant facts about each sample included in the study. Sample metadata should include a unique sample name, organism, organism part, etc., as described in the Sample collection protocol above. Further sample descriptors should be included where available by selecting +Factor to add new columns (e.g. Days after anthesis, Treatment). Term selection is facilitated by a drop-down menu that shows the most relevant ontology terms; if no ontology term is available, free text can be entered. The reference protocol is ‘Sample collection’.
More samples can be added to the sample table using +Samples, either by pasting a list or, if appropriate, by importing the raw data file names. Alternatively, as many new rows as required can be added with +Rows and the cells edited individually.
 
The assay information file (a_MTBLSxxx_technique.txt) describes the assay process for each sample and connects each sample name to its corresponding raw data file and to the metabolite identification table. Multiple assays can be added per study.
For LC-MS and GC-MS assays, the predefined columns include the reference protocols (Extraction, Chromatography, Mass Spectrometry, Data Transformation, Metabolite Identification), followed by the instrument and column used and technical parameters such as column type, scan polarity and scan range. Here too, a drop-down menu with controlled vocabulary helps users fill in the table, so that all studies using the same metadata terms are easily findable.

The metabolite information file (m_MTBLSxxx_technique_MAF.tsv) is where users should add as much information as possible on the metabolites identified in the study and report their final concentration in each sample. Important fields are the metabolite name, ChEBI ID, chemical formula, SMILES and InChI. Other parameters, such as mass-to-charge ratio (m/z) and retention time, are useful for reproducibility.
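As a sketch of what a MAF row carries, the Python example below writes a tab-separated table with annotation columns followed by one concentration column per sample, and rejects identifiers that do not look like ChEBI IDs (CHEBI: followed by digits). The column names are assumed to follow the MetaboLights MAF template and are only a subset of it, and naming the abundance columns after samples is also an assumption, so check both against the template generated for your study. Water is used as a deliberately trivial compound, and the concentrations are invented.

```python
import csv
import io
import re

CHEBI_ID = re.compile(r"CHEBI:\d+")

# Assumed subset of the MAF template columns; verify against your study's template.
MAF_COLUMNS = ["database_identifier", "chemical_formula", "smiles", "inchi",
               "metabolite_identification", "mass_to_charge", "retention_time"]


def maf_table(metabolites, sample_names):
    """Return a tab-separated MAF-like table: annotation columns, then one column per sample."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(MAF_COLUMNS + list(sample_names))
    for met in metabolites:
        if not CHEBI_ID.fullmatch(met["database_identifier"]):
            raise ValueError(f"not a ChEBI ID: {met['database_identifier']!r}")
        annotation = [met.get(col, "") for col in MAF_COLUMNS]  # missing fields left blank
        abundances = [met["abundances"].get(name, "") for name in sample_names]
        writer.writerow(annotation + abundances)
    return out.getvalue()


water = {
    "database_identifier": "CHEBI:15377", "chemical_formula": "H2O", "smiles": "O",
    "inchi": "InChI=1S/H2O/h1H2", "metabolite_identification": "water",
    "abundances": {"S1": 1.2, "S2": 0.9},  # invented values
}
print(maf_table([water], ["S1", "S2"]))
```

Validating identifiers before upload catches typos (such as a missing CHEBI: prefix) that would otherwise surface only as an incomplete MAF during validation.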

 

When a study is created, the MetaboLights team provides users with credentials to access a private remote folder on the EBI FTP server, so that raw files can be uploaded via either FTP or the Aspera client. We recommend using the FileZilla software.

Users receive an email with the FTP settings, such as:
user: mtblight
password: ****
server: ftp-private.ebi.ac.uk
remote folder: /prod/-obfuscation_code

Please be aware that the remote folder path must be typed in full, as the folder is not browsable: use “cd /prod/-obfuscation_code” to access the private folder. Files and folders to be uploaded must not be zip-compressed.
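FileZilla is the recommended route, but the same upload can be scripted. The sketch below uses Python's standard ftplib with the settings from the MetaboLights email: the host is the one listed above, while the user, password and remote folder are placeholders to replace with your own values. It refuses .zip files, in line with the rule above, and recreates sub-folders (such as vendor .d directories) on the server. It assumes plain FTP as described here and has not been tested against the EBI server.

```python
import ftplib
from pathlib import Path


def reject_zips(paths):
    """Return the paths as a list, raising ValueError if any is a .zip archive."""
    paths = list(paths)
    zipped = [p.name for p in paths if p.suffix.lower() == ".zip"]
    if zipped:
        raise ValueError(f"zip-compressed files are not accepted: {zipped}")
    return paths


def files_to_upload(local_dir):
    """Every regular file under local_dir, in a stable order, with no zip archives."""
    return reject_zips(sorted(p for p in Path(local_dir).rglob("*") if p.is_file()))


def upload(local_dir, remote_folder, user, password, host="ftp-private.ebi.ac.uk"):
    """Upload the contents of local_dir into the private remote folder over FTP."""
    local_dir = Path(local_dir)
    with ftplib.FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        # The folder is not browsable, so change into the full path from the email.
        ftp.cwd(remote_folder)
        for path in files_to_upload(local_dir):
            rel = path.relative_to(local_dir)
            # Recreate parent folders first (e.g. the contents of a vendor .d folder).
            for depth in range(1, len(rel.parts)):
                try:
                    ftp.mkd("/".join(rel.parts[:depth]))
                except ftplib.error_perm:
                    pass  # most likely the folder already exists
            with path.open("rb") as handle:
                ftp.storbinary(f"STOR {rel.as_posix()}", handle)


# Placeholders: use the values from your own MetaboLights email.
# upload("raw_data", "/prod/<your-obfuscation-code>", "mtblight", "<password>")
```

Checking for zip archives before connecting means a forbidden file stops the run up front instead of partway through a long upload.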

Each sample in the study should have a corresponding raw data file, and both should be referenced in the assay table.
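A quick local cross-check of this rule can be run before validation: given the sample names and the rows of the assay table, list the samples that have no raw data file and the assay rows that point to unknown samples. The Python sketch below assumes the assay column is named Raw Spectral Data File, as in ISA-Tab mass spectrometry assays; adjust the column name if your assay template differs. The function name and example rows are hypothetical.

```python
def check_assay_links(sample_names, assay_rows, raw_column="Raw Spectral Data File"):
    """Return (samples without a raw file, assay samples missing from the sample table).

    assay_rows: list of dicts, one per assay-table row, keyed by column name.
    """
    with_raw = {row["Sample Name"] for row in assay_rows if row.get(raw_column, "").strip()}
    in_assay = {row["Sample Name"] for row in assay_rows}
    missing_raw = [name for name in sample_names if name not in with_raw]
    unknown = sorted(in_assay - set(sample_names))
    return missing_raw, unknown


samples = ["S1", "S2", "S3"]
assay = [
    {"Sample Name": "S1", "Raw Spectral Data File": "S1.raw"},
    {"Sample Name": "S2", "Raw Spectral Data File": ""},
    {"Sample Name": "S4", "Raw Spectral Data File": "S4.raw"},
]
print(check_assay_links(samples, assay))  # (['S2', 'S3'], ['S4'])
```

Both lists should be empty before the study is promoted; a non-empty first list corresponds to missing data files, a non-empty second list usually to a typo in a sample name.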

The accepted file formats for data are the following:

Raw file formats: d, raw, idb, cdf, wiff, scan, dat, cmp, cdf.cmp, lcd, abf, jpf, xps, mgf.
Derived file formats: mzml, nmrml, mzxml, xml, mzdata, cef, cnx, peakml, xy, smp, scan.
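The two lists above can be turned into a simple pre-upload check. The Python sketch below classifies a file or folder name by its extension, case-insensitively so that .mzML matches mzml; note that scan appears in both lists, and that some raw "files", such as .d, are actually folders. The function is a hypothetical helper, not part of the MetaboLights validator.

```python
RAW = {"d", "raw", "idb", "cdf", "wiff", "scan", "dat", "cmp", "cdf.cmp",
       "lcd", "abf", "jpf", "xps", "mgf"}
DERIVED = {"mzml", "nmrml", "mzxml", "xml", "mzdata", "cef", "cnx", "peakml",
           "xy", "smp", "scan"}


def classify(name: str) -> set:
    """Return {'raw'}, {'derived'}, both, or an empty set for a file or folder name."""
    parts = name.lower().rstrip("/").split(".")
    if len(parts) < 2:
        return set()  # no extension at all
    double = ".".join(parts[-2:])
    # Prefer a listed two-part extension such as "cdf.cmp" over its last part.
    ext = double if len(parts) > 2 and double in RAW | DERIVED else parts[-1]
    kinds = set()
    if ext in RAW:
        kinds.add("raw")
    if ext in DERIVED:
        kinds.add("derived")
    return kinds


for name in ["QC_01.raw", "sample_A.d/", "sample_A.mzML", "run.cdf.cmp", "notes.txt"]:
    print(name, sorted(classify(name)))
```

An empty result flags a file that would not be recognised as data, which is worth resolving before the transfer rather than after validation.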

A study must meet a number of requirements and pass automatic validation to progress to the curation stage. There are four validation flags (success, information, warning and error), which can be viewed individually by selecting the drop-down menu. To progress with the study, users must address all errors and should resolve all warnings. Some examples are shown below.

♦ Successfully read the investigation file

♦ Successfully found one or more samples/factors/descriptors

♦ Could not find any assays

♦ Found a publication (title/author list/DOI...)

♦ Protocol ‘Sample collection’, ‘Extraction’, ‘Chromatography’, ‘Mass spectrometry’, ‘Data transformation’, ‘Metabolite identification’, match the protocol type definition and the protocol description are validated

♦ Data transformation description should be more than just one sentence

♦ Sample column ‘Sample Name’, ‘Characteristics[Organism]’, ‘Characteristics[Organism part]’, … found in the sample file

♦ Sample column ‘Characteristics[Sample type]’ was not found

♦ No raw or derived files found

♦ Incomplete Metabolite Annotation File (MAF)
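The progression rule above (all errors must be fixed, warnings should be) can be expressed as a small function over the validation messages. In the Python sketch below, the severity attached to each example message is an assumption made for illustration, since the flag icons are not reproduced in this guide; the function name is hypothetical.

```python
from collections import Counter

SEVERITIES = {"success", "information", "warning", "error"}


def review(messages):
    """Summarise (severity, text) validation messages.

    Returns (can_progress, warnings): progress requires zero errors, while
    warnings are returned so that they can still be resolved.
    """
    counts = Counter(severity for severity, _ in messages)
    unknown = set(counts) - SEVERITIES
    if unknown:
        raise ValueError(f"unknown severity: {sorted(unknown)}")
    warnings = [text for severity, text in messages if severity == "warning"]
    return counts["error"] == 0, warnings


# Severities below are assumed for illustration.
report = [
    ("success", "Successfully read the investigation file"),
    ("warning", "Data transformation description should be more than just one sentence"),
    ("error", "No raw or derived files found"),
]
ok, to_fix = review(report)
print(ok)  # False: an error is still present
print(to_fix)
```

Separating the blocking condition (errors) from the advisory list (warnings) mirrors how the submission is gated: the study cannot move on with errors, but warnings are still expected to be fixed.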

Below is a list of publicly available datasets for the Vitis (grape) metabolome.

Panagiotis Arapitsas, Stefania Savoi, Fulvio Mattivi

Primary contacts: panagiotis.arapitsas@fmach.it | savoi.stefania@gmail.com | fulvio.mattivi@unitn.it