Elsevier

Pedosphere

Volume 32, Issue 4, August 2022, Pages 507-520

Protein sequence databases generated from metagenomics and public databases produced similar soil metaproteomic results of microbial taxonomic and functional changes

https://doi.org/10.1016/S1002-0160(21)60016-4

ABSTRACT

Soil metaproteomics has excellent potential as a tool to elucidate the structural and functional changes in soil microbial communities in response to environmental alterations. However, it is hindered by several challenges and gaps. Soil microbial communities are extremely complex in composition and include many uncultured microorganisms whose whole genomes have not been sequenced. Selecting a suitable protein sequence database therefore remains challenging in soil metaproteomics. In this study, the Public database and Meta-database were constructed using protein sequences from public databases and metagenomics, respectively. Using published soil metaproteomic raw data, we comprehensively compared the metaproteomic results obtained when each of these two kinds of protein sequence databases was used for protein identification. The results demonstrated that many more proteins, higher sequence coverage, and even more microbial species and functional annotations could be identified using the Meta-database than using the Public database. These findings indicated that the Meta-database was more specific as a protein sequence database. However, the follow-up in-depth metaproteomic analyses exhibited similar main results regardless of the database used. The microbial community composition at the genus level was similar between the two databases, especially for the species annotations with high peptide-spectrum match counts and high abundance. The functional analyses in response to stress, such as the gene ontology enrichment of biological process and molecular function terms and the key functional microorganisms, were also similar regardless of the database. Our analysis revealed that the Public database could also meet the demand to explore the functional responses of microbial proteins to some extent.
This study provides valuable insights into the choice of protein sequence databases and their impacts on subsequent bioinformatic analysis in soil metaproteomic research and will facilitate the optimization of experimental design for different purposes.
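Sequence coverage is one of the metrics on which the abstract compares the two databases. The snippet below is a minimal sketch of one common definition: the fraction of a protein's residues covered by at least one identified peptide. The abstract does not state how coverage was computed in the study, so the function name `sequence_coverage`, the exact-substring matching, and the example sequences are illustrative assumptions, not the authors' procedure.

```python
def sequence_coverage(protein, peptides):
    """Return the fraction of residues in `protein` covered by any peptide.

    Assumption: a peptide "covers" every position where it occurs as an
    exact substring; real search engines may treat modifications or
    isoleucine/leucine ambiguity differently.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            # Continue from the next position so overlapping hits are found.
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    # Hypothetical 16-residue protein with two identified peptides.
    print(sequence_coverage("MKTAYIAKQRQISFVK", ["TAYIAK", "QISFVK"]))  # 0.75
```

Under this definition, a database that contains closer matches to the proteins actually present in the sample would be expected to yield more matched peptides per protein and therefore higher coverage.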

INTRODUCTION

Soil is a dynamic system with complex and heterogeneous physical, chemical, and biological interactions. Soil microorganisms play critical roles in ecosystems and are heavily involved in a large number of biogeochemical processes, including nutrient acquisition, recycling of elements (carbon, nitrogen, and phosphorus (P)), and organic matter transformation (van der Heijden et al., 2008; Bastida et al., 2009). Recently, several molecular techniques have been applied to explore soil microbial

Data collection

Soil metaproteomic data were obtained from P-rich and P-deficient soils in a 17-year fertilization experiment in a tropical forest by shotgun proteomics measurements (Yao et al., 2018). Soil samples were collected from four plots (40 m × 40 m) with two technical replicates for each treatment: P-rich soils from plots 1 and 30, and P-deficient soils from plots 6 and 36. The MS/MS spectra raw files and FASTA files of the predicted protein sequences were downloaded from the ProteomeXchange dataset

Number of identified proteins and protein sequence coverage

Two typical protein sequence databases, the Meta-database and Public database, were generated (Table I). Overall, the total number of proteins, the total length of proteins, and the protein length distribution were very similar between the two databases. In both databases, most proteins (93%–95%) contained 51–800 amino acids. Moreover, the number of proteins was the highest within the length range of 201–400 amino acids, and gradually decreased for the lengths greater than 400 amino acids or
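The length comparison described in this section can be sketched as a simple binning of sequence lengths read from a FASTA file. This is an illustrative sketch only: the text mentions the 51–800 and 201–400 amino acid ranges, while the other bin edges, the function names, and the demo sequences below are assumptions introduced for the example.

```python
import bisect
from collections import Counter
from io import StringIO

# Upper bounds of the length bins (amino acids). Only the 51-800 and
# 201-400 ranges appear in the text; the remaining edges are assumed.
BIN_EDGES = [50, 200, 400, 800]
BIN_LABELS = ["<=50", "51-200", "201-400", "401-800", ">800"]


def read_fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA-formatted text stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def bin_label(length):
    """Map a sequence length to the label of the bin that contains it."""
    return BIN_LABELS[bisect.bisect_left(BIN_EDGES, length)]


def length_distribution(handle):
    """Return the fraction of sequences that fall in each length bin."""
    counts = Counter(bin_label(n) for n in read_fasta_lengths(handle))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    # Two made-up sequences: one of 3 residues and one of 250 residues.
    demo = ">a\nMKT\n>b\n" + "A" * 250 + "\n"
    print(length_distribution(StringIO(demo)))
```

Running the same function over each database's FASTA file would give two distributions that can be compared bin by bin.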

DISCUSSION

Soil metaproteomics has been increasingly applied to analyze soil microbial functions over the past ten years, following the development of soil protein extraction methods and mass spectrometry technology. However, bioinformatic analysis strategies for complex and unknown microbial communities remain unclear and poorly studied. This study thoroughly and systematically demonstrated the soil metaproteomic workflow and results using two protein sequence databases, the Meta-database and Public database. More

CONCLUSIONS

In this study, we used two strategies to construct protein sequence databases with comparable protein length distributions for soil metaproteomics and demonstrated the similarities and differences in the results of downstream bioinformatic analysis using the two kinds of databases. The Meta-database showed some superiority over the Public database in soil metaproteomics, with the identification of more proteins, higher sequence coverage, and even more microbial taxa. However, regardless of the

CONTRIBUTION OF AUTHORS

The first two authors, Yi Xiong and Lu Zheng, contributed equally to this work.

ACKNOWLEDGEMENT

This work was supported by the National Key Research and Development Program of China (No. 2016YFD0200308), the National Key Basic Research Program of China (No. 2015CB150501), and the Project of Priority and Key Areas, Institute of Soil Science, Chinese Academy of Sciences (Nos. ISSASIP1605 and ISSASIP1640).

SUPPLEMENTARY MATERIAL

Supplementary material for this article can be found in the online version.

References (70)

  • G Renella et al. Environmental proteomics: A long march in the pedosphere. Soil Biol Biochem (2014)
  • V Torsvik et al. Microbial diversity and function in soil: From genes to ecosystems. Curr Opin Microbiol (2002)
  • T L Bailey et al. MEME SUITE: Tools for motif discovery and searching. Nucleic Acids Res (2009)
  • F Bastida et al. Soil metaproteomics: A review of an emerging environmental science. Significance, methodology and perspectives. Eur J Soil Biol (2009)
  • D Benndorf et al. Functional metaproteome analysis of protein extracts from contaminated soil and groundwater. ISME J (2007)
  • C T Brown et al. Hospitalized premature infants are colonized by related bacterial strains with distinct proteomic profiles. mBio (2018)
  • C N Butterfield et al. Proteogenomic analyses indicate bacterial methylotrophy and archaeal heterotrophy are prevalent below the grass root zone. PeerJ (2016)
  • Carlson M. 2020. GO.db: A set of annotation maps describing the entire Gene Ontology. R package version 3.11.4....
  • S A Chamberlain et al. Taxize: Taxonomic search and retrieval in R. F1000Res (2013)
  • B Chapman et al. High-throughput parallel proteogenomics: A bacterial case study. Proteomics (2014)
  • H B Chen. VennDiagram: Generate high-resolution Venn and Euler plots. R package version 1.6.20
  • K Chourey et al. Direct cellular lysis/protein extraction protocol for soil metaproteomics. J Proteome Res (2010)
  • A Conesa et al. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics (2005)
  • J R Conway et al. UpSetR: An R package for the visualization of intersecting sets and their properties. Bioinformatics (2017)
  • R Daniel. The metagenomics of soil. Nat Rev Microbiol (2005)
  • L M Fu et al. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics (2012)
  • N Grassl et al. Ultra-deep and quantitative saliva proteome reveals dynamics of the oral microbiome. Genome Med (2016)
  • J Hultman et al. Multi-omics of permafrost, active layer and thermokarst bog soil microbiomes. Nature (2015)
  • J S Johnson et al. Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat Commun (2019)
  • A S Johnson-Rollings et al. Exploring the functional soil-microbe interface and exoenzymes through soil metaexoproteomics. ISME J (2014)
  • K M Keiblinger et al. Soil and leaf litter metaproteomics—A brief guideline from sampling to understanding. FEMS Microbiol Ecol (2016)
  • M Kleiner et al. Assessing species biomass contributions in microbial communities via metaproteomics. Nat Commun (2017)
  • R Kolde. Pretty heatmaps. R package version 1.0.12
  • S Kumar et al. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol (2018)
  • B J Kunath et al. Metaproteomics: Sample preparation and methodological considerations
