1.
Bioinformatics ; 40(Supplement_1): i257-i265, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940141

ABSTRACT

MOTIVATION: Tandem mass spectrometry (MS/MS) is a crucial technology for large-scale proteomic analysis. Protein database search and spectral library search are commonly used for peptide identification from MS/MS spectra; both, however, may face challenges due to experimental variations between replicated spectra and similar fragmentation patterns among distinct peptides. To address these challenges, we present SpecEncoder, a deep metric learning approach that transforms MS/MS spectra into robust and sensitive embedding vectors in a latent space. The SpecEncoder model can also embed predicted MS/MS spectra of peptides, enabling a hybrid search approach that combines spectral library and protein database searches for peptide identification. RESULTS: We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%-2% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%-15% more unique peptides than MSGF+ enhanced by Percolator. Furthermore, SpecEncoder identified 6%-12% additional unique peptides when utilizing a combined library of experimental and predicted spectra. SpecEncoder can also identify more peptides than deep-learning-enhanced methods (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder's potential to enhance peptide identification for proteomic data analyses. AVAILABILITY AND IMPLEMENTATION: The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https://github.com/lkytal/SpecEncoder. Contact: hatang@iu.edu.


Subjects
Protein Databases , Peptides , Proteomics , Tandem Mass Spectrometry , Proteomics/methods , Peptides/chemistry , Humans , Tandem Mass Spectrometry/methods , Deep Learning , Software
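
The hybrid search described in the SpecEncoder abstract above can be pictured as a nearest-neighbour lookup in embedding space over a library that mixes experimental and predicted spectra. The sketch below is only an illustration under stated assumptions: the three-dimensional vectors, the peptide names, the "experimental"/"predicted" labels, and the choice of Euclidean distance are all hypothetical, and the record does not say how SpecEncoder actually scores matches.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_peptide(query_vec, library):
    """Return (peptide, source, distance) for the closest library entry.

    `library` is a list of (peptide, source, vector) tuples. The source
    labels "experimental" and "predicted" are hypothetical tags used here
    to show that both kinds of spectra share one embedding space.
    """
    best = min(library, key=lambda entry: euclidean(query_vec, entry[2]))
    return best[0], best[1], euclidean(query_vec, best[2])


# Toy library: made-up embeddings, not output of any real model.
library = [
    ("PEPTIDEK", "experimental", [0.9, 0.1, 0.0]),
    ("SAMPLER", "predicted", [0.1, 0.8, 0.3]),
]
print(nearest_peptide([0.2, 0.7, 0.3], library))
```

A real search would use an approximate nearest-neighbour index and a score threshold with false discovery rate control; the linear scan here only shows the shape of the lookup.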
2.
BMC Bioinformatics ; 25(1): 85, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38413857

ABSTRACT

PURPOSE: Despite much progress in alignment algorithms, aligning divergent protein sequences with less than 20-35% pairwise identity (the so-called "twilight zone") remains a difficult problem. Many alignment algorithms have used substitution matrices since their creation in the 1970s to generate alignments; however, these matrices do not work well for scoring alignments within the twilight zone. We developed Protein Embedding based Alignments, or PEbA, to better align sequences with low pairwise identity. Like the traditional Smith-Waterman algorithm, PEbA uses a dynamic programming algorithm, but the matching score of amino acids is based on the similarity of their embeddings from a protein language model. METHODS: We tested PEbA on over twelve thousand benchmark pairwise alignments from BAliBASE, each one extracted from one of their multiple sequence alignments. Five different BAliBASE references were used, each with different sequence identities, motifs, and lengths, allowing PEbA to showcase how well it aligns under different circumstances. RESULTS: PEbA greatly outperformed BLOSUM substitution matrix-based pairwise alignments, with the degree of improvement in alignment quality depending on the similarity of the sequence pair (more than fourfold better for pairs of sequences with <10% identity). We also compared PEbA with embeddings generated by different protein language models (ProtT5 and ESM-2) and found that ProtT5-XL-U50 produced the most useful embeddings for aligning protein sequences. PEbA also outperformed DEDAL and vcMSA, two recently developed protein language model embedding-based alignment methods. CONCLUSION: Our results suggest that general-purpose protein language models provide useful contextual information for generating more accurate protein alignments than typically used methods.


Subjects
Boronic Acids , Proteins , Proteins/chemistry , Amino Acid Sequence , Sequence Alignment , Algorithms
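
The core idea in the PEbA abstract above, Smith-Waterman-style dynamic programming in which the match score comes from embedding similarity rather than a substitution matrix, can be sketched as follows. This is a minimal illustration, not PEbA's implementation: the use of raw cosine similarity as the match score, the linear gap penalty of -0.5, and the two-dimensional toy "embeddings" are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def local_align(emb_a, emb_b, gap=-0.5):
    """Local alignment of two per-residue embedding sequences.

    Returns (best_score, aligned_pairs), where aligned_pairs lists the
    (index_in_a, index_in_b) residue pairs on the best local path.
    The gap penalty is a hypothetical value chosen for the example.
    """
    n, m = len(emb_a), len(emb_b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + cosine(emb_a[i - 1], emb_b[j - 1])
            H[i][j] = max(0.0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        match = H[i - 1][j - 1] + cosine(emb_a[i - 1], emb_b[j - 1])
        if math.isclose(H[i][j], match):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(H[i][j], H[i - 1][j] + gap):
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]


# Toy per-residue embeddings (real ones would come from a language model).
seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[0.0, 1.0], [1.0, 1.0]]
score, pairs = local_align(seq_a, seq_b)
print(score, pairs)
```

Because the match score depends on each residue's contextual embedding, two identical amino acids in different contexts can score differently, which is the property a fixed substitution matrix cannot offer.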
3.
Bioinformatics ; 39(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37252828

ABSTRACT

MOTIVATION: Tandem mass spectrometry is an essential technology for characterizing chemical compounds at high sensitivity and throughput, and is commonly adopted in many fields. However, computational methods for automated compound identification from their MS/MS spectra are still limited, especially for novel compounds that have not been previously characterized. In recent years, in silico methods were proposed to predict the MS/MS spectra of compounds, which can then be used to expand the reference spectral libraries for compound identification. However, these methods did not consider the compounds' 3D conformations, and thus neglected critical structural information. RESULTS: We present the 3D Molecular Network for Mass Spectra Prediction (3DMolMS), a deep neural network model to predict the MS/MS spectra of compounds from their 3D conformations. We evaluated the model on the experimental spectra collected in several spectral libraries. The results showed that 3DMolMS predicted the spectra with average cosine similarities of 0.691 and 0.478 to the experimental MS/MS spectra acquired in positive and negative ion modes, respectively. Furthermore, the 3DMolMS model can be generalized to the prediction of MS/MS spectra acquired by different labs on different instruments through minor fine-tuning on a small set of spectra. Finally, we demonstrate that the molecular representation learned by 3DMolMS from MS/MS spectra prediction can be adapted to enhance the prediction of chemical properties such as the elution time in liquid chromatography and the collisional cross section measured by ion mobility spectrometry, both of which are often used to improve compound identification. AVAILABILITY AND IMPLEMENTATION: The code for 3DMolMS is available at https://github.com/JosieHong/3DMolMS and the web service is at https://spectrumprediction.gnps2.org.


Subjects
Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Liquid Chromatography/methods , Molecular Conformation
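
The 3DMolMS abstract above reports accuracy as the cosine similarity between predicted and experimental spectra. One common way to compute such a score is to bin both peak lists on the m/z axis and take the cosine of the resulting intensity vectors, as sketched below. The 1.0 m/z bin width and the absence of any intensity transform are assumptions for this example; the record does not state the settings used in the paper.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    `peaks` is a list of (mz, intensity) pairs; the result maps a bin
    index to the total intensity falling in that bin.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def spectrum_cosine(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra, in [0, 1]."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up peak lists for illustration only.
predicted = [(100.1, 50.0), (200.2, 100.0), (250.7, 10.0)]
experimental = [(100.3, 40.0), (200.4, 100.0), (310.0, 25.0)]
print(round(spectrum_cosine(predicted, experimental), 3))
```

The score is sensitive to the bin width: narrower bins penalise small m/z shifts, wider bins merge distinct fragments, so reported similarities are only comparable when the binning is the same.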
4.
Nucleic Acids Res ; 50(5): e29, 2022 03 21.
Article in English | MEDLINE | ID: mdl-34904653

ABSTRACT

Reverse transcriptases (RTs) are found in different systems including group II introns, Diversity Generating Retroelements (DGRs), retrons, CRISPR-Cas systems, and Abortive Infection (Abi) systems in prokaryotes. Different classes of RTs can play different roles, such as template switching and mobility in group II introns, spacer acquisition in CRISPR-Cas systems, mutagenic retrohoming in DGRs, programmed cell suicide in Abi systems, and recently discovered phage defense in retrons. While some classes of RTs have been studied extensively, others remain to be characterized. There is a lack of computational tools for identifying and characterizing various classes of RTs. In this study, we built a tool (called myRT) for identification and classification of prokaryotic RTs. In addition, our tool provides information about the genomic neighborhood of each RT, providing potential functional clues. We applied our tool to predict RTs in all complete and draft bacterial genomes, and created a collection that can be used for exploration of putative RTs and their associated protein domains. Application of myRT to metagenomes showed that gut metagenomes encode proportionally more RTs related to DGRs, outnumbering retron-related RTs, as compared to the collection of reference genomes. MyRT is available both as standalone software (https://github.com/mgtools/myRT) and through a website (https://omics.informatics.indiana.edu/myRT/).


Subjects
Bacterial Genome , Metagenome , RNA-Directed DNA Polymerase , Bacteriophages/genetics , Humans , RNA-Directed DNA Polymerase/metabolism , Retroelements/genetics
5.
J Proteome Res ; 22(2): 442-453, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36688801

ABSTRACT

The microbiome has been shown to be important for human health because of its influence on disease and the immune response. Mass spectrometry is an important tool for evaluating protein expression and species composition in the microbiome but is technically challenging and time-consuming. Multiplexing has emerged as a way to make spectrometry workflows faster while improving results. Here, we present MetaProD (MetaProteomics in Django) as a highly configurable metaproteomic data analysis pipeline supporting label-free and multiplexed mass spectrometry. The pipeline is open-source, uses fully open-source tools, and is integrated with Django to offer a web-based interface for configuration and data access. Benchmarking of MetaProD using multiple metaproteomics data sets showed that MetaProD achieved fast and efficient identification of peptides and proteins. Application of MetaProD to a multiplexed cancer data set resulted in identification of more differentially expressed human proteins in cancer tissues versus healthy tissues as compared to previous studies; in addition, MetaProD identified bacterial proteins in those samples, some of which are differentially abundant.


Subjects
Microbiota , Proteomics , Humans , Proteomics/methods , Mass Spectrometry , Bacterial Proteins , Spectrum Analysis
6.
PLoS Comput Biol ; 18(3): e1009397, 2022 03.
Article in English | MEDLINE | ID: mdl-35302987

ABSTRACT

Host-microbiome interactions and the microbial community have a broad impact on human health and disease. Most microbiome-based studies are performed at the genome level based on next-generation sequencing techniques, but metaproteomics is emerging as a powerful technique to study microbiome functional activity by characterizing the complex and dynamic composition of microbial proteins. We conducted a large-scale survey of human gut microbiome metaproteomic data to identify generalist species that are ubiquitously expressed across all samples and specialists that are highly expressed in a small subset of samples associated with a certain phenotype. We were able to utilize the metaproteomic mass spectrometry data to reveal the protein landscapes of these species, which enables the characterization of the expression levels of proteins of different functions and underlying regulatory mechanisms, such as operons. Finally, we were able to recover a large number of open reading frames (ORFs) with spectral support, which were missed by de novo protein-coding gene predictors. We showed that a majority of the rescued ORFs overlapped with de novo predicted protein-coding genes, but on opposite strands or in different frames. Together, these results demonstrate applications of metaproteomics for the characterization of important gut bacterial species.


Subjects
Gastrointestinal Microbiome , Microbiota , Bacteria/genetics , Gastrointestinal Microbiome/genetics , Humans , Microbiota/genetics , Proteome/analysis , Proteomics/methods
7.
BMC Genomics ; 23(1): 573, 2022 Aug 11.
Article in English | MEDLINE | ID: mdl-35953824

ABSTRACT

BACKGROUND: CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR-associated proteins) systems are adaptive immune systems commonly found in prokaryotes that provide sequence-specific defense against invading mobile genetic elements (MGEs). The memory of these immunological encounters is stored in CRISPR arrays, where spacer sequences record the identity and history of past invaders. Analyzing such CRISPR arrays provides insights into the dynamics of CRISPR-Cas systems and the adaptation of their host bacteria to rapidly changing environments such as the human gut. RESULTS: In this study, we utilized 601 publicly available Bacteroides fragilis genome isolates from 12 healthy individuals, 6 of whom have longitudinal observations, and 222 available B. fragilis reference genomes to update the understanding of B. fragilis CRISPR-Cas dynamics and their differential activities. Analysis of longitudinal genomic data showed that some CRISPR array structures remained relatively stable over time whereas others involved radical spacer acquisition during some periods, and that diverse CRISPR arrays (associated with multiple isolates) co-existed in the same individuals, with some persisting over time. Furthermore, features of CRISPR adaptation, evolution, and microdynamics were highlighted through an analysis of the host-MGE network, such as modules of multiple MGEs and hosts, reflecting complex interactions between B. fragilis and its invaders mediated through the CRISPR-Cas systems. CONCLUSIONS: We made all annotated CRISPR-Cas systems, their target MGEs, and their interaction network available as a web resource at https://omics.informatics.indiana.edu/CRISPRone/Bfragilis. We anticipate it will become an important resource for studying B. fragilis, its CRISPR-Cas systems, and its interaction with mobile genetic elements, providing insights into evolutionary dynamics that may shape the species' virulence and lead to its pathogenicity.


Subjects
CRISPR-Associated Proteins , CRISPR-Cas Systems , Bacteria/genetics , Bacteroides fragilis/genetics , CRISPR-Associated Proteins/genetics , CRISPR-Cas Systems/genetics , Genomics , Humans
8.
J Environ Sci (China) ; 116: 198-208, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35219418

ABSTRACT

Soil formation and ecological rehabilitation is the most promising strategy to eliminate environmental risks of bauxite residue disposal areas. The poor physical structure of bauxite residue is nevertheless a major limitation to plant growth. Organic materials have been shown to be effective ameliorants for improving the physical conditions of bauxite residue. In this study, three organic materials, straw (5% W/W), humic acid (5% W/W), and humic acid-acrylamide polymer (0.2% and 0.4%, W/W), were selected to evaluate their effects on the physical conditions of bauxite residue pretreated by phosphogypsum following a 120-day incubation experiment. The proportion of 2-1 mm macro-aggregates, mean weight diameter (MWD) and geometric mean diameter (GMD) increased following organic materials addition, which indicated that organic materials could enhance aggregate stability. Compared with straw and humic acid, humic acid-acrylamide polymer application had a greater effect on the formation of water-stable aggregates in the residues. Furthermore, organic materials increased the total porosity, total pore volume and average pore diameter, and reduced the micropore content according to nitrogen gas adsorption (NA) and mercury intrusion porosimetry (MIP) analysis, whilst enhancing water retention of the residues based on water characteristic curves. Compared with traditional organic wastes, humic acid-acrylamide polymer could be regarded as a candidate according to the comprehensive consideration of the additive amount and the effects on physical conditions of bauxite residue. These findings could provide a novel application for both Ca-containing acidic solid waste and high-molecular polymers in ecological rehabilitation at disposal areas.


Subjects
Aluminum Oxide , Soil Pollutants , Aluminum Oxide/chemistry , Humic Substances , Soil/chemistry , Soil Microbiology , Soil Pollutants/chemistry
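
The aggregate-stability indices named in the abstract above, mean weight diameter and geometric mean diameter, are conventionally computed from sieve fractions as a mass-weighted arithmetic mean and a mass-weighted geometric mean of the class diameters. The sketch below uses those textbook definitions; the size classes, their representative diameters, and the masses are invented for illustration and are not data from the study.

```python
import math


def mean_weight_diameter(fractions):
    """Mass-weighted arithmetic mean of aggregate diameters.

    `fractions` is a list of (mean_diameter_mm, mass_g) pairs, one per
    sieve class.
    """
    total = sum(mass for _, mass in fractions)
    return sum(diameter * mass for diameter, mass in fractions) / total


def geometric_mean_diameter(fractions):
    """Mass-weighted geometric mean of aggregate diameters."""
    total = sum(mass for _, mass in fractions)
    log_sum = sum(mass * math.log(diameter) for diameter, mass in fractions)
    return math.exp(log_sum / total)


# Hypothetical sieve classes: 2-1 mm, 1-0.25 mm, 0.25-0.05 mm,
# each represented by the midpoint of its range.
fractions = [(1.5, 20.0), (0.625, 30.0), (0.15, 50.0)]
print(mean_weight_diameter(fractions))
print(geometric_mean_diameter(fractions))
```

Both indices rise when mass shifts into the coarser classes, which is why an increase in either is read as improved aggregate stability.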
9.
PLoS Comput Biol ; 16(10): e1007951, 2020 10.
Article in English | MEDLINE | ID: mdl-33125363

ABSTRACT

Microbial community members exhibit various forms of interactions. Taking advantage of the increasing availability of microbiome data, many computational approaches have been developed to infer bacterial interactions from the co-occurrence of microbes across diverse microbial communities. Additionally, the introduction of genome-scale metabolic models has also enabled the inference of cooperative and competitive metabolic interactions between bacterial species. By nature, phylogenetically similar microbial species are more likely to share common functional profiles or biological pathways due to their genomic similarity. Without properly factoring out the phylogenetic relationship, any estimation of the competition and cooperation between species based on functional/pathway profiles may bias downstream applications. To address these challenges, we developed a novel approach for estimating the competition and complementarity indices for a pair of microbial species, adjusted by their phylogenetic distance. An automated pipeline, PhyloMint, was implemented to construct competition and complementarity indices from genome-scale metabolic models derived from microbial genomes. Application of our pipeline to 2,815 human gut-associated bacteria showed high correlation between phylogenetic distance and metabolic competition/cooperation indices among bacteria. Using a discretization approach, we were able to detect pairs of bacterial species with cooperation scores significantly higher than the average pairs of bacterial species with similar phylogenetic distances. A network community analysis of high metabolic cooperation but low competition reveals distinct modules of bacterial interactions. Our results suggest that niche differentiation plays a dominant role in microbial interactions, while habitat filtering also plays a role among certain clades of bacterial species.


Subjects
Bacteria , Microbial Interactions , Microbiota , Biological Models , Bacteria/classification , Bacteria/genetics , Bacteria/metabolism , Computational Biology , Bacterial Genome/genetics , Genomics , Humans , Microbial Interactions/genetics , Microbial Interactions/physiology , Microbiota/genetics , Microbiota/physiology , Phylogeny
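
The "discretization approach" mentioned in the PhyloMint abstract above compares each species pair against other pairs at a similar phylogenetic distance. One way to picture this is to bin pairs by distance and flag those whose cooperation score stands out within their bin, as in the sketch below. The bin width, the z-score test, the cutoff of 2.0, the minimum bin size, and all the numbers are assumptions for illustration; the record does not describe the statistical test the authors used.

```python
import statistics
from collections import defaultdict


def flag_high_cooperation(pairs, bin_width=0.1, z_cutoff=2.0):
    """Flag pairs whose score is unusually high for their distance bin.

    `pairs` is a list of (pair_id, phylo_distance, cooperation_score).
    Pairs are grouped into bins of `bin_width` along the distance axis,
    and a pair is flagged when its z-score within the bin exceeds
    `z_cutoff`. Bins with fewer than three pairs are skipped.
    """
    bins = defaultdict(list)
    for pair_id, distance, score in pairs:
        bins[int(distance // bin_width)].append((pair_id, score))

    flagged = []
    for members in bins.values():
        scores = [score for _, score in members]
        if len(scores) < 3:
            continue
        mean = statistics.mean(scores)
        spread = statistics.pstdev(scores)
        if spread == 0:
            continue
        flagged.extend(
            pair_id for pair_id, score in members
            if (score - mean) / spread > z_cutoff
        )
    return flagged


# Invented example: six pairs at similar distance, one of them an outlier,
# plus a lone distant pair that has no peers to be compared against.
pairs = [
    ("p1", 0.31, 0.1), ("p2", 0.32, 0.1), ("p3", 0.33, 0.1),
    ("p4", 0.34, 0.1), ("p5", 0.35, 0.1), ("p6", 0.36, 0.9),
    ("p7", 0.81, 0.95),
]
print(flag_high_cooperation(pairs))
```

Comparing within bins is what removes the phylogenetic trend: a score that is ordinary among distant species can still be exceptional among close relatives, and the reverse.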
10.
Mol Cell Proteomics ; 18(8 suppl 1): S183-S192, 2019 08 09.
Article in English | MEDLINE | ID: mdl-31142575

ABSTRACT

Matching metagenomic and/or metatranscriptomic data, currently often under-used, can be a useful reference for metaproteomic tandem mass spectra (MS/MS) data analysis. Here we developed a software pipeline for identification of peptides and proteins from metaproteomic MS/MS data using proteins derived from matching metagenomic (and metatranscriptomic) data as the search database, based on two novel approaches, Graph2Pro (published) and Var2Pep (new). Graph2Pro retains and uses uncertainties of metagenome assembly for reference-based MS/MS data analysis. Var2Pep considers the variations found in metagenomic/metatranscriptomic sequencing reads that are not retained in the assemblies (contigs). The new software pipeline provides a one-stop application of both tools, and it supports the use of metagenome assemblies from commonly used assemblers including MegaHit and metaSPAdes. When tested on two collections of multi-omic microbiome data sets, our pipeline improved the identification rate of the metaproteomic MS/MS spectra about twofold compared with conventional contig- or read-based approaches (Var2Pep alone identified 5.6% to 24.1% more unique peptides, depending on the data set). We also showed that identified variant peptides are important for functional profiling of microbiomes. All results suggested that it is important to take the assembly uncertainties and genomic variants into consideration to facilitate metaproteomic MS/MS data interpretation.


Subjects
Algorithms , Microbiota/genetics , Proteogenomics/methods , Seawater/microbiology , Wastewater/microbiology , Protein Databases , Genetic Variation , Peptides/genetics , Tandem Mass Spectrometry
11.
Nucleic Acids Res ; 47(W1): W289-W294, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31049585

ABSTRACT

MyDGR is a web server providing integrated prediction and visualization of Diversity-Generating Retroelements (DGR) systems in query nucleotide sequences. It is built upon an enhanced version of DGRscan, a tool we previously developed for identification of DGR systems. DGR systems are remarkable genetic elements that use error-prone reverse transcriptases to generate vast sequence variants in specific target genes, which have been shown to benefit their hosts (bacteria, archaea or phages). As the first web server for annotation of DGR systems, myDGR is freely available on the web at http://omics.informatics.indiana.edu/myDGR with all major browsers supported. MyDGR accepts query nucleotide sequences in FASTA format, and outputs all the important features of a predicted DGR system, including a reverse transcriptase, a template repeat and one (or more) variable repeats and their alignment featuring A-to-N (N can be C, T or G) substitutions, and VR-containing target gene(s). In addition to providing the results as text files for download, myDGR generates a visual summary of the results for users to explore the predicted DGR systems. Users can also directly access pre-calculated, putative DGR systems identified in currently available reference bacterial genomes and a few other collections of sequences (including human microbiomes).


Subjects
Genome , Molecular Sequence Annotation/methods , Software , Archaea/genetics , Bacteria/genetics , Bacteriophages/genetics , Base Sequence , Genetic Loci , Humans , Information Storage and Retrieval , Internet , Microbiota/genetics , RNA-Directed DNA Polymerase/genetics , Sequence Alignment
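
The myDGR abstract above describes template repeat (TR) and variable repeat (VR) alignments "featuring A-to-N substitutions". The sketch below shows how such substitutions can be tallied from an already aligned TR/VR pair, separating mismatches at TR adenines from all other mismatches. The function name, the toy sequences, and the decision to skip gap columns are assumptions for this example; it does not reproduce myDGR's or DGRscan's actual logic.

```python
def classify_tr_vr_mismatches(tr, vr):
    """Split mismatches in an aligned TR/VR pair into A-to-N and other.

    `tr` and `vr` must be aligned strings of equal length; columns that
    contain a gap character "-" are ignored. Each mismatch is reported
    as (position, tr_base, vr_base).
    """
    if len(tr) != len(vr):
        raise ValueError("TR and VR must be aligned to the same length")
    a_to_n, other = [], []
    for pos, (t, v) in enumerate(zip(tr.upper(), vr.upper())):
        if t == v or "-" in (t, v):
            continue
        if t == "A":
            a_to_n.append((pos, t, v))
        else:
            other.append((pos, t, v))
    return a_to_n, other


# Toy aligned repeats, not taken from any real DGR system.
tr = "GACTAACGAT"
vr = "GCCTGACGTC"
a_to_n, other = classify_tr_vr_mismatches(tr, vr)
print(a_to_n)
print(other)
```

A high share of A-to-N mismatches relative to other mismatches is the signature that distinguishes a DGR TR/VR pair from two merely similar repeats.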
12.
Anal Chem ; 92(6): 4275-4283, 2020 03 17.
Article in English | MEDLINE | ID: mdl-32053352

ABSTRACT

The ability to predict tandem mass (MS/MS) spectra from peptide sequences can significantly enhance our understanding of the peptide fragmentation process and could improve peptide identification in proteomics. However, current approaches for predicting high-energy collisional dissociation (HCD) spectra are limited to predicting the intensities of expected ion types, that is, the a/b/c/x/y/z ions and their neutral loss derivatives (referred to as backbone ions). In practice, backbone ions only account for <70% of total ion intensities in HCD spectra, indicating that many intense ions are ignored by current predictors. In this paper, we present a deep learning approach that can predict the complete spectra (both backbone and nonbackbone ions) directly from peptide sequences. We made no assumptions about which kinds of ions to predict but instead predicted the intensities for all possible m/z. Training this model requires neither annotations of fragment ions nor any prior knowledge of the fragmentation rules. Our analyses show that the predicted 2+ and 3+ HCD spectra are highly similar to the experimental spectra, with average full-spectrum cosine similarities of 0.820 (±0.088) and 0.786 (±0.085), respectively, very close to the similarities between the experimental replicated spectra. In contrast, the best-performing backbone-only models achieve average similarities below 0.75 and 0.70 for 2+ and 3+ spectra, respectively. Furthermore, we developed a multitask learning (MTL) approach for predicting spectra with insufficient training samples, which allows our model to make accurate predictions for electron transfer dissociation (ETD) spectra and HCD spectra of less abundant charges (1+ and 4+).


Subjects
Neural Networks (Computer) , Peptides/analysis , Tandem Mass Spectrometry
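
The abstract above notes that backbone ions account for less than 70% of total ion intensity in HCD spectra. The sketch below shows how such a fraction could be measured for one spectrum by generating theoretical fragment m/z values and summing the intensity of matching peaks. It is a deliberate simplification: only singly charged b and y ions are generated (no a/c/x/z ions, neutral losses, or higher charge states), the 0.02 m/z tolerance is an assumed value, and the peptide and peak list are invented.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def by_ions(peptide):
    """Return m/z values of singly charged b and y ions of a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion, first i residues
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion, remaining residues
    return ions


def explained_fraction(peptide, peaks, tol=0.02):
    """Fraction of total intensity lying within `tol` m/z of a b/y ion."""
    ions = by_ions(peptide)
    total = sum(intensity for _, intensity in peaks)
    matched = sum(
        intensity for mz, intensity in peaks
        if any(abs(mz - ion) <= tol for ion in ions)
    )
    return matched / total if total else 0.0


# Invented spectrum for the tripeptide GAK: two backbone peaks and one
# peak that no b/y ion explains.
peaks = [(129.066, 30.0), (147.113, 50.0), (110.0, 20.0)]
print(explained_fraction("GAK", peaks))
```

The unexplained remainder is exactly the signal a backbone-only predictor cannot model, which is the motivation the abstract gives for predicting intensities at every m/z instead.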
13.
J Environ Manage ; 256: 109981, 2020 Feb 15.
Article in English | MEDLINE | ID: mdl-31989989

ABSTRACT

Bauxite residue is a highly alkaline solid waste with poor physical structure which ultimately limits plant growth. Ecological reconstruction is an effective strategy to improve its environmental management, although the soil formation process still requires further investigation. Here, an incubation experiment was used to investigate the effects of phosphogypsum and poultry manure on aggregate size distribution and aggregate-associated exchangeable bases of bauxite residue. Phosphogypsum and poultry manure additions significantly increased the proportion of 2-1 mm residue aggregates and enhanced mean weight diameter (MWD) of residues in the 0-20 cm and 20-40 cm layers, although little effect was evident in the 40-60 cm layer. Phosphogypsum addition reduced pH and EC values to approximately 8.5 and 200 mS/cm in different size aggregates at 0-20 cm. Exchangeable Ca²⁺ concentration was improved, especially in 0.25-0.05 mm and <0.05 mm aggregates, following amendment additions. The relative contents of katoite and cancrinite in >0.25 mm aggregate fractions were higher, which was consistent with changes in pH. Phosphogypsum and poultry manure changed the microstructure and surrounding pores of residue aggregates, whilst the concentration of Ca on microaggregate surfaces was higher than that on macroaggregates. These findings reveal that application of phosphogypsum and poultry manure directly alters the distribution of exchangeable bases and alkaline indicators within residue aggregates, resulting in aggregate size distribution and microstructure variations.


Subjects
Aluminum Oxide , Manure , Animals , Calcium Sulfate , Phosphorus , Poultry , Soil
14.
BMC Genomics ; 20(1): 567, 2019 Jul 09.
Article in English | MEDLINE | ID: mdl-31288753

ABSTRACT

BACKGROUND: Sequencing of microbiomes has accelerated the characterization of the diversity of CRISPR-Cas immune systems. However, the utilization of next-generation short-read sequences for the characterization of CRISPR-Cas dynamics remains limited due to the repetitive nature of CRISPR arrays. CRISPR arrays consist of short spacer segments (derived from invaders' genomes) interspaced between flanking repeat sequences. The repetitive structure of CRISPR arrays poses a computational challenge for the accurate assembly of CRISPR arrays from short reads. In this paper we evaluate the use of long-read sequences for the analysis of CRISPR-Cas system dynamics in microbiomes. RESULTS: We analyzed a dataset of Illumina's TruSeq Synthetic Long-Reads (SLR) derived from a gut microbiome. We showed that long reads captured CRISPR spacers at a high degree of redundancy, which highlights the spacer conservation of spacer-sharing CRISPR variants, enabling the study of CRISPR array dynamics in ways difficult to achieve through short-read sequences. We introduce compressed spacer graphs, a visual abstraction of spacer-sharing CRISPR arrays, to provide a simplified view of complex organizational structures present within CRISPR array dynamics. Utilizing compressed spacer graphs, several key defining characteristics of CRISPR-Cas system dynamics were observed, including spacer acquisition and loss events, conservation of the trailer-end spacers, and CRISPR arrays' directionality (transcription orientation). Other highlights include the observation of intense array contraction and expansion events, and reconstruction of a full-length genome for a potential invader (Faecalibacterium phage) based on identified spacers. CONCLUSION: We demonstrate in an in silico system that long reads provide the necessary context for characterizing the organization of CRISPR arrays in a microbiome, and reveal dynamic and evolutionary features of CRISPR-Cas systems in a microbial population.


Subjects
Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Genetic Variation , Microbiota/genetics , Intergenic DNA/genetics
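
The compressed spacer graph introduced in the abstract above summarises many spacer-sharing CRISPR arrays in one picture. A plausible reading, sketched below, is a directed graph whose nodes are spacers and whose edges link consecutive spacers in any array, with unbranched runs of spacers collapsed into a single node. The record does not define the compression rule, so the collapsing criterion here (unique successor whose only predecessor is the current spacer) is an assumption, as are the spacer names and arrays.

```python
from collections import defaultdict


def build_spacer_graph(arrays):
    """Build successor and predecessor maps from ordered spacer arrays."""
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for array in arrays:
        nodes.update(array)
        for left, right in zip(array, array[1:]):
            succ[left].add(right)
            pred[right].add(left)
    return nodes, succ, pred


def compress(arrays):
    """Collapse unbranched runs of spacers into single compressed nodes.

    Returns a list of tuples, each tuple being one compressed node.
    Edges between compressed nodes are not returned in this sketch.
    """
    nodes, succ, pred = build_spacer_graph(arrays)

    def continues(spacer):
        # True if `spacer` has exactly one successor and is that
        # successor's only predecessor, so the two can be merged.
        if len(succ[spacer]) != 1:
            return False
        return len(pred[next(iter(succ[spacer]))]) == 1

    chains = []
    for spacer in sorted(nodes):
        is_continuation = (
            len(pred[spacer]) == 1 and continues(next(iter(pred[spacer])))
        )
        if is_continuation:
            continue  # already absorbed into a chain that starts earlier
        chain = [spacer]
        while continues(chain[-1]):
            chain.append(next(iter(succ[chain[-1]])))
        chains.append(tuple(chain))
    return chains


# Two hypothetical arrays that differ at the leader end and share a
# conserved trailer end.
arrays = [
    ["s1", "s2", "s3", "s4"],
    ["s5", "s2", "s3", "s4"],
]
print(compress(arrays))
```

In this toy case the shared run s2, s3, s4 becomes one node while s1 and s5 stay separate, mirroring the pattern of variable leader-end spacers and conserved trailer-end spacers that the abstract reports.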
15.
J Environ Sci (China) ; 85: 74-81, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31471033

ABSTRACT

A column leaching experiment was used to investigate the ability of amendments to remove alkaline anions and metal ions from bauxite residue leachates. Treatments included simulated acid rain (AR), phosphogypsum + vermicompost (PVC), phosphogypsum + vermicompost + simulated acid rain (PVA), and biosolids + microorganisms (BSM), together with controls (CK). Results indicated that amendments could effectively reduce the leachate pH and EC values, neutralize OH⁻, CO₃²⁻, HCO₃⁻, and water-soluble alkali, and suppress arsenic (As) content. Correlation analysis revealed significant linear correlations between pH and the concentrations of OH⁻, CO₃²⁻, HCO₃⁻, water-soluble alkali, and metal ions. BSM treatment showed optimum results in neutralizing anions (OH⁻, CO₃²⁻, and HCO₃⁻) and water-soluble alkali and in removing metal ions (Al, As, B, Mo, V, and Na), which was attributed to neutralization from the generation of small molecular organic acids and organic matter during microbial metabolism. BSM treatment reduced alkaline anions and metal ions based on neutralization reactions in bauxite residue leachate, which reduced the potential pollution effects from leachates on the soil surrounding bauxite residue disposal areas.


Subjects
Aluminum Oxide/chemistry , Metals/chemistry , Chemical Models , Soil Pollutants/chemistry , Anions
16.
J Environ Sci (China) ; 78: 276-286, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30665646

ABSTRACT

Bauxite residue is a highly alkaline byproduct which is routinely discarded at residue disposal areas. Improving the soil formation process to revegetate these degraded lands is a promising strategy for sustainable management of the refining industry. A laboratory incubation experiment was used to evaluate the effects of gypsum and vermicompost on stable aggregate formation of bauxite residue. Aggregate size distribution was quantified by fractal theory, whilst residue microstructure was determined by scanning electron microscopy and synchrotron-based X-ray micro-computed tomography. Amendment addition increased the content of macro-aggregates (>250 µm) and enhanced aggregate stability of bauxite residue. Following gypsum and vermicompost addition, fractal dimension decreased from 2.84 to 2.77, which indicated a more homogeneous distribution of aggregate particles. Images from scanning electron microscopy and three-dimensional microstructure demonstrated that amendments stimulate the formation of improved structure in residue aggregates. Pore parameters including porosity, pore throat surface area, path length, and path tortuosity increased under amendment additions. Changes in aggregate size distribution and microstructure of bauxite residue indicated that additions of gypsum and vermicompost were beneficial to the physical condition of bauxite residue, which may enhance the ease of vegetation.


Subjects
Aluminum Oxide/chemistry , Environmental Restoration and Remediation/methods , Soil/chemistry , Soil Pollutants/chemistry
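
The abstract above quantifies aggregate size distribution with a fractal dimension (2.84 falling to 2.77). A widely used mass-based model relates the cumulative mass fraction below size R to (R / R_max)^(3 - D), so D can be estimated from the slope of a log-log fit. The sketch below assumes that model; the record does not say which fractal formulation the authors used, and the size classes and synthetic masses are constructed for the example rather than taken from the study.

```python
import math


def mass_fractal_dimension(upper_sizes, masses):
    """Estimate the mass fractal dimension D of a size distribution.

    `upper_sizes` lists the upper size bound of each class in ascending
    order and `masses` the mass retained in each class. The function
    fits log(cumulative mass fraction) against log(R / R_max) by least
    squares and returns D = 3 - slope.
    """
    total = sum(masses)
    r_max = upper_sizes[-1]
    xs, ys, cumulative = [], [], 0.0
    for size, mass in zip(upper_sizes, masses):
        cumulative += mass
        if cumulative > 0:
            xs.append(math.log(size / r_max))
            ys.append(math.log(cumulative / total))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return 3.0 - slope


# Synthetic distribution generated from a known D, to check the fit.
true_d = 2.8
sizes = [0.05, 0.25, 1.0, 2.0]  # hypothetical class upper bounds in mm
cum = [(r / sizes[-1]) ** (3 - true_d) for r in sizes]
masses = [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]
print(round(mass_fractal_dimension(sizes, masses), 3))
```

Under this model a lower D means a smaller share of mass in the finest classes, which is consistent with the abstract's reading of the drop in D as a shift toward larger aggregates.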
17.
RNA ; 22(7): 945-56, 2016 07.
Article in English | MEDLINE | ID: mdl-27190232

ABSTRACT

CRISPR-Cas systems are bacterial adaptive immune systems, each typically composed of a locus of cas genes and a CRISPR array of spacers flanked by repeats. Processed transcripts of CRISPR arrays (crRNAs) play important roles in the interference process mediated by these systems, guiding targeted immunity. Here we developed computational approaches that allow us to characterize the expression of many CRISPRs in their natural environments, using community RNA-seq (metatranscriptomic) data. By exploiting public human gut metatranscriptomic data sets, we studied the expression of 56 repeat-sequence types of CRISPRs, revealing that most CRISPRs are transcribed in one direction (producing crRNAs). In rarer cases, including a type II system associated with Bacteroides fragilis, CRISPRs are transcribed in both directions. Type III CRISPR-Cas systems were found in the microbiomes, but metatranscriptomic reads were barely found for their CRISPRs. We observed individual-level variation of the crRNA transcription, and an even greater transcription of a CRISPR from the antisense strand than the crRNA strand in one sample. The orientations of CRISPR expression implicated by metatranscriptomic data are largely in agreement with prior predictions for CRISPRs, with exceptions. Our study shows the promise of exploiting community RNA-seq data for investigating the transcription of CRISPR-Cas systems.


Subjects
Clustered Regularly Interspaced Short Palindromic Repeats , RNA/genetics , Transcription, Genetic , Transcriptome , Gene Expression Profiling
18.
Methods ; 129: 8-17, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28454776

ABSTRACT

Recent years have witnessed an unprecedented accumulation of DNA sequences, and therefore of protein sequences (predicted from DNA sequences), owing to advances in sequencing technology. One of the major sources of hypothetical proteins is metagenomics research. Current annotation of metagenomes (collections of short metagenomic sequences or assemblies) relies on similarity searches against known gene/protein families, based on which functional profiles of microbial communities can be built. This practice, however, leaves out the hypothetical proteins, which may outnumber the known proteins for many microbial communities. On the other hand, we may ask: what can we gain from the large number of metagenomes made available by metagenomic studies, for the annotation of metagenomic sequences as well as the functional annotation of hypothetical proteins in general? Here we propose a community profiling approach for predicting functional associations between proteins: two proteins are predicted to be associated if they share similar presence and absence profiles (called community profiles) across microbial communities. Community profiling is conceptually similar to the phylogenetic profiling approach to functional prediction, but with fundamental differences. We tested different profile construction methods, selections of reference metagenomes, and correlation metrics, among others, to optimize the performance of this new approach. We demonstrated that the community profiling approach alone slightly outperforms the phylogenetic profiling approach for associating proteins in species that are well represented by sequenced genomes, and that combining phylogenetic and community profiling further improves (though only marginally) the prediction of functional association. Further, we showed that the community profiling method significantly outperforms phylogenetic profiling, revealing more functional associations, when applied to a more recently sequenced bacterial genome.
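The core idea of comparing presence/absence profiles across communities can be illustrated with a small sketch. It assumes binary profiles and Pearson correlation with an arbitrary cutoff; the abstract states that several profile constructions and correlation metrics were tested but does not name the ones finally chosen, so the metric, the threshold of 0.8, and the protein names below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / math.sqrt(var_x * var_y)

def predict_associations(profiles, threshold=0.8):
    """Return (protein_a, protein_b, r) for pairs with correlation >= threshold."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= threshold:
                pairs.append((a, b, r))
    return pairs

# Each list marks presence (1) or absence (0) of a protein family
# in six hypothetical metagenomes.
profiles = {
    "protA": [1, 1, 0, 0, 1, 0],
    "protB": [1, 1, 0, 0, 1, 0],
    "protC": [0, 0, 1, 1, 0, 1],
    "protD": [1, 0, 1, 0, 1, 0],
}
associations = predict_associations(profiles)
```

In this toy input only protA and protB share an identical profile, so they are the only pair predicted to be associated.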


Subjects
Metagenomics , Microbial Consortia/genetics , Sequence Analysis, DNA/methods , Software , Algorithms , Computational Biology/methods , Databases, Genetic , Genome, Bacterial , Phylogeny
19.
BMC Bioinformatics ; 18(1): 92, 2017 Feb 06.
Article in English | MEDLINE | ID: mdl-28166719

ABSTRACT

BACKGROUND: The CRISPR-Cas systems in prokaryotes are RNA-guided immune systems that target and deactivate foreign nucleic acids. A typical CRISPR-Cas system consists of a CRISPR array of repeat and spacer units, and a locus of cas genes. The CRISPR and the cas locus are often located next to each other in the genomes. However, there is no quantitative estimate of the co-location. In addition, ad hoc studies have shown that some non-CRISPR genomic elements contain repeat-spacer-like structures and are mistaken for CRISPRs. RESULTS: Using available genome sequences, we observed that a significant number of genomes have isolated cas loci and/or CRISPRs. We found that 11%, 22% and 28% of the type I, II and III cas loci, respectively, are isolated (with no CRISPRs in the same genome at all, or with CRISPRs located far away in the genome). We identified a large number of genomic elements that superficially resemble CRISPRs but do not contain diverse spacers and have no companion cas genes. We called these elements false-CRISPRs and further classified them into groups, including tandem repeats and Staphylococcus aureus repeat (STAR)-like elements. CONCLUSION: This is the first systematic study to collect and characterize false-CRISPR elements. We demonstrated that false-CRISPRs could be used to reduce the false annotation of CRISPRs, showing that they are useful for improving the annotation of CRISPR-Cas systems.
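The two properties the abstract attributes to false-CRISPRs (low spacer diversity and no companion cas genes) can be turned into a simple filter. The following is only an illustrative sketch: the diversity measure (fraction of distinct spacers), the 0.5 cutoff, and the function names are assumptions, not the criteria used in the study.

```python
def spacer_diversity(spacers):
    """Fraction of distinct spacer sequences in a candidate array (0.0 if empty)."""
    if not spacers:
        return 0.0
    return len(set(spacers)) / len(spacers)

def looks_like_false_crispr(spacers, has_cas_genes, min_diversity=0.5):
    """Flag arrays with low spacer diversity and no companion cas genes.

    min_diversity is a hypothetical cutoff chosen for illustration only.
    """
    return spacer_diversity(spacers) < min_diversity and not has_cas_genes

# A tandem-repeat-like element: the "spacers" are all the same sequence.
tandem_like = ["ACGTACGTAC"] * 5
# A plausible genuine array: every spacer differs and cas genes are nearby.
genuine_like = ["ACGTACGTAC", "TTGACCGATA", "GGCATTCAGT", "CATGCCAATG"]

flag_tandem = looks_like_false_crispr(tandem_like, has_cas_genes=False)
flag_genuine = looks_like_false_crispr(genuine_like, has_cas_genes=True)
```

A real pipeline would need sequence-similarity clustering of spacers rather than exact matching, since near-identical spacers also indicate low diversity.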


Subjects
CRISPR-Cas Systems , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Enhancer Elements, Genetic , Genome, Bacterial , Genetic Loci , Genomics , Molecular Sequence Annotation , Phylogeny , Software , Staphylococcus aureus/genetics , Streptococcus pyogenes/genetics , Streptococcus thermophilus/genetics
20.
Bioinformatics ; 32(7): 1001-8, 2016 04 01.
Article in English | MEDLINE | ID: mdl-26319390

ABSTRACT

MOTIVATION: Metagenomics research has accelerated the study of microbial organisms, providing insights into the composition and potential functionality of various microbial communities. Metatranscriptomics (studies of the transcripts from a mixture of microbial species) and other meta-omics approaches hold even greater promise for providing additional insights into functional and regulatory characteristics of the microbial communities. Current metatranscriptomics projects are often carried out without matched metagenomic datasets (of the same microbial communities). For the projects that produce both metatranscriptomic and metagenomic datasets, their analyses are often not integrated. Metagenome assemblies are far from perfect, which partially explains why they are not used for the analysis of metatranscriptomic datasets. RESULTS: Here, we report a read-mapping algorithm for mapping short reads onto a de Bruijn graph of assemblies. A hash table of junction k-mers (k-mers spanning branching structures in the de Bruijn graph) is used to facilitate fast mapping of reads to the graph. We developed an application of this mapping algorithm: a reference-based approach to metatranscriptome assembly using graphs of metagenome assembly as the reference. Our results show that this new approach (called TAG) helps to assemble substantially more transcripts that otherwise would have been missed or truncated because of the fragmented nature of the reference metagenome. AVAILABILITY AND IMPLEMENTATION: TAG was implemented in C++ and has been tested extensively on the Linux platform. It is available for download as open source at http://omics.informatics.indiana.edu/TAG. CONTACT: yye@indiana.edu.
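To make the notion of a junction k-mer hash table concrete, here is a simplified sketch in Python; it is not TAG's C++ implementation. It assumes a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers, and it treats a k-mer as a junction k-mer when its edge leaves a node with more than one successor or enters a node with more than one predecessor. That definition, the function names, and the toy contigs are assumptions made for illustration.

```python
from collections import defaultdict

def build_graph(contigs, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for seq in contigs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            u, v = kmer[:-1], kmer[1:]
            succ[u].add(v)
            pred[v].add(u)
    return succ, pred

def junction_kmers(succ, pred):
    """Hash table: junction k-mer -> (source node, target node)."""
    table = {}
    for u, targets in succ.items():
        for v in targets:
            # The edge leaves a fork (u) or enters a merge point (v).
            if len(targets) > 1 or len(pred[v]) > 1:
                table[u + v[-1]] = (u, v)
    return table

def anchor_read(read, table, k):
    """Return (offset, k-mer) for every junction k-mer found in the read."""
    hits = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in table:  # constant-time dictionary lookup
            hits.append((i, kmer))
    return hits

# Two toy contigs that share a prefix and then diverge after "CGT".
contigs = ["ACGTAC", "ACGTTG"]
k = 4
succ, pred = build_graph(contigs, k)
table = junction_kmers(succ, pred)
hits = anchor_read("ACGTTG", table, k)
```

Here the read hits the junction k-mer CGTT at offset 1, which tells the mapper which branch of the fork the read follows; a full mapper would then extend the alignment along that path.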


Subjects
Algorithms , Metagenomics , Metagenome