ABSTRACT
Comprehensive protein function annotation is essential for understanding microbiome-related disease mechanisms in host organisms. However, a large portion of human gut microbial proteins lack functional annotation. Here, we developed a new metagenome analysis workflow that integrates de novo genome reconstruction, taxonomic profiling, and deep learning-based functional annotation with DeepFRI. This is the first approach to apply deep learning-based functional annotations in metagenomics. We validate DeepFRI functional annotations by comparing them to orthology-based annotations from eggNOG on a set of 1,070 infant metagenomes from the DIABIMMUNE cohort. Using this workflow, we generated a sequence catalogue of 1.9 million nonredundant microbial genes. The Gene Ontology annotations predicted by DeepFRI and eggNOG were 70% concordant. DeepFRI improved annotation coverage, with 99% of the gene catalogue obtaining Gene Ontology molecular function annotations, although these annotations are less specific than those from eggNOG. Additionally, we constructed pangenomes in a reference-free manner using high-quality metagenome-assembled genomes (MAGs) and analyzed the associated annotations. eggNOG annotated more genes in well-studied organisms, such as Escherichia coli, while DeepFRI was less sensitive to taxonomy. Further, we show that DeepFRI provides additional annotations compared with previous DIABIMMUNE studies. This workflow will contribute to a better understanding of the functional signature of the human gut microbiome in health and disease and will help guide future metagenomics studies.
IMPORTANCE The past decade has seen advances in high-throughput sequencing technologies, resulting in the rapid accumulation of genomic data from microbial communities. While this growth in sequence data and gene discovery is impressive, the majority of microbial gene functions remain uncharacterized. The coverage of functional information from either experimental sources or inference is low. To address these challenges, we developed a new workflow to computationally assemble microbial genomes and annotate the genes using the deep learning-based model DeepFRI. This improved microbial gene annotation coverage to 1.9 million metagenome-assembled genes, representing 99% of the assembled genes, a significant improvement over the 12% Gene Ontology term annotation coverage achieved by commonly used orthology-based approaches. Importantly, the workflow supports pangenome reconstruction in a reference-free manner, allowing us to analyze the functional potential of individual bacterial species. We therefore propose this alternative approach, which combines deep learning functional predictions with the commonly used orthology-based annotations, as one that could help uncover novel functions observed in metagenomic microbiome studies.
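The coverage and concordance figures quoted above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not define concordance, so the sketch assumes that two tools agree on a gene when they share at least one Gene Ontology term, and it counts only genes annotated by both. The function names, gene identifiers, and toy annotation sets are hypothetical.

```python
def coverage(annotations, genes):
    """Fraction of genes that received at least one GO term."""
    annotated = sum(1 for g in genes if annotations.get(g))
    return annotated / len(genes)


def concordance(ann_a, ann_b, genes):
    """Among genes annotated by both tools, the fraction sharing >= 1 GO term.

    Assumed definition for illustration; the study may use a different one.
    """
    both = [g for g in genes if ann_a.get(g) and ann_b.get(g)]
    if not both:
        return 0.0
    agree = sum(1 for g in both if ann_a[g] & ann_b[g])
    return agree / len(both)


# Toy gene catalogue and annotations (hypothetical, not study data).
genes = ["g1", "g2", "g3", "g4"]
tool_a = {  # stands in for a deep learning-based predictor
    "g1": {"GO:0003677"},
    "g2": {"GO:0005524"},
    "g3": {"GO:0016787"},
    "g4": {"GO:0003824"},
}
tool_b = {  # stands in for an orthology-based annotator
    "g1": {"GO:0003677"},
    "g2": {"GO:0016301"},
}

print(coverage(tool_a, genes))             # share of genes annotated by tool A
print(coverage(tool_b, genes))             # share of genes annotated by tool B
print(concordance(tool_a, tool_b, genes))  # agreement on jointly annotated genes
```

A stricter definition, for example requiring identical term sets or comparing terms after propagation up the ontology, would give lower or higher values on the same data, so the chosen definition should always be reported alongside the number.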
Subjects
Deep Learning, Microbiota, Humans, Metagenome/genetics, Molecular Sequence Annotation, Microbiota/genetics, Genome, Microbial
ABSTRACT
The invasion of human erythrocytes by Plasmodium falciparum merozoites requires interaction between parasite ligands and host receptors. Interaction of the PfRh5-CyRPA-Ripr protein complex with basigin, an erythrocyte surface receptor, via PfRh5 is essential for erythrocyte invasion. Antibodies raised against each antigen component of the complex have been shown to inhibit erythrocyte invasion, making these proteins potential blood-stage vaccine candidates. Genetic polymorphisms present a significant challenge in developing efficacious vaccines because they lead to variant-specific immune responses. This study investigated genetic variation in the PfRh5 complex proteins in P. falciparum isolates from the Lake Victoria islands, Western Kenya. Twenty-nine microscopically confirmed P. falciparum field samples collected from islands in Lake Victoria between July 2014 and July 2016 were genotyped by whole genome sequencing, and the results were compared with sequences mined from the GenBank database, from a study conducted in Kilifi, and from the MalariaGEN repository. We analyzed the frequency of polymorphisms in the proteins of the PfRh5 complex (PfRh5, PfCyRPA, PfRipr, and PfP113) and mapped their locations onto the 3D structure of the complex. We identified a total of 58 variants in the PfRh5 protein complex. PfRh5 was the most polymorphic protein, with 30 SNPs, while PfCyRPA was relatively conserved, with 3 SNPs. The minor allele frequency of the SNPs ranged between 1.9% and 21.2%. Ten high-frequency alleles (>5%) were observed in PfRh5 at codons 147, 148, 277, 410, and 429 and in PfRipr at codons 190, 255, 259, and 1003. Two SNPs were located in protein-protein interaction regions: C203Y in PfRh5 and F292V in PfCyRPA. Taken together, these results reveal a low level of polymorphism in the PfRh5 invasion complex in the Lake Victoria parasite population. However, the two mutations identified in the protein interaction regions call for investigation of their impact on the parasite invasion process, to support the consideration of PfRh5 complex components as potential malaria vaccine candidates.
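The minor allele frequency and the >5% high-frequency threshold mentioned above can be illustrated with a short Python sketch. It assumes one allele call per sample, which is a simplification that ignores mixed infections, and it takes the second most common allele as the minor allele. The function name and the toy allele calls are hypothetical and are not data from the study.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """Frequency of the second most common allele among called samples.

    `calls` holds one allele per sample; None marks a missing call.
    """
    called = [c for c in calls if c is not None]
    if not called:
        raise ValueError("no called alleles at this site")
    counts = Counter(called).most_common()
    if len(counts) == 1:
        return 0.0  # monomorphic site: no minor allele
    return counts[1][1] / len(called)


# Toy amino acid calls at one codon (hypothetical): 18 reference, 2 variant,
# plus one sample with a missing call that is excluded from the denominator.
site_calls = ["C"] * 18 + ["Y"] * 2 + [None]

maf = minor_allele_frequency(site_calls)
is_high_frequency = maf > 0.05  # the >5% threshold used in the abstract

print(f"MAF = {maf:.1%}, high frequency: {is_high_frequency}")
```

Excluding missing calls from the denominator is one common choice; counting them as reference calls instead would lower the estimated frequency.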
ABSTRACT
Under-utilised orphan crops hold the key to diversified and climate-resilient food systems. Here, we report on orphan crop genomics using the case of Lablab purpureus (L.) Sweet (lablab), a legume native to Africa and cultivated throughout the tropics for food and forage. Our Africa-led plant genome collaboration produces a high-quality chromosome-scale assembly of the lablab genome. Our assembly highlights the genome organisation of the trypsin inhibitor genes, an important anti-nutritional factor in lablab. We also re-sequence cultivated and wild lablab accessions from Africa, confirming two domestication events. Finally, we examine the genetic and phenotypic diversity in a comprehensive lablab germplasm collection and identify genomic loci underlying variation in important agronomic traits. The genomic data generated here provide a valuable resource for lablab improvement. Our inclusive collaborative approach also presents an example that can be explored by other researchers sequencing indigenous crops, particularly those from low- and middle-income countries (LMICs).