ABSTRACT
A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the "Open-Stand.org principles for collaborative open standards development." With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.
Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Animals , Communication , Computational Biology/standards , Genome , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Precision Medicine/trends , Reproducibility of Results , Sequence Analysis, DNA/standards , Software , Workflow
ABSTRACT
MOTIVATION: Given the abundance of genome sequencing and omics data, an opportunity and challenge in bioinformatics relates to data mining and visualization. The majority of current bioinformatics visualizations are implemented either as multi-tier web server applications that require significant maintenance effort, or as client software that presumes technical expertise for installation. Here we present the Visual Omics Explorer (VOE), a cross-platform data visualization portal that is implemented using only HTML and JavaScript code. VOE is standalone software that can be loaded offline in the web browser from a local copy of the code, or over the internet without any dependency other than distributing the code through a file sharing service. VOE can interactively display genomics, transcriptomics, epigenomics and metagenomics data stored either locally or retrieved from cloud storage services, and runs on both desktop computers and mobile devices. AVAILABILITY AND IMPLEMENTATION: VOE is accessible at http://bcil.github.io/VOE/ CONTACT: agbiotec@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Computational Biology/methods , Computer Graphics , Genomics/methods , Software , Epigenomics/methods , Humans , Internet , Metagenomics/methods , Transcriptome , Web Browser
ABSTRACT
Prostate cancer (PCa) is frequently diagnosed in men, and dysregulation of microRNAs is characteristic of many cancers. MicroRNA-1207-3p is encoded at the non-protein coding gene locus PVT1 on the 8q24 human chromosomal region, an established PCa susceptibility locus. However, the role of microRNA-1207-3p in PCa is unclear. We discovered that microRNA-1207-3p is significantly underexpressed in PCa cell lines in comparison to normal prostate epithelial cells. Increased expression of microRNA-1207-3p in PCa cells significantly inhibits proliferation and migration and induces apoptosis via direct molecular targeting of FNDC1, a protein that contains a conserved fibronectin (FN1) protein domain. FNDC1, FN1, and the androgen receptor (AR) are significantly overexpressed in PCa cell lines and human PCa, and their expression correlates positively with aggressive PCa. Prostate tumor FN1 expression in patients who experienced PCa-specific death is significantly higher than in patients who remained alive. Furthermore, FNDC1, FN1 and AR are concomitantly overexpressed in metastatic PCa. Consequently, these studies have revealed a novel microRNA-1207-3p/FNDC1/FN1/AR regulatory pathway in PCa.
Subject(s)
Fibronectins/metabolism , Gene Expression Regulation, Neoplastic , MicroRNAs/metabolism , Neoplasm Proteins/metabolism , Prostatic Neoplasms/genetics , Receptors, Androgen/genetics , Apoptosis/genetics , Cell Line, Tumor , Cell Movement/genetics , Cell Proliferation , Fibronectins/genetics , Humans , Male , MicroRNAs/genetics , Neoplasm Invasiveness , Neoplasm Metastasis , Neoplasm Proteins/genetics , Prostatic Neoplasms/pathology , Receptors, Androgen/metabolism , Up-Regulation/genetics
ABSTRACT
The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research. It houses the Arabidopsis thaliana genome sequence and associated annotation. It was conceived as a framework that allows the research community to develop and release 'modules' that integrate, analyze and visualize Arabidopsis data that may reside at remote sites. The current implementation provides an indexed database of core genomic information. These data are made available through feature-rich web applications that provide search, data mining, and genome browser functionality, and also by bulk download and web services. Araport uses software from the InterMine and JBrowse projects to expose curated data from TAIR, GO, BAR, EBI, UniProt, PubMed and EPIC CoGe. The site also hosts 'science apps,' developed as prototypes for community modules that use dynamic web pages to present data obtained on-demand from third-party servers via RESTful web services. Designed for sustainability, the Arabidopsis Information Portal strategy exploits existing scientific computing infrastructure, adopts a practical mixture of data integration technologies and encourages collaborative enhancement of the resource by its user community.
Subject(s)
Arabidopsis/genetics , Databases, Genetic , Genome, Plant , Data Mining , Internet , Software
ABSTRACT
BACKGROUND: Root system architecture is important for water and nutrient acquisition in all crops. In soybean breeding programs, wild soybean alleles have been used successfully to enhance yield and seed composition traits, but have never been investigated to improve root system architecture. Therefore, in this study, high-density single-feature polymorphic markers and simple sequence repeats were used to map quantitative trait loci (QTLs) governing root system architecture in an inter-specific soybean mapping population developed from a cross between Glycine max and Glycine soja. RESULTS: Wild and cultivated soybean both contributed alleles towards significant additive large-effect QTLs on chromosomes 6 and 7 for a longer total root length and root distribution, respectively. Epistatic-effect QTLs were also identified for taproot length, average diameter, and root distribution. These root traits will influence water and nutrient uptake in soybean. Two cell division-related genes (D type cyclin and auxin efflux carrier protein) with insertion/deletion variations might contribute to the shorter root phenotypes observed in G. soja compared with cultivated soybean. Based on the location of the QTLs and sequence information from a second G. soja accession, three genes (slow anion channel associated 1 like, Auxin responsive NEDD8-activating complex and peroxidase), each with a non-synonymous single nucleotide polymorphism mutation, were identified, which may also contribute to changes in root architecture in the cultivated soybean. In addition, Apoptosis inhibitor 5-like on chromosome 7 and slow anion channel associated 1-like on chromosome 15 had epistatic interactions for taproot length QTLs in soybean. CONCLUSION: Rare alleles from a G. soja accession are expected to enhance our understanding of the genetic components involved in root architecture traits, and could be combined to improve root system and drought adaptation in soybean.
Subject(s)
Chromosome Mapping , Glycine max/genetics , Plant Roots/genetics , Alleles , Genome, Plant , Plant Roots/growth & development , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Glycine max/growth & development
ABSTRACT
BACKGROUND: Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard disks, which are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it. RESULTS: To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine, which is used upstream of a High-performance Integrated Virtual Environment (HIVE) that encapsulates the Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof of concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program, The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR the available clinical information on patients, mapping the reads to the reference, identifying non-synonymous Single Nucleotide Variations (nsSNVs), and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome.
Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). CONCLUSIONS: Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides.
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Proteome/genetics , Proteomics/methods , Algorithms , Biomedical Research , Database Management Systems , Databases, Genetic , Humans , Neoplasms/metabolism , Phylogeny , Polymorphism, Single Nucleotide , Proteome/classification , Proteome/metabolism , User-Computer Interface
ABSTRACT
BACKGROUND: Understanding the taxonomic composition of a sample, whether from patient, food or environment, is important to several types of studies including pathogen diagnostics, epidemiological studies, biodiversity analysis and food quality regulation. With the decreasing costs of sequencing, metagenomic data is quickly becoming the preferred type of data for such analysis. RESULTS: Rapidly defining the taxonomic composition (both taxonomic profile and relative frequency) in a metagenomic sequence dataset is challenging because mapping millions of sequence reads from a metagenomic study to a non-redundant nucleotide database such as the NCBI non-redundant nucleotide database (nt) is computationally intensive. We have developed a robust subsampling-based algorithm, implemented in a tool called CensuScope, meant to take a 'sneak peek' into the population distribution and estimate taxonomic composition as if a census were taken of the metagenomic landscape. CensuScope is a rapid and accurate metagenome taxonomic profiling tool that randomly extracts a small number of reads (based on user input) and maps them to NCBI's nt database. This process is repeated multiple times to ascertain the taxonomic composition that is found in the majority of the iterations, thereby providing a robust estimate of the population and measures of the accuracy of the results. CONCLUSION: CensuScope can be run on a laptop or on a high-performance computer. Based on our analysis, we are able to provide some recommendations in terms of the number of sequence reads to analyze and the number of iterations to use. For example, to quantify taxonomic groups present in the sample at a level of 1% or higher, a subsampling size of 250 random reads with 50 iterations yields a statistical power of >99%. Windows and UNIX versions of CensuScope are available for download at https://hive.biochemistry.gwu.edu/dna.cgi?cmd=censuscope.
CensuScope is also available through the High-performance Integrated Virtual Environment (HIVE) and can be used in conjunction with other HIVE analysis and visualization tools.
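The subsampling idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the CensuScope implementation: the function name `subsample_census`, the toy taxon labels, and the use of a pre-assigned taxon label per read (standing in for the step that maps reads to NCBI's nt database) are all assumptions made for the example. Only the default sample size (250 reads), the iteration count (50), and the "found in the majority of iterations" rule come from the text.

```python
import random
from collections import Counter


def subsample_census(read_taxa, sample_size=250, iterations=50, seed=0):
    """Estimate taxonomic composition by repeated random subsampling.

    read_taxa: list with one taxon label per read. In a real run the label
    would come from mapping each sampled read to a reference database; here
    it is given up front (an assumption for illustration).
    Returns {taxon: mean relative frequency} for taxa that were seen in
    more than half of the iterations.
    """
    rng = random.Random(seed)
    seen_in = Counter()    # number of iterations in which a taxon appeared
    freq_sum = Counter()   # sum of per-iteration relative frequencies
    for _ in range(iterations):
        sample = rng.sample(read_taxa, sample_size)  # draw without replacement
        for taxon, n in Counter(sample).items():
            seen_in[taxon] += 1
            freq_sum[taxon] += n / sample_size
    return {
        taxon: freq_sum[taxon] / iterations
        for taxon in seen_in
        if seen_in[taxon] > iterations / 2
    }


# Toy data set (hypothetical): 10,000 reads from three taxa.
reads = ["taxon_A"] * 9000 + ["taxon_B"] * 900 + ["taxon_C"] * 100
profile = subsample_census(reads)
for taxon, freq in sorted(profile.items()):
    print(f"{taxon}: {freq:.3f}")
```

For intuition, a taxon present at 1% is expected to appear in a single draw of 250 reads with probability of roughly 1 - 0.99^250, or about 0.92 under a simple binomial approximation, so repeating the draw is what stabilizes the estimate. This back-of-the-envelope figure is not the power calculation reported in the abstract.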
Subject(s)
Classification/methods , Metagenome , Biodiversity , Databases, Genetic , Genome, Fungal/genetics , Humans , Intestines/microbiology , Microbiota/genetics , Respiratory Tract Infections/microbiology , Time Factors
ABSTRACT
BACKGROUND: A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. RESULTS: Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and a VirtualBox Appliance are also publicly available for download and use by researchers with access to private clouds. CONCLUSIONS: Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud.
An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.
Subject(s)
Computing Methodologies , Genomics/methods , Animals , Computers , Humans , Sequence Alignment , Software
ABSTRACT
Animal venoms are among the most complex natural secretions known, comprising a mixture of bioactive compounds often referred to as toxins. Venom arsenals are predominantly made up of cysteine-rich peptide toxins that manipulate molecular targets, such as ion channels and receptors, making these venom peptides attractive candidates for the development of therapeutics to benefit human health. With the rise of omic strategies that utilize transcriptomic, proteomic, and bioinformatic methods, we are able to identify more venom proteins and peptides than ever before. However, identification and characterization of bioactive venom peptides remains a significant challenge due to the unique chemical structure and enormous number of peptides found in each venom arsenal (upward of 200 per organism). Here, we introduce a rapid and user-friendly in silico bioinformatic pipeline for the de novo identification and characterization of raw RNAseq reads from venom glands to elucidate cysteine-rich peptides from the arsenal of venomous organisms. IMPLEMENTATION: This project develops a user-friendly automated bioinformatics pipeline via a Galaxy workflow to identify novel venom peptides from raw RNAseq reads of terebrid snails. While designed for venomous terebrid snails, with minor adjustments, this pipeline can be made universal to identify secreted disulfide-rich peptide toxins from any venomous organism.
Subject(s)
Toxins, Biological , Venoms , Animals , Computational Biology , Cysteine , Disulfides , Peptides/chemistry , Proteomics , Snails , Toxins, Biological/genetics , Venoms/genetics
ABSTRACT
Carnivores are currently colonizing cities where they were previously absent. These urban environments are novel ecosystems characterized by habitat degradation and fragmentation, availability of human food, and different prey assemblages than surrounding areas. Coyotes (Canis latrans) established a breeding population in New York City (NYC) over the last few decades, but their ecology within NYC is poorly understood. In this study, we used non-invasive scat sampling and DNA metabarcoding to profile vertebrate, invertebrate, and plant dietary items with the goal to compare the diets of urban coyotes to those inhabiting non-urban areas. We found that both urban and non-urban coyotes consumed a variety of plants and animals as well as human food. Raccoons (Procyon lotor) were an important food item for coyotes within and outside NYC. In contrast, white-tailed deer (Odocoileus virginianus) were mainly eaten by coyotes inhabiting non-urban areas. Domestic chicken (Gallus gallus) was the human food item found in most scats from both urban and non-urban coyotes. Domestic cats (Felis catus) were consumed by urban coyotes but were detected in only a small proportion of the scats (<5%), which differs markedly from high rates of cat depredation in some other cities. In addition, we compared our genetic metabarcoding analysis to a morphological analysis of the same scat samples. We found that the detection similarity between the two methods was low and it varied depending on the type of diet item.
Subject(s)
Carnivora , Coyotes , Deer , Humans , Animals , Cats , Coyotes/genetics , New York City , Ecosystem , DNA Barcoding, Taxonomic
ABSTRACT
Nongenetic predisposition to colorectal cancer continues to be difficult to measure precisely, hampering efforts in targeted prevention and screening. Epigenetic changes in the normal mucosa of patients with colorectal cancer can serve as a tool in predicting colorectal cancer outcomes. We identified epigenetic changes affecting the normal mucosa of patients with colorectal cancer. DNA methylation profiling on normal colon mucosa from 77 patients with colorectal cancer and 68 controls identified a distinct subgroup of normal-appearing mucosa with markedly disrupted DNA methylation at a large number of CpGs, termed the "Outlier Methylation Phenotype" (OMP), which was present in 15 of 77 patients with cancer versus 0 of 68 controls (P < 0.001). Similar findings were also seen in publicly available datasets. Comparison of normal colon mucosa transcription profiles of patients with OMP cancer with those of patients with non-OMP cancer indicates that genes whose promoters are hypermethylated in the OMP patients are also transcriptionally downregulated, and that many of the genes most affected are involved in interactions between epithelial cells, the mucus layer, and the microbiome. Analysis of 16S rRNA profiles suggests that the normal colon mucosa of OMPs is enriched in bacterial genera associated with colorectal cancer risk, advanced tumor stage, chronic intestinal inflammation, malignant transformation, nosocomial infections, and KRAS mutations. In conclusion, our study identifies an epigenetically distinct OMP group in the normal mucosa of patients with colorectal cancer that is characterized by a disrupted methylome, altered gene expression, and microbial dysbiosis. Prospective studies are needed to determine whether OMP could serve as a biomarker for an elevated epigenetic risk for colorectal cancer development.
PREVENTION RELEVANCE: Our study identifies an epigenetically distinct OMP group in the normal mucosa of patients with colorectal cancer that is characterized by a disrupted methylome, altered gene expression, and microbial dysbiosis. Identification of OMPs in healthy controls and patients with colorectal cancer will lead to prevention and better prognosis, respectively.
Subject(s)
Colorectal Neoplasms , Epigenome , Humans , Dysbiosis/complications , Dysbiosis/genetics , Dysbiosis/metabolism , RNA, Ribosomal, 16S/genetics , DNA Methylation , Epigenesis, Genetic , Intestinal Mucosa/pathology , Colorectal Neoplasms/pathology
ABSTRACT
Suitable habitat fragment size, isolation, and distance from a source are important variables influencing community composition of plants and animals, but the role of these environmental factors in determining composition and variation of host-associated microbial communities is poorly known. In parasite-associated microbial communities, it is hypothesized that the evolution and ecology of an arthropod parasite will influence its microbiome more than broader environmental factors, but this hypothesis has not been extensively tested. To examine the influence of the broader environment on the parasite microbiome, we applied high-throughput sequencing of the V4 region of 16S rRNA to characterize the microbiome of 222 obligate ectoparasitic bat flies (Streblidae and Nycteribiidae) collected from 155 bats (representing six species) from ten habitat fragments in the Atlantic Forest of Brazil. Parasite species identity is the strongest driver of microbiome composition. To a lesser extent, reduction in habitat fragment area, but not isolation, is associated with an increase in connectance and betweenness centrality of bacterial association networks driven by changes in the diversity of the parasite community. Controlling for the parasite community, bacterial network topology covaries with habitat patch area and exhibits parasite-species specific responses to environmental change. Taken together, habitat loss may have cascading consequences for communities of interacting macro- and microorganisms.
ABSTRACT
BACKGROUND: Animals evolved in a microbial world, and their gut microbial symbionts have played a role in their ecological diversification. While many recent studies report patterns of phylosymbiosis between hosts and their gut bacteria, fewer studies examine the potentially adaptive functional contributions of these microbes to the dietary habits of their hosts. In this study, we examined predicted metabolic pathways in the gut bacteria of more than 500 individual bats belonging to 60 species and compared the enrichment of these functions across hosts with distinct dietary ecologies. RESULTS: We found that predicted microbiome functions were differentially enriched across hosts with different diets. Using a machine-learning approach, we also found that inferred microbiome functions could be used to predict specialized host diets with reasonable accuracy. We detected a relationship between both host phylogeny and diet with respect to microbiome functional repertoires. Because many predicted functions could potentially fill nutritional gaps for bats with specialized diets, we considered pathways discriminating dietary niches as traits of the host and fit them to comparative phylogenetic models of evolution. Our results suggest that some, but not all, predicted microbiome functions may evolve toward adaptive optima and thus be visible to the forces of natural selection operating on hosts over evolutionary time. CONCLUSIONS: Our results suggest that bats with specialized diets may partially rely on their gut microbes to fulfill or augment critical nutritional pathways, including essential amino acid synthesis, fatty acid biosynthesis, and the generation of cofactors and vitamins essential for proper nutrition.
Our work adds to a growing body of literature suggesting that animal microbiomes are structured by a combination of ecological and evolutionary processes and sets the stage for future metagenomic and metabolic characterization of the bat microbiome to explore links between bacterial metabolism and host nutrition.
ABSTRACT
Bat communities in the Neotropics are some of the most speciose assemblages of mammals on Earth, with regions supporting more than 100 sympatric species with diverse feeding ecologies. Because bats are small, nocturnal, and volant, it is difficult to directly observe their feeding habits, which has resulted in their classification into broadly defined dietary guilds (e.g., insectivores, carnivores, and frugivores). Apart from these broad guilds, we lack detailed dietary information for many species and therefore have only a limited understanding of interaction networks linking bats and their diet items. In this study, we used DNA metabarcoding of plants, arthropods, and vertebrates to investigate the diets of 25 bat species from the tropical dry forests of Lamanai, Belize. Our results include some of the first detections of diet items for the focal bat taxa, adding rich and novel natural history information to the field of bat ecology. This study represents a comprehensive first effort to apply DNA metabarcoding to bat diets at Lamanai and provides a useful methodological framework for future studies testing hypotheses about coexistence and niche differentiation in the context of modern high-throughput molecular data.
ABSTRACT
High mortality rates of prostate cancer (PCa) are associated with metastatic castration-resistant prostate cancer (CRPC) due to the maintenance of androgen receptor (AR) signaling despite androgen deprivation therapies (ADTs). The 8q24 chromosomal locus is a region of very high PCa susceptibility that carries genetic variants associated with high risk of PCa incidence. This region also carries frequent amplifications of the PVT1 gene, a non-protein coding gene that encodes a cluster of microRNAs, including microRNA-1205 (miR-1205), which are largely understudied. Herein, we demonstrate that miR-1205 is underexpressed in PCa cells and tissues and suppresses CRPC tumors in vivo. To characterize the molecular pathway, we identified and validated fry-like (FRYL) as a direct molecular target of miR-1205 and observed its overexpression in PCa cells and tissues. FRYL is predicted to regulate dendritic branching, which led to the investigation of FRYL in neuroendocrine PCa (NEPC). Resistance toward ADT leads to the progression of treatment-related NEPC, often characterized by PCa neuroendocrine differentiation (NED); however, this mechanism is poorly understood. Underexpression of miR-1205 is observed when NED is induced in vitro, and inhibition of miR-1205 leads to increased expression of NED markers. However, while FRYL is overexpressed during NED, FRYL knockdown did not reduce NED, revealing that miR-1205 regulates NED independently of FRYL.
ABSTRACT
BACKGROUND: High-throughput methods, such as high-density oligonucleotide microarray measurements of mRNA levels, are popular and critical to genome-scale analysis and systems biology. However, understanding the results of these analyses, and in particular understanding the very wide range of levels of transcriptional changes observed, is still a significant challenge. Many researchers still use an arbitrary cutoff such as two-fold in order to identify changes that may be biologically significant. We have used a very large-scale microarray experiment involving 72 biological replicates to analyze the response of soybean plants to infection by the pathogen Phytophthora sojae and to analyze transcriptional modulation as a result of genotypic variation. RESULTS: With the unprecedented level of statistical sensitivity provided by the high degree of replication, we show unambiguously that almost the entire plant genome (97 to 99% of all detectable genes) undergoes transcriptional modulation in response to infection and genetic variation. The majority of the transcriptional differences are less than two-fold in magnitude. We show that low-amplitude modulation of gene expression (less than two-fold changes) is highly statistically significant and consistent across biological replicates, even for modulations of less than 20%. Our results are consistent across two different normalization methods and two different statistical analysis procedures. CONCLUSION: Our findings demonstrate that the entire plant genome undergoes transcriptional modulation in response to infection and genetic variation. The pervasive low-magnitude remodeling of the transcriptome may be an integral component of physiological adaptation in soybean, and in all eukaryotes.
Subject(s)
Gene Expression Profiling , Genome, Plant , Glycine max/genetics , Phytophthora/pathogenicity , Gene Expression Regulation, Plant , Genes, Plant , Genotype , Host-Pathogen Interactions , Linear Models , Oligonucleotide Array Sequence Analysis/methods , Plant Diseases/genetics , RNA, Plant/genetics , Sensitivity and Specificity , Glycine max/metabolism , Glycine max/microbiology
ABSTRACT
The availability of low-cost, small form-factor sequencers, such as the Illumina MiSeq, MiniSeq, or iSeq, has paved the way for democratizing genomic sequencing, providing researchers in minority universities with access to technology that was previously affordable only to institutions with large core facilities. However, these instruments are not bundled with software for performing bioinformatics data analysis, and the data analysis can be the main bottleneck for independent laboratories or even small clinical facilities that consider adopting genomic sequencing for medical applications. To address this issue, we have developed miCloud, a bioinformatics platform that enables genomic data analysis through a fully featured data analysis cloud, which seamlessly integrates with genome sequencers over the local network. The miCloud can be easily deployed without any prior bioinformatics expertise on any computing environment, from a laboratory computer workstation to a university computer cluster. Our platform not only provides access to a set of preconfigured RNA-Seq and ChIP-Seq bioinformatics pipelines, but also enables users to develop or install new preconfigured tools from the large selection available on open-source online Docker container repositories. The miCloud built-in analysis pipelines are also integrated with the Visual Omics Explorer framework (Kim et al., 2016), which provides rich interactive visualizations and publication-ready graphics from the next-generation sequencing data. Ultimately, the miCloud demonstrates a bioinformatics approach that can be adopted in the field for standardizing genomic data analysis, similarly to the way molecular biology sample preparation kits have standardized laboratory operations.
Subject(s)
Cloud Computing , Genomics/methods , RNA-Seq/methods , Software , Animals , Humans
ABSTRACT
BACKGROUND: Current methods used for annotating metagenomics shotgun sequencing (MGS) data rely on a computationally intensive and low-stringency approach of mapping each read to a generic database of proteins or reference microbial genomes. RESULTS: We developed MGS-Fast, an analysis approach for shotgun whole-genome metagenomic data utilizing Bowtie2 DNA-DNA alignment of reads that is an alternative to using the integrated catalog of reference genes database of well-annotated genes compiled from human microbiome data. This method is rapid and provides high-stringency matches (>90% DNA sequence identity) of the metagenomics reads to genes with annotated functions. We demonstrate the use of this method with data from a study of liver disease and synthetic reads, and Human Microbiome Project shotgun data, to detect differentially abundant Kyoto Encyclopedia of Genes and Genomes gene functions in these experiments. This rapid annotation method is freely available as a Galaxy workflow within a Docker image. CONCLUSIONS: MGS-Fast can confidently transfer functional annotations from gene databases to metagenomic reads, with speed and accuracy.
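The high-stringency criterion mentioned above (>90% DNA sequence identity) can be illustrated with a minimal sketch. This is not the MGS-Fast implementation: it assumes each alignment is described by a SAM-style CIGAR string and an NM edit distance, and it uses one common approximation of identity, (aligned columns - edit distance) / aligned columns. The function names and the default threshold are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar: str, edit_distance: int) -> float:
    """Approximate percent identity of one alignment.

    Aligned columns are counted as matches/mismatches (M, =, X) plus
    inserted (I) and deleted (D) bases. Soft- and hard-clipped bases
    (S, H) are excluded. edit_distance is assumed to be the SAM NM
    value (mismatches plus inserted and deleted bases).
    """
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XID")
    if columns == 0:
        return 0.0
    return 100.0 * (columns - edit_distance) / columns


def passes_stringency(cigar: str, edit_distance: int, threshold: float = 90.0) -> bool:
    """Keep an alignment only if its identity is strictly above the threshold."""
    return percent_identity(cigar, edit_distance) > threshold
```

Under these assumptions, a 100-base alignment with five mismatches is retained, whereas a 50-base alignment with five mismatches sits exactly at 90% and is rejected by the strict inequality.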
Subject(s)
Computational Biology/methods , Metagenomics/methods , Software , Algorithms , Cloud Computing , Humans , Metagenome , Microbiology , Microbiota , Molecular Sequence Annotation , Reproducibility of Results , Workflow
ABSTRACT
A comprehensive knowledge of the types and ratios of microbes that inhabit the healthy human gut is necessary before any pre-clinical or clinical study can be performed that attempts to alter the microbiome to treat a condition or improve therapy outcome. To address this need, we present an innovative, scalable, comprehensive analysis workflow, a healthy human reference microbiome list and abundance profile (GutFeelingKB), and a novel Fecal Biome Population Report (FecalBiome) with clinical applicability. GutFeelingKB provides a list of 157 organisms (8 phyla, 18 classes, 23 orders, 38 families, 59 genera and 109 species) that forms the baseline biome and can therefore be used as healthy controls for studies related to dysbiosis. This list can be expanded to 863 organisms if closely related proteomes are considered. The incorporation of microbiome science into routine clinical practice necessitates a standard report for comparing an individual's microbiome to the growing knowledgebase of "normal" microbiome data. The FecalBiome and the underlying technology of GutFeelingKB address this need. The knowledgebase can be useful to regulatory agencies for the assessment of fecal transplant and other microbiome products, as it contains a list of organisms from healthy individuals. In addition to the list of organisms and their abundances, this study also generated a collection of assembled contiguous sequences (contigs) of metagenomic dark matter. In this study, metagenomic dark matter represents sequences that cannot be mapped to any known sequence but can be assembled into contigs of 10,000 nucleotides or longer. These sequences can be used to create primers to study potential novel organisms. All data are freely available from https://hive.biochemistry.gwu.edu/gfkb and NCBI's Short Read Archive.
Subject(s)
Gastrointestinal Microbiome , Metagenome , Metagenomics , Feces/microbiology , Humans , Metagenomics/methods
ABSTRACT
The gut microbiome is a community of host-associated symbiotic microbes that fulfills multiple key roles in host metabolism, immune function, and tissue development. Given the ability of the microbiome to impact host fitness, there is increasing interest in studying the microbiome of wild animals to better understand these communities in the context of host ecology and evolution. Human microbiome research protocols are well established, but wildlife microbiome research is still a developing field. Currently, there is no standardized set of best practices guiding the collection of microbiome samples from wildlife. Gut microflora are typically sampled by fecal collection, by rectal swabbing, or by destructive sampling of the intestinal contents of the host animal. Studies rarely include more than one sampling technique, and no comparison of these methods currently exists for a wild mammal. Although some studies have hypothesized that the fecal microbiome is a nested subset of the intestinal microbiome, this hypothesis has not been formally tested. To address these issues, we examined guano (feces) and distal intestinal mucosa from 19 species of free-ranging bats from Lamanai, Belize, using 16S rRNA amplicon sequencing to compare microbial communities across sample types. We found that the diversity and composition of intestine and guano samples differed substantially. In addition, we conclude that signatures of host evolution are retained in gut microbiomes sampled from mucosal tissue, but not in those sampled from feces. Conversely, fecal samples retained more signal of host diet than intestinal samples. These results suggest that fecal and intestinal sampling methods are not interchangeable, and that these two microbiotas record different information about the host from which they are isolated.
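The nested-subset hypothesis mentioned above can be made concrete with a minimal sketch. This is not the analysis used in the study: it only computes, for one host, the fraction of taxa detected in the fecal sample that are also detected in the intestinal sample, where a value of 1.0 would mean the fecal community is fully nested within the intestinal one. The function name and the taxon labels in the usage note are hypothetical, and a formal test would typically rely on an established nestedness metric (for example NODF) compared against a null model.

```python
from typing import Iterable


def nested_fraction(subset_taxa: Iterable[str], superset_taxa: Iterable[str]) -> float:
    """Fraction of taxa in subset_taxa that also occur in superset_taxa.

    Returns 1.0 when the first community is a perfect nested subset of
    the second, and 0.0 when the two communities share no taxa.
    """
    subset = set(subset_taxa)
    if not subset:
        raise ValueError("subset_taxa must contain at least one taxon")
    shared = subset & set(superset_taxa)
    return len(shared) / len(subset)
```

For example, with hypothetical fecal taxa `{"A", "B", "C", "D"}` and intestinal taxa `{"A", "B", "E"}`, only two of the four fecal taxa are shared, so the nested fraction is 0.5 and the fecal community is not a nested subset of the intestinal one.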