Results 1 - 11 of 11
1.
Nucleic Acids Res ; 49(D1): D792-D802, 2021 01 08.
Article in English | MEDLINE | ID: mdl-32735679

ABSTRACT

In recent years, large-scale oceanic sequencing efforts have provided a deeper understanding of marine microbial communities and their dynamics. These research endeavors require the acquisition of complex and varied datasets through large, interdisciplinary and collaborative efforts. However, no unifying framework currently exists for the marine science community to integrate sequencing data with physical, geological, and geochemical datasets. Planet Microbe is a web-based platform that enables data discovery from curated historical and on-going oceanographic sequencing efforts. In Planet Microbe, each 'omics sample is linked with other biological and physicochemical measurements collected for the same water samples or during the same sample collection event, to provide a broader environmental context. This work highlights the need for curated aggregation efforts that can enable new insights into high-quality metagenomic datasets. Planet Microbe is freely accessible from https://www.planetmicrobe.org/.
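The linking step described above amounts to a join on a shared sampling-event key. The sketch below is illustrative only: the record layout and the field names `sample_id` and `event_id` are hypothetical, not Planet Microbe's actual schema, and the values are toy data.

```python
from collections import defaultdict

# Toy records for illustration only; field names are hypothetical,
# not Planet Microbe's actual schema.
omics_samples = [
    {"sample_id": "S1", "event_id": "E1"},
    {"sample_id": "S2", "event_id": "E2"},
]
measurements = [
    {"event_id": "E1", "variable": "temperature", "value": 18.2},
    {"event_id": "E1", "variable": "salinity", "value": 35.1},
    {"event_id": "E2", "variable": "temperature", "value": 4.0},
]


def attach_context(samples, measurements):
    """Attach every measurement from a sample's collection event to that sample."""
    by_event = defaultdict(list)
    for m in measurements:
        by_event[m["event_id"]].append(m)
    # .get() avoids inserting empty entries for events with no measurements
    return [{**s, "context": by_event.get(s["event_id"], [])} for s in samples]


for row in attach_context(omics_samples, measurements):
    print(row["sample_id"], [(m["variable"], m["value"]) for m in row["context"]])
```

Grouping measurements by event once keeps the join linear in the number of records rather than rescanning all measurements for each sample.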


Subject(s)
Aquatic Organisms/microbiology, Data Analysis, Environment, Metagenomics, Planets, Databases, Genetic, Reference Standards, User-Computer Interface
2.
Bioinformatics ; 33(4): 552-554, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27794557

ABSTRACT

Summary: Following polyploidy events, genomes undergo massive reduction in gene content through a process known as fractionation. Importantly, the fractionation process is not always random, and a bias as to which homeologous chromosome retains or loses more genes can be observed in some species. The process of characterizing whole genome fractionation requires identifying syntenic regions across genomes followed by post-processing of those syntenic datasets to identify and plot gene retention patterns. We have developed a tool, FractBias, to calculate and visualize gene retention and fractionation patterns across whole genomes. Through integration with SynMap and its parent platform CoGe, assembled genomes are pre-loaded and available for analysis, as well as letting researchers integrate their own data with security options to keep them private or make them publicly available. Availability and Implementation: FractBias is freely available as a web application at https://genomevolution.org/CoGe/SynMap.pl . The software is open source (MIT license) and executable with Python 2.7 or iPython notebook, and available on GitHub ( https://goo.gl/PaAtqy ). Documentation for FractBias is available on CoGepedia ( https://goo.gl/ou9dt6 ). Contact: ericlyons@email.arizona.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
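To make "gene retention patterns" concrete, here is a minimal sketch of a sliding-window retention percentage along a target chromosome, computed separately for each query homeolog. This is an assumption-laden illustration, not FractBias's code: the input format (an ordered gene list plus a set of target genes that have a syntenic partner on a given homeolog) and the window size are hypothetical.

```python
def retention_profile(target_genes, retained, window):
    """Percent of target genes with a syntenic partner, per sliding window.

    target_genes: gene IDs in order along one target chromosome (hypothetical input).
    retained: IDs of target genes that have a syntenic homolog on one query homeolog.
    """
    if not 0 < window <= len(target_genes):
        raise ValueError("window must be between 1 and the number of genes")
    profile = []
    for start in range(len(target_genes) - window + 1):
        block = target_genes[start:start + window]
        kept = sum(1 for gene in block if gene in retained)
        profile.append(100.0 * kept / window)
    return profile


# Toy data: two homeologs of the same ancestral region.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
homeolog_a = {"g1", "g2", "g3", "g5"}
homeolog_b = {"g4", "g6"}
print(retention_profile(genes, homeolog_a, window=3))
print(retention_profile(genes, homeolog_b, window=3))
```

If one homeolog's profile sits consistently above the other's, that is the kind of biased fractionation the abstract describes; in a random process the two profiles would be expected to overlap.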


Subject(s)
Evolution, Molecular, Genome, Plant, Genomics/methods, Polyploidy, Software, Genes, Plant, Plants/genetics, Sequence Analysis, DNA/methods
3.
Nucleic Acids Res ; 39(10): e68, 2011 May.
Article in English | MEDLINE | ID: mdl-21398631

ABSTRACT

SyMAP (Synteny Mapping and Analysis Program) was originally developed to compute synteny blocks between a sequenced genome and an FPC map, and has been extended to support pairs of sequenced genomes. SyMAP uses MUMmer to compute the raw hits between the two genomes, which are then clustered and filtered using the optional gene annotation. The filtered hits are input to the synteny algorithm, which was designed to discover duplicated regions and form larger-scale synteny blocks, where intervening micro-rearrangements are allowed. SyMAP provides extensive interactive Java displays at all levels of resolution along with simultaneous displays of multiple aligned pairs. The synteny blocks from multiple chromosomes may be displayed in a high-level dot plot or three-dimensional view, and the user may then drill down to see the details of a region, including the alignments of the hits to the gene annotation. These capabilities are illustrated by showing their application to the study of genome duplication, differential gene loss and transitive homology between sorghum, maize and rice. The software may be used from a website or standalone for the best performance. A project manager is provided to organize and automate the analysis of multi-genome groups. The software is freely distributed at http://www.agcol.arizona.edu/software/symap.
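The idea of forming synteny blocks while tolerating micro-rearrangements can be illustrated with a deliberately simple greedy grouping of hits. This is not SyMAP's synteny algorithm; it is a sketch that assumes hits are already reduced to `(position_in_genome_A, position_in_genome_B)` pairs, and the `max_gap` and `min_hits` thresholds are hypothetical parameters.

```python
def synteny_blocks(hits, max_gap, min_hits=3):
    """Greedily group (pos_a, pos_b) hits into blocks.

    Hits are scanned in genome-A order. A hit extends the current block when it is
    within max_gap of the previous hit on both genomes. Using abs() on genome B
    tolerates small local order inversions (micro-rearrangements) and inverted blocks.
    Blocks with fewer than min_hits hits are discarded as noise.
    """
    blocks, current = [], []
    for a, b in sorted(hits):
        if current and (a - current[-1][0] > max_gap or abs(b - current[-1][1]) > max_gap):
            if len(current) >= min_hits:
                blocks.append(current)
            current = []
        current.append((a, b))
    if len(current) >= min_hits:
        blocks.append(current)
    return blocks


# Toy hits: four collinear hits with one local swap, plus two isolated hits.
hits = [(100, 5000), (300, 5100), (200, 5150), (400, 5300), (10000, 900), (50000, 20)]
print(synteny_blocks(hits, max_gap=500))
```

A real implementation would also merge nearby blocks, score them, and handle repeats; this sketch only shows why a gap threshold, rather than strict collinearity, lets locally rearranged hits stay in one block.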


Subject(s)
Genome, Plant, Software, Synteny, Chromosomes, Plant, Computer Graphics, Genomics/methods
4.
PLoS Genet ; 5(11): e1000740, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19936069

ABSTRACT

Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5' and 3' UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org).


Subject(s)
Chromosome Mapping/methods, DNA, Complementary/genetics, Sequence Analysis, DNA/methods, Zea mays/genetics, Arabidopsis/genetics, Base Sequence, Chromosomes, Plant/genetics, Contig Mapping, DNA Transposable Elements/genetics, Expressed Sequence Tags, Genes, Plant/genetics, Internet, Minisatellite Repeats/genetics, Molecular Sequence Data, Oryza/genetics, Plant Proteins/metabolism, Poly A/genetics, Polymorphism, Single Nucleotide/genetics, Populus/genetics, Sequence Homology, Nucleic Acid, Sorghum/genetics, Transcription Factors/genetics
5.
Front Microbiol ; 12: 765268, 2021.
Article in English | MEDLINE | ID: mdl-34956127

ABSTRACT

Marine microbial ecology requires the systematic comparison of biogeochemical and sequence data to analyze environmental influences on the distribution and variability of microbial communities. With ever-increasing quantities of metagenomic data, there is a growing need to make datasets Findable, Accessible, Interoperable, and Reusable (FAIR) across diverse ecosystems. FAIR data is essential to developing analytical frameworks that integrate microbiological, genomic, ecological, oceanographic, and computational methods. Although community standards defining the minimal metadata required to accompany sequence data exist, they haven't been consistently used across projects, precluding interoperability. Moreover, these data are not machine-actionable or discoverable by cyberinfrastructure systems. By making 'omic and physicochemical datasets FAIR to machine systems, we can enable sequence data discovery and reuse based on machine-readable descriptions of environments or physicochemical gradients. In this work, we developed a novel technical specification for dataset encapsulation for the FAIR reuse of marine metagenomic and physicochemical datasets within cyberinfrastructure systems. This includes using Frictionless Data Packages enriched with terminology from environmental and life-science ontologies to annotate measured variables, their units, and the measurement devices used. This approach was implemented in Planet Microbe, a cyberinfrastructure platform and marine metagenomic web-portal. Here, we discuss the data properties built into the specification to make global ocean datasets FAIR within the Planet Microbe portal. We additionally discuss the selection of, and contributions to marine-science ontologies used within the specification. Finally, we use the system to discover data by which to answer various biological questions about environments, physicochemical gradients, and microbial communities in meta-analyses. 
This work represents a future direction in marine metagenomic research by proposing a specification for FAIR dataset encapsulation that, if adopted within cyberinfrastructure systems, would automate the discovery, exchange, and re-use of data needed to answer broader reaching questions than originally intended.
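A minimal sketch of what an ontology-annotated Frictionless Data Package descriptor and a term-based lookup over it might look like. The core keys (`name`, `resources`, `path`, `profile`, `schema`, `fields`, `type`) follow the Frictionless Data Package and Table Schema conventions; the annotation keys `ontologyTerm`, `unitTerm` and `device`, and the `<term-IRI>`/`<unit-IRI>` placeholders, are hypothetical stand-ins, not the property names or terms used by the specification described above.

```python
import json


def field(name, ftype, term_iri, unit_iri, device):
    """One Table Schema field; the three annotation keys are hypothetical names."""
    return {
        "name": name,
        "type": ftype,
        "ontologyTerm": term_iri,  # what was measured (placeholder IRI)
        "unitTerm": unit_iri,      # unit of measurement (placeholder IRI)
        "device": device,          # how it was measured
    }


def make_package(name, csv_path, fields):
    """Wrap one CSV file and its annotated schema as a data package descriptor."""
    return {
        "name": name,
        "resources": [{
            "name": name + "-data",
            "path": csv_path,
            "profile": "tabular-data-resource",
            "schema": {"fields": fields},
        }],
    }


def find_by_term(packages, term_iri):
    """Machine-actionable discovery: list (package, column) pairs annotated with term_iri."""
    return [
        (p["name"], f["name"])
        for p in packages
        for r in p["resources"]
        for f in r["schema"]["fields"]
        if f.get("ontologyTerm") == term_iri
    ]


pkg = make_package(
    "example-cruise",
    "samples.csv",
    [field("temperature", "number", "<term-IRI>", "<unit-IRI>", "CTD")],
)
print(json.dumps(pkg, indent=2))
print(find_by_term([pkg], "<term-IRI>"))
```

Because discovery keys on the ontology term rather than on the column name, two datasets that label the same variable differently (for example `temp` versus `temperature`) would still be found by the same query, which is the interoperability gain the abstract argues for.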

6.
PeerJ ; 8: e8592, 2020.
Article in English | MEDLINE | ID: mdl-32461821

ABSTRACT

BACKGROUND: Decreasing costs make RNA sequencing technologies increasingly affordable for biologists. However, many researchers who can now afford sequencing lack access to resources necessary for downstream analysis. This means that even as algorithms to process RNA-Seq data improve, many biologists still struggle to manage the sheer volume of data produced by next generation sequencing (NGS) technologies. Scalable bioinformatics tools that exploit multiple platforms are needed to democratize bioinformatics resources in the sequencing era. This is essential for equipping many research groups in the life sciences with the tools to process the increasingly unwieldy datasets they produce. METHODS: One strategy to address this challenge is to develop a modern generation of sequence analysis tools capable of seamless data sharing and communication. Such tools will provide interoperability through offerings of interlinked resources. Systems of interlinked, scalable resources, which often incorporate cloud data storage, are broadly referred to as cyberinfrastructure. Cyberinfrastructure integrated tools will help researchers to robustly analyze large scale datasets by efficiently sharing data burdens across a distributed architecture. Additionally, interoperability will allow emerging tools to cross-adapt features of existing tools. It is important that these tools are designed to be easy to use for biologists. RESULTS: We introduce fRNAkenseq, a powered-by-CyVerse RNA sequencing analysis tool that exhibits interoperability with other resources and meets the needs of biologists for comprehensive, easy to use RNA sequencing analysis. fRNAkenseq leverages a complex set of Application Programming Interfaces (APIs) associated with the NSF-funded cyberinfrastructure project, CyVerse, to execute FASTQ-to-differential expression RNA-Seq analyses. 
Integrating across bioinformatics platforms, fRNAkenseq also exploits cloud integration and cross-talk with another CyVerse-associated tool, CoGe. fRNAkenseq offers novel features for the biologist such as more robust and comprehensive pipelines for enrichment than those currently available by default in a single tool, whether cloud-based or locally installed. Importantly, cross-talk with CoGe allows fRNAkenseq users to execute RNA-Seq pipelines on an inventory of 47,000 archived genomes stored in CoGe or upload their own draft genome.

7.
BMC Genomics ; 10: 400, 2009 Aug 26.
Article in English | MEDLINE | ID: mdl-19709403

ABSTRACT

BACKGROUND: New sequencing technologies are rapidly emerging. Many laboratories are simultaneously working with the traditional Sanger ESTs and experimenting with ESTs generated by the 454 Life Science sequencers. Though Sanger ESTs have been used to generate contigs for many years, no program takes full advantage of the 5' and 3' mate-pair information, hence, many tentative transcripts are assembled into two separate contigs. The new 454 technology has the benefit of high-throughput expression profiling, but introduces time and space problems for assembling large contigs. RESULTS: The PAVE (Program for Assembling and Viewing ESTs) assembler takes advantage of the 5' and 3' mate-pair information by requiring that the mate-pairs be assembled into the same contig and joined by n's if the two sub-contigs do not overlap. It handles the depth of 454 data sets by "burying" similar ESTs during assembly, which retains the expression level information while circumventing time and space problems. PAVE uses MegaBLAST for the clustering step and CAP3 for assembly, however it assembles incrementally to enforce the mate-pair constraint, bury ESTs, and reduce incorrect joins and splits. The PAVE data management system uses a MySQL database to store multiple libraries of ESTs along with their metadata; the management system allows multiple assemblies with variations on libraries and parameters. Analysis routines provide standard annotation for the contigs including a measure of differentially expressed genes across the libraries. A Java viewer program is provided for display and analysis of the results. Our results clearly show the benefit of using the PAVE assembler to explicitly use mate-pair information and bury ESTs for large contigs. CONCLUSION: The PAVE assembler provides a software package for assembling Sanger and/or 454 ESTs. The assembly software, data management software, Java viewer and user's guide are freely available.
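The two ideas highlighted in the PAVE abstract, "burying" redundant ESTs while keeping their count as an expression signal, and joining non-overlapping 5'/3' mate sub-contigs with n's, can be sketched in toy form. This is not PAVE's implementation, which relies on MegaBLAST and CAP3: here burial uses exact substring containment, and the `gap` and `min_overlap` parameters are hypothetical.

```python
def bury(ests):
    """Bury ESTs contained in a longer kept EST; return {representative: depth}.

    Exact substring containment stands in for the similarity search a real
    assembler would use. The depth count preserves expression-level information.
    """
    kept = {}
    for seq in sorted(ests, key=len, reverse=True):
        for rep in kept:
            if seq in rep:
                kept[rep] += 1  # buried: counted, but not assembled again
                break
        else:
            kept[seq] = 1  # new representative
    return kept


def join_mates(five_prime, three_prime, gap=10, min_overlap=20):
    """Join the 5' and 3' sub-contigs of one transcript into a single contig.

    Uses the longest suffix/prefix overlap of at least min_overlap bases;
    if none exists, pads with gap n's so the mate pair stays in one contig.
    """
    for k in range(min(len(five_prime), len(three_prime)), min_overlap - 1, -1):
        if five_prime.endswith(three_prime[:k]):
            return five_prime + three_prime[k:]
    return five_prime + "n" * gap + three_prime


print(bury(["ACGTACGT", "GTAC", "ACGTACGT", "TTTT"]))
print(join_mates("ACGTACGT", "ACGTTTTT", min_overlap=3))
print(join_mates("AAAA", "CCCC", gap=3, min_overlap=3))
```

The minimum overlap guards against merging mates on a coincidental one- or two-base match; requiring mates to end up in one contig, padded if necessary, is what prevents a transcript from being split in two.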


Subject(s)
Expressed Sequence Tags, Sequence Alignment/methods, Sequence Analysis, DNA/methods, Software, Algorithms, Cluster Analysis, Contig Mapping, Genome, Plant, Zea mays/genetics
8.
Gigascience ; 8(2)2019 02 01.
Article in English | MEDLINE | ID: mdl-30597002

ABSTRACT

Background: Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content. Results: We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community. Conclusions: A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes.
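Cosine similarity over k-mer abundance vectors, as named in the Libra abstract, can be shown on a single machine in a few lines. This sketch uses raw counts and plain cosine similarity. Libra itself runs on Hadoop, and its exact k-mer weighting options may differ; the reads and k value below are toy assumptions.

```python
import math
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads of one metagenome."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse k-mer abundance vectors."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


sample = ["ACGTACGT", "TTGACC"]
shallow = kmer_counts(sample, 3)
deep = kmer_counts(sample * 2, 3)  # same community sequenced at twice the depth
unrelated = kmer_counts(["GGGGGG"], 3)
print(cosine_similarity(shallow, deep))
print(cosine_similarity(shallow, unrelated))
```

Scaling every count by the same factor leaves the cosine unchanged, which is how this metric can use abundance information while normalizing for sequencing depth. A presence/absence metric, by contrast, would discard the abundance differences the abstract argues are needed for resolution.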


Subject(s)
Metagenomics/methods, Microbiota/genetics, Software, Algorithms, Cluster Analysis, High-Throughput Nucleotide Sequencing/methods, Sequence Analysis, DNA/methods
9.
Plant Direct ; 1(2)2017 Jul.
Article in English | MEDLINE | ID: mdl-31240274

ABSTRACT

To make genomic and epigenomic analyses more widely available to the biological research community, we have created LoadExp+, a suite of bioinformatics workflows integrated with the web-based comparative genomics platform, CoGe. LoadExp+ allows users to perform transcriptomic (RNA-seq), epigenomic (bisulfite-seq), chromatin-binding (ChIP-seq), variant identification (SNPs), and population genetics analyses against any genome in CoGe, including genomes integrated by users themselves. Through LoadExp+'s integration with CoGe's existing features, all analyses are available for visualization and additional downstream processing, and are available for export to CyVerse's data management and analysis platforms. LoadExp+ provides easy-to-use functionality to manage genomics and epigenomics data throughout its entire lifecycle using a publicly available web-based platform and facilitates greater accessibility of genomics analyses to researchers of all skill levels. LoadExp+ can be accessed at https://genomevolution.org.

10.
Genome Biol Evol ; 7(12): 3286-98, 2015 Nov 11.
Article in English | MEDLINE | ID: mdl-26560340

ABSTRACT

The identification of conserved syntenic regions enables discovery of predicted locations for orthologous and homeologous genes, even when no such gene is present. This capability means that synteny-based methods are far more effective than sequence similarity-based methods in identifying true-negatives, a necessity for studying gene loss and gene transposition. However, the identification of syntenic regions requires complex analyses which must be repeated for pairwise comparisons between any two species. Therefore, as the number of published genomes increases, there is a growing demand for scalable, simple-to-use applications to perform comparative genomic analyses that cater to both gene family studies and genome-scale studies. We implemented SynFind, a web-based tool that addresses this need. Given one query genome, SynFind is capable of identifying conserved syntenic regions in any set of target genomes. SynFind is capable of reporting per-gene information, useful for researchers studying specific gene families, as well as genome-wide data sets of syntenic gene and predicted gene locations, critical for researchers focused on large-scale genomic analyses. Inference of syntenic homologs provides the basis for correlation of functional changes around genes of interest between related organisms. Deployed on the CoGe online platform, SynFind is connected to the genomic data from over 15,000 organisms from all domains of life as well as supporting multiple releases of the same organism. SynFind makes use of a powerful job execution framework that promises scalability and reproducibility. SynFind can be accessed at http://genomevolution.org/CoGe/SynFind.pl. A video tutorial of SynFind using Phytophthora as an example is available at http://www.youtube.com/watch?v=2Agczny9Nyc.
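Predicting where a gene "should" be in a target genome, even when no gene is present, follows from the flanking syntenic anchors. The sketch below uses linear interpolation between the two nearest anchor pairs. That is an assumed, simplified stand-in for illustration, not SynFind's documented method, and the anchor coordinates are toy values.

```python
import bisect


def predict_location(anchors, query_pos):
    """Predict a target coordinate for query_pos from flanking syntenic anchors.

    anchors: (query_pos, target_pos) pairs for syntenic genes in one block,
    sorted by query_pos. Returns None if query_pos lies outside the block.
    Works for inverted blocks too, since target positions may decrease.
    """
    query_positions = [q for q, _ in anchors]
    i = bisect.bisect_left(query_positions, query_pos)
    if i < len(anchors) and query_positions[i] == query_pos:
        return float(anchors[i][1])  # the gene itself is an anchor
    if i == 0 or i == len(anchors):
        return None  # no flanking anchor on one side: outside the block
    (q0, t0), (q1, t1) = anchors[i - 1], anchors[i]
    fraction = (query_pos - q0) / (q1 - q0)
    return t0 + fraction * (t1 - t0)


block = [(1000, 50000), (3000, 54000), (6000, 60000)]
print(predict_location(block, 2000))
print(predict_location(block, 500))
```

A predicted position with no annotated gene nearby is the "true negative" the abstract mentions: evidence of gene loss or transposition rather than simply a failed similarity search.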


Subject(s)
Genome, Sequence Analysis, DNA/methods, Software, Synteny
11.
Genetics ; 198(1): 283-97, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24996909

ABSTRACT

One approach to understanding the genetic basis of speciation is to scan the genomes of recently diverged taxa to identify highly differentiated regions. The house mouse, Mus musculus, provides a useful system for the study of speciation. Three subspecies (M. m. castaneus, M. m. domesticus, and M. m. musculus) diverged ∼350 KYA, are distributed parapatrically, show varying degrees of reproductive isolation in laboratory crosses, and hybridize in nature. We sequenced the testes transcriptomes of multiple wild-derived inbred lines from each subspecies to identify highly differentiated regions of the genome, to identify genes showing high expression divergence, and to compare patterns of differentiation among subspecies that have different demographic histories and exhibit different levels of reproductive isolation. Using a sliding-window approach, we found many genomic regions with high levels of sequence differentiation in each of the pairwise comparisons among subspecies. In all comparisons, the X chromosome was more highly differentiated than the autosomes. Sequence differentiation and expression divergence were greater in the M. m. domesticus-M. m. musculus comparison than in either pairwise comparison with M. m. castaneus, which is consistent with laboratory crosses that show the greatest reproductive isolation between M. m. domesticus and M. m. musculus. Coalescent simulations suggest that differences in estimates of effective population size can account for many of the observed patterns. However, there was an excess of highly differentiated regions relative to simulated distributions under a wide range of demographic scenarios. Overlap of some highly differentiated regions with previous results from QTL mapping and hybrid zone studies points to promising candidate regions for reproductive isolation.
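A generic sliding-window scan can be sketched as follows: average a per-site differentiation statistic within fixed windows, then flag unusually high windows. The abstract does not state which statistic or window size was used, so the inputs here are hypothetical. The authors compared windows against coalescent-simulated distributions, whereas this sketch uses a simpler empirical-quantile cutoff as a stand-in.

```python
def window_means(positions, values, window, step):
    """Mean per-site statistic in windows [start, start + window) along one chromosome.

    Windows with no sites are skipped. Quadratic scan, fine for a sketch.
    """
    if not positions:
        return []
    results = []
    start, last = 0, max(positions)
    while start <= last:
        end = start + window
        in_window = [v for p, v in zip(positions, values) if start <= p < end]
        if in_window:
            results.append((start, end, sum(in_window) / len(in_window)))
        start += step
    return results


def outlier_windows(windows, quantile=0.95):
    """Windows whose mean is at or above the given empirical quantile of all window means."""
    if not windows:
        return []
    means = sorted(m for _, _, m in windows)
    cutoff = means[min(len(means) - 1, int(quantile * len(means)))]
    return [w for w in windows if w[2] >= cutoff]


# Toy per-site differentiation values (e.g. an Fst-like statistic) at site positions.
positions = [10, 20, 150, 160, 310]
values = [0.1, 0.3, 0.9, 0.7, 0.2]
windows = window_means(positions, values, window=100, step=100)
print(outlier_windows(windows, quantile=0.9))
```

Running the same scan separately for each subspecies pair, and for the X chromosome versus autosomes, is the kind of comparison the abstract reports; a simulation-based null distribution is the more rigorous way to decide which windows count as "excess" differentiation.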


Subject(s)
Genetic Speciation, Genome, Mice/genetics, Animals, Models, Genetic, Polymorphism, Genetic, Quantitative Trait Loci, Reproductive Isolation, X Chromosome/genetics