Results 1 - 13 of 13
1.
PLoS Comput Biol ; 20(2): e1011270, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38324613

ABSTRACT

CyVerse, the largest publicly funded open-source research cyberinfrastructure for life sciences, has played a crucial role in advancing data-driven research since the 2010s. As the technology landscape evolved with the emergence of cloud computing platforms, machine learning, and artificial intelligence (AI) applications, CyVerse has provided interfaces, Software as a Service (SaaS), and cloud-native Infrastructure as Code (IaC) that let researchers leverage these new technologies. CyVerse services enable researchers to integrate institutional and private computational resources and custom software, perform analyses, and publish data in accordance with open science principles. Over the past 13 years, CyVerse has registered more than 124,000 verified accounts from 160 countries and has been used in over 1,600 peer-reviewed publications. Since 2011, 45,000 students and researchers have been trained to use CyVerse. The platform has been replicated and deployed in three countries outside the US, with additional private deployments on commercial clouds for US government agencies and multinational corporations. In this manuscript, we present a strategic blueprint for creating and managing SaaS cyberinfrastructure and IaC as free and open-source software.


Subjects
Artificial Intelligence , Software , Humans , Cloud Computing , Publishing
2.
bioRxiv ; 2023 Jun 16.
Article in English | MEDLINE | ID: mdl-37398279

ABSTRACT

Summary: dadi is a popular software package for inferring models of demographic history and natural selection from population genomic data. But using dadi requires Python scripting and manual parallelization of optimization jobs. We developed dadi-cli to simplify dadi usage and also enable straightforward distributed computing. Availability and Implementation: dadi-cli is implemented in Python and released under the Apache License 2.0. The source code is available at https://github.com/xin-huang/dadi-cli . dadi-cli can be installed via PyPI and conda, and is also available through Cacao on Jetstream2 https://cacao.jetstream-cloud.org/ .

3.
Front Plant Sci ; 14: 1112973, 2023.
Article in English | MEDLINE | ID: mdl-36950362

ABSTRACT

As phenomics data volume and dimensionality increase due to advancements in sensor technology, there is an urgent need to develop and implement scalable data processing pipelines. Current phenomics data processing pipelines lack modularity, extensibility, and processing distribution across sensor modalities and phenotyping platforms. To address these challenges, we developed PhytoOracle (PO), a suite of modular, scalable pipelines for processing large volumes of field phenomics RGB, thermal, PSII chlorophyll fluorescence 2D images, and 3D point clouds. PhytoOracle aims to (i) improve data processing efficiency; (ii) provide an extensible, reproducible computing framework; and (iii) enable data fusion of multi-modal phenomics data. PhytoOracle integrates open-source distributed computing frameworks for parallel processing on high-performance computing, cloud, and local computing environments. Each pipeline component is available as a standalone container, providing transferability, extensibility, and reproducibility. The PO pipeline extracts and associates individual plant traits across sensor modalities and collection time points, representing a unique multi-system approach to addressing the genotype-phenotype gap. To date, PO supports lettuce and sorghum phenotypic trait extraction, with a goal of widening the range of supported species in the future. At the maximum number of cores tested in this study (1,024 cores), PO processing times were: 235 minutes for 9,270 RGB images (140.7 GB), 235 minutes for 9,270 thermal images (5.4 GB), and 13 minutes for 39,678 PSII images (86.2 GB). These processing times represent end-to-end processing, from raw data to fully processed numerical phenotypic trait data. Repeatability values of 0.39-0.95 (bounding area), 0.81-0.95 (axis-aligned bounding volume), 0.79-0.94 (oriented bounding volume), 0.83-0.95 (plant height), and 0.81-0.95 (number of points) were observed in Field Scanalyzer data. 
We also show the ability of PO to process drone data with a repeatability of 0.55-0.95 (bounding area).
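The processing times reported above can be turned into rough per-minute throughput figures. The short Python sketch below only re-derives those rates from the numbers quoted in the abstract (1,024 cores, end-to-end processing). It is illustrative arithmetic, not a benchmark, and the `RUNS` table and `throughput` helper are hypothetical names introduced here.

```python
# Per-minute throughput re-derived from the PhytoOracle timings quoted in
# the abstract (1,024 cores, end-to-end processing). Numbers are copied
# from the text; the table layout and helper name are hypothetical.
RUNS: dict[str, tuple[int, float, float]] = {
    # modality: (image count, data volume in GB, wall-clock minutes)
    "RGB": (9_270, 140.7, 235),
    "thermal": (9_270, 5.4, 235),
    "PSII": (39_678, 86.2, 13),
}


def throughput(images: int, gigabytes: float, minutes: float) -> tuple[float, float]:
    """Return (images per minute, GB per minute) for one run."""
    return images / minutes, gigabytes / minutes


if __name__ == "__main__":
    for modality, run in RUNS.items():
        images_per_min, gb_per_min = throughput(*run)
        print(f"{modality:8s} {images_per_min:8.1f} images/min  {gb_per_min:6.3f} GB/min")
```

The RGB and thermal runs have the same image count and wall-clock time but differ roughly 26-fold in data volume (140.7 GB vs 5.4 GB), so their GB-per-minute rates differ by the same factor.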

5.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-31210271

ABSTRACT

High-throughput sequencing and proteomics technologies are markedly increasing the amount of RNA and peptide data that are available to researchers, which are typically made publicly available via data repositories such as the NCBI Sequence Read Archive and proteome archives, respectively. These data sets contain valuable information about when and where gene products are expressed, but this information is not readily obtainable from archived data sets. Here we report Chickspress (http://geneatlas.arl.arizona.edu), the first publicly available gene expression resource for chicken tissues. Since there is no single source of chicken gene models, Chickspress incorporates both NCBI and Ensembl gene models and links these gene sets with experimental gene expression data and QTL information. By linking gene models from both NCBI and Ensembl gene prediction pipelines, researchers can, for the first time, easily compare gene models from each of these prediction workflows to available experimental data for these products. We use Chickspress data to show the differences between these gene annotation pipelines. Chickspress also provides rapid search, visualization and download capabilities for chicken gene sets based upon tissue type, developmental stage and experiment type. This first Chickspress release contains 161 gene expression data sets, including expression of mRNAs, miRNAs, proteins and peptides. We provide several examples demonstrating how researchers may use this resource.


Subjects
Chickens , Databases, Genetic , Gene Expression Regulation , Transcriptome , Animals , Avian Proteins/biosynthesis , Avian Proteins/genetics , Chickens/genetics , Chickens/metabolism , MicroRNAs/biosynthesis , MicroRNAs/genetics , Models, Genetic , Quantitative Trait, Heritable , RNA, Messenger/biosynthesis , RNA, Messenger/genetics
6.
Bioinformatics ; 34(15): 2651-2653, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29474529

ABSTRACT

Summary: The EPIC-CoGe browser is a web-based genome visualization utility that integrates the GMOD JBrowse genome browser with the extensive CoGe genome database (currently containing over 30 000 genomes). The EPIC-CoGe browser also offers many features beyond basic JBrowse, including enhanced search capability and on-the-fly comparisons and analyses across all types of functional and diversity genomics data. No installation is required, and data (genome, annotation, functional genomic and diversity data) can be loaded through a simple point-and-click wizard or a REST API, making the browser widely accessible and easy to use for researchers of all computational skill levels. EPIC-CoGe and its data tracks can also be easily embedded in other websites and JBrowse instances. Availability and implementation: EPIC-CoGe Browser is freely available for use online through CoGe (https://genomevolution.org). Source code (MIT open source) is available: https://github.com/LyonsLab/coge. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Data Visualization , Genome , Molecular Sequence Annotation , Sequence Analysis, DNA/methods , Software , Genomics/methods
7.
Bioinformatics ; 33(14): 2197-2198, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28334338

ABSTRACT

SUMMARY: Current synteny visualization tools either focus on small regions of sequence and do not illustrate genome-wide trends, or are complicated to use and create visualizations that are difficult to interpret. To address this challenge, the Comparative Genomics Platform (CoGe) has developed two web-based tools to visualize synteny across whole genomes. SynMap2 and SynMap3D allow researchers to explore whole genome synteny patterns (across two or three genomes, respectively) in responsive, web-based visualization and virtual reality environments. Both tools have access to the extensive CoGe genome database (containing over 30 000 genomes) and also let users upload their own data. Because the tools are built on modern web technologies, no installation is required, making them widely accessible and easy to use. AVAILABILITY AND IMPLEMENTATION: Both tools are open source (MIT license) and freely available for use online through CoGe ( https://genomevolution.org ). SynMap2 and SynMap3D can be accessed at http://genomevolution.org/coge/SynMap.pl and http://genomevolution.org/coge/SynMap3D.pl , respectively. Source code is available: https://github.com/LyonsLab/coge . CONTACT: ericlyons@email.arizona.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics/methods , Software , Synteny , Web Browser , Whole Genome Sequencing , Genome
8.
Plant Direct ; 1(2), 2017 Jul.
Article in English | MEDLINE | ID: mdl-31240274

ABSTRACT

To make genomic and epigenomic analyses more widely available to the biological research community, we have created LoadExp+, a suite of bioinformatics workflows integrated with the web-based comparative genomics platform, CoGe. LoadExp+ allows users to perform transcriptomic (RNA-seq), epigenomic (bisulfite-seq), chromatin-binding (ChIP-seq), variant identification (SNPs), and population genetics analyses against any genome in CoGe, including genomes integrated by users themselves. Through LoadExp+'s integration with CoGe's existing features, all analyses are available for visualization and additional downstream processing, and are available for export to CyVerse's data management and analysis platforms. LoadExp+ provides easy-to-use functionality to manage genomics and epigenomics data throughout its entire lifecycle using a publicly available web-based platform and facilitates greater accessibility of genomics analyses to researchers of all skill levels. LoadExp+ can be accessed at https://genomevolution.org.

9.
Bioinformatics ; 33(4): 552-554, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27794557

ABSTRACT

Summary: Following polyploidy events, genomes undergo a massive reduction in gene content through a process known as fractionation. Importantly, the fractionation process is not always random, and in some species a bias can be observed in which one homeologous chromosome retains or loses more genes than the other. Characterizing whole genome fractionation requires identifying syntenic regions across genomes, followed by post-processing of those syntenic datasets to identify and plot gene retention patterns. We have developed a tool, FractBias, to calculate and visualize gene retention and fractionation patterns across whole genomes. Through integration with SynMap and its parent platform CoGe, assembled genomes are pre-loaded and available for analysis, and researchers can also integrate their own data, with security options to keep it private or make it publicly available. Availability and Implementation: FractBias is freely available as a web application at https://genomevolution.org/CoGe/SynMap.pl . The software is open source (MIT license) and executable with Python 2.7 or IPython notebook, and available on GitHub ( https://goo.gl/PaAtqy ). Documentation for FractBias is available on CoGepedia ( https://goo.gl/ou9dt6 ). Contact: ericlyons@email.arizona.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Evolution, Molecular , Genome, Plant , Genomics/methods , Polyploidy , Software , Genes, Plant , Plants/genetics , Sequence Analysis, DNA/methods
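FractBias summarizes how many reference genes still have a syntenic partner on each homeologous chromosome, plotted along the genome. The Python sketch below shows one way to compute such a sliding-window retention percentage. It is not FractBias's code. The input format (an ordered gene list plus a map from each gene to the target chromosomes that carry a syntenic copy), the function name `retention_profile`, and the window size are all assumptions made for illustration.

```python
from collections.abc import Mapping, Sequence


def retention_profile(
    ref_genes: Sequence[str],
    retained_on: Mapping[str, set[str]],
    targets: Sequence[str],
    window: int,
) -> dict[str, list[float]]:
    """Percent of reference genes per sliding window that keep a syntenic
    partner on each target chromosome (hypothetical helper, not FractBias).

    ref_genes:   genes of one reference chromosome, in positional order
    retained_on: gene -> set of target chromosomes holding a syntenic copy
    targets:     target (homeologous) chromosome names to report
    window:      number of consecutive reference genes per window
    """
    if window <= 0 or window > len(ref_genes):
        raise ValueError("window must be between 1 and len(ref_genes)")
    profile: dict[str, list[float]] = {t: [] for t in targets}
    # Slide one gene at a time; each window yields one point per target.
    for start in range(len(ref_genes) - window + 1):
        chunk = ref_genes[start:start + window]
        for t in targets:
            kept = sum(1 for g in chunk if t in retained_on.get(g, set()))
            profile[t].append(100.0 * kept / window)
    return profile


if __name__ == "__main__":
    # Toy data: six reference genes and two hypothetical homeologs.
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
    retained = {"g1": {"chrA", "chrB"}, "g2": {"chrA"}, "g3": {"chrA"}, "g4": {"chrA"}}
    print(retention_profile(genes, retained, ["chrA", "chrB"], window=4))
```

On the toy data, `chrA` retains a consistently larger share of genes than `chrB` across windows. That kind of persistent gap is the biased fractionation the abstract describes.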
10.
Nucleic Acids Res ; 42(Database issue): D933-7, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24150938

ABSTRACT

GEISHA (Gallus Expression In Situ Hybridization Analysis; http://geisha.arizona.edu) is an in situ hybridization gene expression and genomic resource for the chicken embryo. This update describes modifications that enhance its utility to users. During the past 5 years, GEISHA has undertaken a significant restructuring to more closely conform to the data organization and formatting of Model Organism Databases in other species. This has involved migrating from an entry-centric format to one that is gene-centered. Database restructuring has enabled the inclusion of data pertaining to chicken genes and proteins and their orthologs in other species. This new information is presented through an updated user interface. In situ hybridization data in mouse, frog, zebrafish and fruitfly are integrated with chicken genomic and expression information. A resource has also been developed that integrates the GEISHA interface information with the Online Mendelian Inheritance in Man human disease gene database. Finally, the Chicken Gene Nomenclature Committee database and the GEISHA database have been integrated so that they draw from the same data resources.


Subjects
Chick Embryo/metabolism , Chickens/genetics , Databases, Genetic , Gene Expression , Animals , Genomics , In Situ Hybridization , Internet , Mice , Models, Animal , RNA, Messenger/analysis
11.
BMC Genomics ; 10 Suppl 2: S6, 2009 Jul 14.
Article in English | MEDLINE | ID: mdl-19607657

ABSTRACT

BACKGROUND: Systems Biology research tools, such as Cytoscape, have greatly extended the reach of genomic research. By providing platforms to integrate data with molecular interaction networks, these tools let researchers more rapidly begin interpreting large data sets collected for a system of interest. BioNetBuilder is an open-source client-server Cytoscape plugin that automatically integrates molecular interactions from all major public interaction databases and serves them directly to the user's Cytoscape environment. Until recently, however, chicken and other eukaryotic model systems had little interaction data available. RESULTS: Version 2.0 of BioNetBuilder includes a redesigned synonyms resolution engine that enables transfer and integration of interactions across species; this engine translates between alternate gene names as well as between orthologs in multiple species. Additionally, BioNetBuilder is now implemented as part of the Gaggle, allowing seamless communication of interaction data to any Gaggle-enabled software. Using BioNetBuilder, we constructed a chicken interactome possessing 72,000 interactions among 8,140 genes directly in the Cytoscape environment. In this paper, we present a tutorial on how to do so and an analysis of a specific use case. CONCLUSION: BioNetBuilder 2.0 provides numerous user-friendly systems biology tools that were otherwise inaccessible to researchers in chicken genomics and other model systems. We provide a detailed tutorial spanning all required steps in the analysis. BioNetBuilder 2.0, the tools for maintaining its databases, standard operating procedures for creating local copies of its back-end databases, and all of the required Gaggle and Cytoscape code are open-source and freely available at http://err.bio.nyu.edu/cytoscape/bionetbuilder/.


Subjects
Computational Biology/methods , Database Management Systems , Systems Biology , Animals , Chickens/genetics , Genomics/methods , Oligonucleotide Array Sequence Analysis
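BioNetBuilder's synonyms resolution engine is described as translating alternate gene names and orthologs so that interactions can be transferred between species. The Python sketch below illustrates that two-step idea (alias → canonical ID → ortholog in the target species) on toy dictionaries. The gene names, the dictionary layout, the function name `transfer_interactions`, and the rules for dropping pairs without orthologs or collapsing to self-loops are all hypothetical. They do not reflect the plugin's actual data model.

```python
from collections.abc import Iterable, Mapping


def transfer_interactions(
    interactions: Iterable[tuple[str, str]],
    synonyms: Mapping[str, str],
    orthologs: Mapping[str, str],
) -> set[frozenset[str]]:
    """Map source-species interactions onto a target species (hypothetical).

    synonyms:  alias -> canonical gene ID in the source species
    orthologs: canonical source gene ID -> ortholog ID in the target species
    Pairs whose partners lack an ortholog are dropped. Edges are treated as
    undirected, so duplicates that map to the same target pair are merged.
    """
    transferred: set[frozenset[str]] = set()
    for a, b in interactions:
        # Step 1: resolve alternate names to one canonical identifier.
        ca, cb = synonyms.get(a, a), synonyms.get(b, b)
        # Step 2: translate canonical IDs to target-species orthologs.
        ta, tb = orthologs.get(ca), orthologs.get(cb)
        if ta is None or tb is None or ta == tb:
            continue  # no ortholog, or the pair collapses to a self-loop
        transferred.add(frozenset((ta, tb)))
    return transferred


if __name__ == "__main__":
    # Toy, made-up identifiers: "aliasA" is another name for "geneA".
    edges = [("aliasA", "geneB"), ("geneA", "geneB"), ("geneA", "geneC")]
    print(transfer_interactions(
        edges,
        synonyms={"aliasA": "geneA"},
        orthologs={"geneA": "chick_geneA", "geneB": "chick_geneB"},
    ))
```

On the toy input, the two source edges written with different names for the same gene merge into a single chicken edge, and the `geneC` edge is dropped because no ortholog is listed for it.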
12.
J Proteome Res ; 4(2): 358-68, 2005.
Article in English | MEDLINE | ID: mdl-15822911

ABSTRACT

The discovery of unanticipated protein modifications is one of the most challenging problems in proteomics. Whereas widely used algorithms such as Sequest and Mascot enable mapping of modifications when the mass and amino acid specificity are known, unexpected modifications cannot be identified with these tools. We have developed an algorithm and software called P-Mod, which enables discovery and sequence mapping of modifications to target proteins known to be represented in the analysis or identified by Sequest. P-Mod matches MS/MS spectra to peptide sequences in a search list. For spectra of modified peptides, P-Mod calculates mass differences between search peptide sequences and MS/MS precursors and localizes the mass shift to a sequence position in the peptide. Because modifications are detected as mass shifts, P-Mod does not require the user to guess at masses or sequence locations of modifications. P-Mod uses extreme value statistics to assign p value estimates to sequence-to-spectrum matches. The reported p values are scaled to account for the number of comparisons, so that error rates do not increase with the expanded search lists that result from incorporating potential peptide modifications. Combination of P-Mod searches from multiple LC-MS/MS analyses and multiple samples revealed previously unreported BSA modifications, including a novel decarboxymethylation or D-->G substitution at position 579 of the protein. P-Mod can serve a unique role in the identification of protein modifications both from exogenous and endogenous sources and may be useful for identifying modified protein forms as biomarkers for toxicity and disease processes.


Subjects
Algorithms , Peptides/chemistry , Software , Amino Acid Sequence , Mass Spectrometry , Molecular Sequence Data , Sensitivity and Specificity , User-Computer Interface
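The P-Mod abstract outlines two computational steps. First, compute the mass difference between a search peptide and the observed precursor. Then localize that shift to a residue. The Python sketch below illustrates both steps using standard monoisotopic residue masses and a naive fragment-matching score (the count of singly charged b/y ions within a fixed tolerance). The scoring rule, the tolerance, the peptide `SAMPLER`, and the Šidák-style correction used to scale a p value by the number of comparisons are our assumptions. P-Mod's published statistics are based on extreme value distributions, which are not reproduced here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def mh_plus(peptide: str, shift: float = 0.0) -> float:
    """Singly protonated precursor mass [M+H]+, optionally with a mass shift."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON + shift


def fragment_ions(peptide: str, site: int, shift: float) -> list[float]:
    """Singly charged b and y ions with `shift` placed on residue index `site`."""
    masses = [RESIDUE[aa] for aa in peptide]
    masses[site] += shift
    b = [sum(masses[:k]) + PROTON for k in range(1, len(masses))]
    y = [sum(masses[-k:]) + WATER + PROTON for k in range(1, len(masses))]
    return b + y


def localize_shift(peptide: str, precursor: float, peaks: list[float],
                   tol: float = 0.02) -> tuple[float, int]:
    """Return (mass shift, best residue index) for an observed precursor.

    The shift is the precursor minus the unmodified [M+H]+. Each residue is
    tried as the carrier, and the one whose shifted b/y ions match the most
    observed peaks (within `tol`) is reported. Ties go to the lowest index.
    """
    delta = precursor - mh_plus(peptide)

    def score(site: int) -> int:
        return sum(any(abs(ion - p) <= tol for p in peaks)
                   for ion in fragment_ions(peptide, site, delta))

    best = max(range(len(peptide)), key=score)
    return delta, best


def sidak_scale(p: float, n_comparisons: int) -> float:
    """Assumed correction: chance that at least one of n null matches scores as well."""
    return 1.0 - (1.0 - p) ** n_comparisons


if __name__ == "__main__":
    # Simulate a +15.99491 Da shift on M (index 2) of a made-up peptide.
    peaks = fragment_ions("SAMPLER", 2, 15.99491)
    print(localize_shift("SAMPLER", mh_plus("SAMPLER", 15.99491), peaks))
```

The search never receives the shift value directly. It is recovered from the precursor mass alone, which mirrors the abstract's point that users need not guess the masses or positions of modifications.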
13.
Anal Chem ; 74(1): 203-10, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11795795

ABSTRACT

We have developed a pattern recognition algorithm called SALSA (scoring algorithm for spectral analysis) for the detection of specific features in tandem MS (MS-MS) spectra. Application of the SALSA algorithm to the detection of peptide MS-MS ion series enables identification of MS-MS spectra displaying characteristics of specific peptide sequences. SALSA analysis scores MS-MS spectra based on correspondence between theoretical ion series for peptide sequence motifs and actual MS-MS product ion series, regardless of their absolute positions on the m/z axis. Analyses of tryptic digests of bovine serum albumin (BSA) by LC-MS-MS followed by SALSA analysis detected MS-MS spectra for both unmodified and multiple modified forms of several BSA tryptic peptides. SALSA analysis of MS-MS data from mixtures of BSA and human serum albumin (HSA) tryptic digests indicated that ion series searches with BSA peptide sequence motifs identified MS-MS spectra for both BSA and closely related HSA peptides. Optimal discrimination between MS-MS spectra of variant peptide forms is achieved when the SALSA search criteria are optimized to the target peptide. Application of SALSA to LC-MS-MS proteome analysis will facilitate the characterization of modified and sequence variant proteins.


Subjects
Peptides/chemistry , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Motifs , Animals , Humans , Mass Spectrometry/methods , Sequence Analysis, Protein/instrumentation
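The SALSA abstract scores spectra on the correspondence between a motif's theoretical ion series and the observed product ions, regardless of where those ions sit on the m/z axis. A sequence motif fixes the spacings between consecutive fragment ions, so a position-independent match can look for that spacing pattern anywhere in the spectrum. The Python sketch below captures only that idea. It uses each observed peak as a starting point, walks the motif's residue-mass increments, and reports the longest consecutive run matched. The scoring rule, the tolerance, the function name `ladder_score`, and the toy peak list are our own simplifications and are not the published SALSA scoring function. Only singly charged ions are considered.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def ladder_score(peaks: list[float], motif: str, tol: float = 0.02) -> int:
    """Longest run of consecutive motif residues found as m/z spacings.

    Every observed peak is tried as the starting point, so the score does
    not depend on where the ladder sits on the m/z axis.
    """
    steps = [RESIDUE[aa] for aa in motif]
    best = 0
    for anchor in peaks:
        current, matched = anchor, 0
        for step in steps:
            target = current + step
            hit = next((p for p in peaks if abs(p - target) <= tol), None)
            if hit is None:
                break  # the ladder is broken; stop extending from this anchor
            current, matched = hit, matched + 1
        best = max(best, matched)
    return best


if __name__ == "__main__":
    # Toy spectrum: an L-E ladder starting at m/z 500 plus three noise peaks.
    spectrum = [300.0, 450.5, 500.0, 613.08406, 742.12665, 800.2]
    print(ladder_score(spectrum, "LE"))
    print(ladder_score([p + 37.0 for p in spectrum], "LE"))
```

Shifting every peak by the same offset leaves the score unchanged, which is the position independence the abstract emphasizes. Residue order still matters: the same toy spectrum matches only one step of the reversed motif "EL".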