Results 1-20 of 33
1.
BMC Bioinformatics ; 20(1): 405, 2019 Jul 25.
Article in English | MEDLINE | ID: mdl-31345161

ABSTRACT

BACKGROUND: Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. But few programs can align RNA to the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas from the Magic pipeline. RESULTS: Magic-BLAST uses innovative techniques that include the optimization of a spliced alignment score and selective masking during seed selection. We evaluate the ability of Magic-BLAST to accurately map short or long sequences and to discover introns on real RNA-seq data sets from PacBio, Roche, and Illumina runs and on six benchmarks, and compare it to other popular aligners. Additionally, we examine alignments of idealized human RefSeq mRNA sequences that perfectly match the genome. CONCLUSIONS: We show that Magic-BLAST is the best at intron discovery over a wide range of conditions and the best at mapping reads longer than 250 bases, from any platform. It is versatile, robust to high levels of mismatches or extreme base composition, and reasonably fast. It can align reads to a BLAST database or a FASTA file. It can accept a FASTQ file as input or automatically retrieve an accession from the SRA repository at the NCBI.


Subject(s)
RNA/genetics; Sequence Alignment; Sequence Analysis, RNA/methods; Software; Algorithms; Base Sequence; Databases, Nucleic Acid; Humans; Introns/genetics; ROC Curve; Time Factors
2.
BMC Genomics ; 20(1): 591, 2019 Jul 18.
Article in English | MEDLINE | ID: mdl-31319791

ABSTRACT

BACKGROUND: During the last decade, plant biotechnology laboratories have sparked a monumental revolution with the rapid development of next-generation sequencing technologies at affordable prices. Soon, these sequencing technologies and the assembly of whole genomes will extend beyond plant computational biologists and become commonplace within the plant biology disciplines. The current availability of large-scale genomic resources for non-traditional plant model systems (the so-called 'orphan crops') is enabling the construction of high-density integrated physical and genetic linkage maps with potential applications in plant breeding. The newly available fully sequenced plant genomes represent an incredible opportunity for comparative analyses that may reveal new aspects of genome biology and evolution. Analyzing the expansion and evolution of gene families across species is a common approach to inferring biological functions. To date, the extent and role of gene families in plants have only been partially addressed, and many gene families remain to be investigated. Manual identification of gene families is highly time-consuming and laborious, requiring an iterative process of manual and computational analysis to identify members of a given family, typically combining numerous BLAST searches and manual data cleaning. Given the increasing abundance of genome sequences and the agronomic interest in plant gene families, the field needs a clear, automated annotation tool. RESULTS: Here, we present the geneHummus package, an R-based pipeline for the identification and characterization of plant gene families. The pipeline reduces hands-on annotation time while offering high specificity and sensitivity in extracting only proteins from the RefSeq database, and it reports conserved domain architectures based on SPARCLE. As a case study, we focused on the auxin response factor (ARF) gene family in Cicer arietinum (chickpea) and other legumes. CONCLUSION: We anticipate that our pipeline should be suitable for any taxonomic plant family, and likely other gene families, vastly improving the speed and ease of genomic data processing.


Subject(s)
Fabaceae/genetics; Genes, Plant; Multigene Family; Software; Cicer/genetics; Phylogeny; Plant Proteins/genetics; Receptors, Cell Surface/genetics; Transcriptome
3.
BMC Cancer ; 16: 186, 2016 Mar 05.
Article in English | MEDLINE | ID: mdl-26944546

ABSTRACT

BACKGROUND: Intrinsic and acquired resistance to drug therapies remains a challenge for patients with malignant melanoma. Intratumoral heterogeneity within the tumor microenvironment adds complexity to the determinants of drug efficacy and acquired resistance. METHODS: We used 3D biomimetic platforms to study the dynamics of extracellular matrix (ECM) biogenesis following pharmaceutical intervention against mitogen-activated protein kinase (MAPK) signaling. We further determined the temporal evolution of ECM components secreted by isogenic melanoma cell clones. RESULTS: We found that the cell clones differentially secrete and assemble a myriad of ECM molecules into dense fibrillar and globular networks. We show that cells can modulate their ECM biosynthesis in response to external insults. Fibronectin (FN) is one of the key architectural components, modulating the efficacy of a broad spectrum of drug therapies. Stable cell lines engineered to secrete minimal levels of FN showed a concomitant increase in secretion of Tenascin-C and became sensitive to BRAF(V600E) and ERK inhibition as clonally derived 3D tumor aggregates. These cells failed to assemble exogenous FN despite maintaining the integrin machinery to facilitate cell-ECM cross-talk. We determined that only clones that increased FN production via p38 MAPK and β1 integrin survived drug treatment. CONCLUSIONS: These data suggest that tumor cells engineer drug resistance by altering their ECM biosynthesis. Therefore, drug treatment may induce ECM biosynthesis, contributing to de novo resistance.


Subject(s)
Extracellular Matrix/metabolism; Melanoma/metabolism; Mitogen-Activated Protein Kinases/metabolism; Signal Transduction; Animals; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Cell Line, Tumor; Cell Movement; Cell Survival; Disease Models, Animal; Drug Resistance, Neoplasm; Extracellular Matrix Proteins/metabolism; Female; Fibronectins/metabolism; Heterografts; Humans; Melanoma/drug therapy; Melanoma/pathology; Mitogen-Activated Protein Kinases/antagonists & inhibitors; Neoplasm Metastasis; Protein Kinase Inhibitors/pharmacology; Protein Kinase Inhibitors/therapeutic use; Tenascin/metabolism; Tumor Microenvironment
5.
Clin Cancer Res ; 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38446993

ABSTRACT

PURPOSE: Clonal hematopoiesis (CH) is thought to be the origin of myeloid neoplasms (MN). Yet our understanding of the mechanisms driving CH progression to MN, and our ability to predict MN risk clinically, remain limited. The human proteome reflects complex interactions between genetic and epigenetic regulation of biological systems. We hypothesized that the plasma proteome might predict MN risk and inform our understanding of the mechanisms promoting MN development. EXPERIMENTAL DESIGN: We jointly characterized CH and plasma proteomic profiles of 46,237 individuals in the UK Biobank at baseline study entry. During 500,036 person-years of follow-up, 115 individuals developed MN. Cox proportional hazards regression was used to test for an association between plasma protein levels and MN risk. RESULTS: We identified 115 proteins associated with MN risk, of which 30% (N = 34) were also associated with CH. These were enriched for known regulators of the innate and adaptive immune systems. Plasma proteomics improved the prediction of MN risk (AUC = 0.85, p = 5 × 10⁻⁹) beyond clinical factors and CH (AUC = 0.80). In an independent group (N = 381,485), we used inherited polygenic risk scores (PRS) for plasma protein levels to validate the relevance of these proteins to MN development. PRS analyses suggest that most MN-associated proteins we identified are not directly causally linked to MN risk but rather represent downstream markers of pathways regulating the progression of CH to MN. CONCLUSIONS: These data highlight the role of immune cell regulation in the progression of CH to MN and the promise of leveraging multi-omic characterization of CH to improve MN risk stratification.
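The person-time figures in this abstract lend themselves to a quick back-of-the-envelope check. The minimal Python sketch below is an illustration, not an analysis reported by the authors: it computes a crude incidence rate (cases divided by person-time, scaled to 100,000 person-years) from the stated 115 MN cases and 500,036 person-years, assuming no adjustment for covariates or differences in follow-up. The helper name `crude_rate_per_100k` is hypothetical.

```python
def crude_rate_per_100k(events: int, person_years: float) -> float:
    """Crude incidence rate: events divided by person-time, scaled to 100,000 person-years.

    Hypothetical helper for illustration; it applies no covariate adjustment.
    """
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * 100_000


# Figures stated in the abstract: 115 incident MN cases over 500,036 person-years.
rate = crude_rate_per_100k(115, 500_036)
print(f"{rate:.1f} per 100,000 person-years")
```

With these inputs the sketch gives roughly 23.0 MN cases per 100,000 person-years. A crude rate like this ignores covariates and censoring, which is why a time-to-event method such as the Cox proportional hazards regression named in the abstract is used to test associations with protein levels.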

6.
Environ Microbiol ; 15(2): 307-12, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23035931

ABSTRACT

Facultative pathogens have extremely dynamic pan-genomes, to a large extent derived from bacteriophages and other mobile elements. We developed a simple approach to identify phage-derived genomic islands and applied it to show that pathogens from diverse bacterial genera are significantly enriched in clustered phage-derived genes compared with related benign strains. These findings show that genome expansion by integration of prophages containing virulence factors is a major route of evolution of facultative bacterial pathogens.


Subject(s)
Bacteria/genetics; Bacteria/pathogenicity; Genome, Bacterial/genetics; Genomic Islands/genetics; Prophages/genetics; Virulence/genetics; Biological Evolution; Virulence Factors/genetics
7.
Mol Membr Biol ; 29(2): 36-51, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22416964

ABSTRACT

Small ankyrin-1 (sAnk1) is a splice variant of the ANK1 gene that binds to obscurin A. Previous studies have identified electrostatic interactions that contribute to this binding. In addition, molecular dynamics (MD) simulations predict four hydrophobic residues in a 'hot spot' on the surface of the ankyrin-like repeats of sAnk1, near the charged residues involved in binding. We used site-directed mutagenesis, blot overlays, and surface plasmon resonance assays to study the contribution of the hydrophobic residues V70, F71, I102, and I103 to binding to two different 30-mers of obscurin that bind sAnk1, Obsc6316-6345 and Obsc6231-6260. Alanine mutations of each of the hydrophobic residues disrupted binding to the high-affinity binding site, Obsc6316-6345. In contrast, the V70A and I102A mutations had no effect on binding to the lower-affinity site, Obsc6231-6260. Alanine mutagenesis of the five hydrophobic residues present in Obsc6316-6345 showed that V6328, I6332, and V6334 were critical to sAnk1 binding. Individual alanine mutants of the six hydrophobic residues of Obsc6231-6260 had no effect on binding to sAnk1, although a triple alanine mutant of residues V6233/I6234/I6235 decreased binding. We also examined a model of the Obsc6316-6345/sAnk1 complex in MD simulations and found I102 of sAnk1 to be within 2.2 Å of V6334 of Obsc6316-6345. In contrast to the I102A mutation, mutating I102 of sAnk1 to other hydrophobic amino acids such as phenylalanine or leucine did not disrupt binding to obscurin. Our results suggest that hydrophobic interactions contribute to the higher affinity of Obsc6316-6345 for sAnk1 and to the dominant role this sequence plays in binding.


Subject(s)
Ankyrins/chemistry; Ankyrins/metabolism; Guanine Nucleotide Exchange Factors/chemistry; Guanine Nucleotide Exchange Factors/metabolism; Muscle Proteins/chemistry; Muscle Proteins/metabolism; Alanine/genetics; Alanine/metabolism; Amino Acid Sequence; Animals; Ankyrins/genetics; Binding Sites; Hydrophobic and Hydrophilic Interactions; Leucine/genetics; Leucine/metabolism; Molecular Dynamics Simulation; Molecular Sequence Data; Mutagenesis, Site-Directed; Protein Binding; Protein Isoforms/chemistry; Protein Isoforms/genetics; Protein Isoforms/metabolism; Rats; Recombinant Proteins/chemistry; Recombinant Proteins/genetics; Recombinant Proteins/metabolism; Surface Plasmon Resonance
8.
Front Bioinform ; 3: 1277923, 2023.
Article in English | MEDLINE | ID: mdl-37885757

ABSTRACT

Motivation: For a number of neurological diseases, such as Alzheimer's disease, amyotrophic lateral sclerosis, and many others, certain genes are known to be involved in the disease mechanism. A common question is whether a structural variant in any such gene may be related to drug response in clinical trials and how this relationship can contribute to the lifecycle of drug development. Results: To this end, we introduce VariantSurvival, a tool that identifies changes in survival relative to structural variants within target genes. VariantSurvival matches annotated structural variants with genes that are clinically relevant to neurological diseases. A Cox regression model determines the change in survival between the placebo and clinical trial groups with respect to the number of structural variants in the drug target genes. We demonstrate the functionality of our approach with the exemplary case of the SETX gene. VariantSurvival has a user-friendly, lightweight graphical user interface built on the Shiny web application package.

9.
PLoS One ; 18(11): e0293879, 2023.
Article in English | MEDLINE | ID: mdl-37943810

ABSTRACT

Science, technology, engineering, mathematics, and medicine (STEMM) fields change rapidly and are increasingly interdisciplinary. STEMM practitioners commonly use short-format training (SFT), such as workshops and short courses, for upskilling and reskilling, but unaddressed challenges limit SFT's effectiveness and inclusiveness. Education researchers, students in SFT courses, and organizations have called for research and strategies that can strengthen SFT in terms of effectiveness, inclusiveness, and accessibility across multiple dimensions. This paper describes the project that resulted in a consensus set of 14 actionable recommendations to systematically strengthen SFT. A diverse international group of 30 experts in education, accessibility, and life sciences from 10 countries came together to develop recommendations that can help strengthen SFT globally. Participants, including representatives of some of the largest life science training programs globally, assembled findings from the educational sciences and drew on the experiences of several of the largest life science SFT programs. The 14 recommendations were derived through a Delphi method, with consensus achieved in real time as the group completed a series of meetings and tasks designed to elicit specific recommendations. The recommendations cover the breadth of SFT contexts and stakeholder groups and include actions for instructors (e.g., make equity and inclusion an ethical obligation), programs (e.g., centralize infrastructure for assessment and evaluation), and organizations and funders (e.g., professionalize the training of SFT instructors; deploy SFT to counter inequity). The recommendations are aligned with a purpose-built framework, "The Bicycle Principles," that prioritizes evidence-based teaching, inclusiveness, and equity, as well as the ability to scale, share, and sustain SFT. We also describe how the Bicycle Principles and recommendations are consistent with educational change theories and can overcome systemic barriers to delivering consistently effective, inclusive, and career-spanning SFT.


Subject(s)
Students; Technology; Humans; Consensus; Engineering
10.
Blood Adv ; 7(9): 1796-1810, 2023 05 09.
Article in English | MEDLINE | ID: mdl-36170795

ABSTRACT

Serum tryptase is a biomarker used to aid in the identification of certain myeloid neoplasms, most notably systemic mastocytosis, where basal serum tryptase (BST) levels >20 ng/mL are a minor criterion for diagnosis. Although clonal myeloid neoplasms are rare, the most common cause of elevated BST levels is the genetic trait hereditary α-tryptasemia (HαT), caused by increased germline TPSAB1 copy number. To date, the precise structural variation and mechanism(s) underlying elevated BST in HαT, and the general clinical utility of tryptase genotyping, remain undefined. Through cloning, long-read sequencing, and assembly of the human tryptase locus from an individual with HαT, and by validating our findings in vitro and in silico, we demonstrate that BST elevations arise from overexpression of replicated TPSAB1 loci encoding canonical α-tryptase protein, owing to coinheritance of a linked overactive promoter element. Modeling BST levels based on TPSAB1 replication number, we generate new individualized clinical reference values for the upper limit of normal. Using this personalized laboratory medicine approach, we demonstrate the clinical utility of tryptase genotyping, finding that in the absence of HαT, BST levels >11.4 ng/mL frequently identify indolent clonal mast cell disease. Moreover, substantial BST elevations (e.g., >100 ng/mL), which would ordinarily prompt bone marrow biopsy, can result from TPSAB1 replications alone and thus be within normal limits for certain individuals with HαT.


Subject(s)
Mastocytosis; Myeloproliferative Disorders; Humans; Tryptases/genetics; Mast Cells; Reference Values; Unnecessary Procedures; Mastocytosis/diagnosis; Myeloproliferative Disorders/pathology
11.
Clin Transl Gastroenterol ; 13(1): e00455, 2022 01 19.
Article in English | MEDLINE | ID: mdl-35060944

ABSTRACT

INTRODUCTION: Pancreatitis is a complex syndrome that results from many etiologies. Large, well-characterized cohorts are needed to further understand disease risk and prognosis. METHODS: A pancreatitis cohort of more than 4,200 patients and 24,000 controls was identified in the UK BioBank (UKBB) consortium. A descriptive analysis was completed comparing patients with acute pancreatitis (AP) and chronic pancreatitis (CP). The Toxic-metabolic, Idiopathic, Genetic, Autoimmune, Recurrent and severe pancreatitis, and Obstructive (TIGAR-O) checklist Version 2 classification was applied to patients with AP and CP and compared with the control population. RESULTS: CP prevalence in the UKBB is 163 per 100,000. AP incidence increased from 21.4 per 100,000 per year in 2001-2005 to 48.2 per 100,000 per year in 2016-2020. Gallstones and smoking were confirmed as key risk factors for AP and CP, respectively. Both populations carry multiple risk factors and a high burden of comorbidities, including benign and malignant neoplastic disorders. DISCUSSION: The UKBB serves as a rich cohort for evaluating pancreatitis. The disease burden of AP and CP was high in this population. This study confirmed the association of common risk factors identified in other cohort studies. Further analysis is needed to link genomic risks and biomarkers with disease features in this population.


Subject(s)
Biological Specimen Banks; Pancreatitis, Chronic; Cohort Studies; Humans; Pancreatitis, Chronic/complications; Pancreatitis, Chronic/epidemiology; Prevalence; United Kingdom/epidemiology
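The rates quoted in the UK Biobank pancreatitis abstract can be compared directly with a little arithmetic. The short Python sketch below is illustrative only and is not part of the published analysis; the helper name `fold_change` is hypothetical. It computes the relative increase in AP incidence between the 2001-2005 and 2016-2020 periods and re-expresses the CP prevalence of 163 per 100,000 as a percentage.

```python
def fold_change(earlier: float, later: float) -> float:
    """Ratio of a later rate to an earlier rate (both in the same units).

    Hypothetical helper for illustration only.
    """
    if earlier <= 0:
        raise ValueError("earlier rate must be positive")
    return later / earlier


# AP incidence per 100,000 per year, as stated in the abstract.
ap_2001_2005 = 21.4
ap_2016_2020 = 48.2
ap_increase = fold_change(ap_2001_2005, ap_2016_2020)
print(f"AP incidence rose about {ap_increase:.2f}-fold")

# CP prevalence of 163 per 100,000 expressed as a percentage of the population.
cp_prevalence_pct = 163 / 100_000 * 100
print(f"CP prevalence: {cp_prevalence_pct:.3f}%")
```

This gives an approximately 2.25-fold rise in AP incidence and a CP prevalence of about 0.163%. The fold change compares crude period rates only; it does not by itself explain why incidence rose.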
12.
Front Mol Biosci ; 9: 831740, 2022.
Article in English | MEDLINE | ID: mdl-35252351

ABSTRACT

iCn3D was initially developed as a web-based 3D molecular viewer. It then evolved from visualization into full-featured interactive structural analysis software. It became a collaborative research instrument through the sharing of permanent, shortened URLs that encapsulate not only annotated visual molecular scenes but also all underlying data and analysis scripts in a FAIR manner. More recently, with the growth of structural databases, the need to analyze large structural datasets systematically led us to use Python scripts and to convert the code for use in Node.js scripts. We show a few examples of Python scripts at https://github.com/ncbi/icn3d/tree/master/icn3dpython that export secondary structures or PNG images from iCn3D. Users just need to replace the URL in the Python scripts to export other annotations from iCn3D. Furthermore, any interactive iCn3D feature can be converted into a Node.js script to be run in batch mode, enabling an interactive analysis performed on one or a handful of protein complexes to be scaled up to large ensembles of structures. Examples of currently available Node.js analysis scripts are at https://github.com/ncbi/icn3d/tree/master/icn3dnode. This development will enable ensemble analyses on growing structural databases such as AlphaFold or RoseTTAFold on one hand and Electron Microscopy on the other. In this paper, we also review new features such as DelPhi electrostatic potential, 3D views of mutations, alignment of multiple chains, assembly of multiple structures by realignment, dynamic symmetry calculation, 2D cartoons at different levels, interactive contact maps, and use of iCn3D in Jupyter Notebook as described at https://pypi.org/project/icn3dpy.

13.
PLoS One ; 16(1): e0244876, 2021.
Article in English | MEDLINE | ID: mdl-33411719

ABSTRACT

Characterizing the gut microbiota in terms of their capacity to interfere with drug metabolism is necessary to achieve drug efficacy and safety. Although examples of drug-microbiome interactions are well documented, little has been reported about computational pipelines for systematically identifying and characterizing bacterial enzymes that process particular classes of drugs. The goal of our study is to develop a computational approach that compiles drugs whose metabolism may be influenced by a particular class of microbial enzymes and that quantifies the variability in the collective level of those enzymes among individuals. The present paper describes this approach using, as an example, microbial β-glucuronidases, which break down drug-glucuronide conjugates and reactivate the drugs or their metabolites. We identified 100 medications that may be metabolized by β-glucuronidases from the gut microbiome. These medications included morphine, estrogen, ibuprofen, midazolam, and their structural analogues. Analysis of metagenomic data available through the Sequence Read Archive (SRA) showed that the level of β-glucuronidase in gut metagenomes was higher in males than in females, which provides a potential explanation for the sex-based differences in efficacy and toxicity reported for several drugs in previous studies. Our analysis also showed that infant gut metagenomes at birth and at 12 months of age have higher levels of β-glucuronidase than the metagenomes of their mothers, and we discuss the implications of this variability in the context of breastfeeding and infant hyperbilirubinemia. Overall, despite important limitations discussed in this paper, our analysis provided useful insights into the role of the human gut metagenome in the variability in drug response among individuals. Importantly, this approach exploits drug and metagenome data available in public databases, as well as open-source cheminformatics and bioinformatics tools, to predict drug-metagenome interactions.


Subject(s)
Forecasting/methods; Gastrointestinal Microbiome/drug effects; Metagenomics/methods; Adult; Bacteria/genetics; Computational Biology/methods; Data Management; Female; Gastrointestinal Microbiome/genetics; Glucuronidase/genetics; Glucuronidase/metabolism; Humans; Infant, Newborn; Male; Metagenome/drug effects; Metagenome/genetics; Microbiota/drug effects; Microbiota/genetics; Mothers
14.
Biochemistry ; 49(46): 9948-56, 2010 Nov 23.
Article in English | MEDLINE | ID: mdl-20949908

ABSTRACT

Obscurin A, an ∼720 kDa modular protein of striated muscles, binds to small ankyrin 1 (sAnk1, Ank 1.5), an integral protein of the sarcoplasmic reticulum, through two distinct carboxy-terminal sequences, Obsc(6316-6436) and Obsc(6231-6260). We hypothesized that these sequences differ in affinity but compete for the same binding site on sAnk1. We show that the sequence within Obsc(6316-6436) that binds to sAnk1 is limited to residues 6316-6345. Comparison of Obsc(6231-6260) to Obsc(6316-6345) reveals that Obsc(6316-6345) binds sAnk1 with an affinity (133 ± 43 nM) comparable to that of the Obsc(6316-6436) fusion protein, whereas Obsc(6231-6260) binds with lower affinity (384 ± 53 nM). Oligopeptides of each sequence compete for binding with both sites at half-maximal inhibitory concentrations consistent with the affinities measured directly. Five of six site-directed mutants of sAnk1 showed similar reductions in binding to each binding site on obscurin, suggesting that both sequences dock to many of the same residues of sAnk1. Circular dichroism (CD) analysis of the synthetic oligopeptides revealed a 2-fold greater α-helical content in Obsc(6316-6345), ∼35%, than in Obsc(6231-6260), ∼17%. Using these data, structural prediction algorithms, and homology modeling, we predict that Obsc(6316-6345) contains a bent α-helix of 12 amino acids, flanked by short disordered regions, and that Obsc(6231-6260) has a short N-terminal α-helix of 4-5 residues followed by a long disordered region. Our results are consistent with a model in which the two obscurin sequences differ significantly in structure but bind to the ankyrin-like repeat motifs of sAnk1 in a similar, though not identical, manner.


Subject(s)
Ankyrins/chemistry; Muscle Proteins/chemistry; Algorithms; Amino Acid Sequence; Ankyrins/metabolism; Binding Sites; Cells, Cultured; Circular Dichroism; Molecular Sequence Data; Muscle Proteins/metabolism; Sarcoplasmic Reticulum/metabolism
15.
F1000Res ; 9: 376, 2020.
Article in English | MEDLINE | ID: mdl-32864105

ABSTRACT

The Sequence Read Archive (SRA) is a large public repository that stores raw next-generation sequencing data from thousands of diverse scientific investigations. Despite its promise, reuse and re-analysis of SRA data have been hampered by the heterogeneity and poor quality of the metadata that describe its biological samples. Recently, the MetaSRA project standardized these metadata by annotating each sample with terms from biomedical ontologies. In this work, we present a pair of Jupyter notebook-based tools that use the MetaSRA to build structured datasets from the SRA in order to facilitate secondary analyses of the SRA's human RNA-seq data. The first tool, called the Case-Control Finder, finds suitable case and control samples for a given disease or condition, with the cases and controls matched by tissue or cell type. The second tool, called the Series Finder, finds ordered sets of samples for addressing biological questions about changes over a numerical property such as time. These tools were developed during a three-day NCBI Codeathon held in March 2019 at the University of North Carolina at Chapel Hill.


Subject(s)
Biological Ontologies; Datasets as Topic; High-Throughput Nucleotide Sequencing; Metadata; Software; Case-Control Studies; Humans; RNA-Seq
16.
JCO Clin Cancer Inform ; 4: 310-317, 2020 03.
Article in English | MEDLINE | ID: mdl-32228266

ABSTRACT

PURPOSE: The modern researcher is confronted with hundreds of published methods for interpreting genetic variants. There are databases of genes and variants, phenotype-genotype relationships, algorithms that score and rank genes, and in silico variant effect prediction tools. Because variant prioritization is a multifactorial problem, a welcome development in the field has been the emergence of decision support frameworks, which make it easier to integrate multiple resources in an interactive environment. Current decision support frameworks are typically limited by closed proprietary architectures, access to a restricted set of tools, lack of customizability, Web dependencies that expose protected data, or limited scalability. METHODS: We present the Open Custom Ranked Analysis of Variants Toolkit (OpenCRAVAT), a new open-source, scalable decision support system for variant and gene prioritization. We have designed the resource catalog to be open and modular to maximize community and developer involvement; as a result, the catalog is being actively developed and is growing every month. Resources made available via the store are well suited to the analysis of cancer, as well as Mendelian and complex diseases. RESULTS: OpenCRAVAT offers both a command-line utility and a dynamic graphical user interface, allowing users to install it with a single command, easily download tools from an extensive resource catalog, create customized pipelines, and explore results in a richly detailed viewing environment. We present several case studies to illustrate the design of custom workflows to prioritize genes and variants. CONCLUSION: OpenCRAVAT is distinguished from similar tools by its capacity to access and integrate an unprecedented number of diverse data resources and computational prediction methods, spanning germline, somatic, common, rare, coding, and noncoding variants.


Subject(s)
Computational Biology/organization & administration; Databases, Genetic/standards; Mutation; Neoplasm Proteins/genetics; Neoplasms/genetics; Software/standards; Humans; Neoplasms/diagnosis; Neoplasms/drug therapy; User-Computer Interface; Workflow
17.
Viruses ; 12(12)2020 12 10.
Article in English | MEDLINE | ID: mdl-33322070

ABSTRACT

Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, previously decentralized (unfederated) data have led to a lack of consensus on, and of comparisons between, feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomic annotations and virus-host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof of concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide easier alternatives for accessing or querying information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.


Subject(s)
Computational Biology; Databases, Genetic; Metagenomics/methods; Viruses/genetics; Computational Biology/methods; Genetic Variation; Genome, Viral; Host-Pathogen Interactions; Humans; User-Computer Interface; Viral Proteins/genetics; Viral Proteins/metabolism; Viruses/metabolism; Web Browser
18.
Gigascience ; 8(7)2019 07 01.
Article in English | MEDLINE | ID: mdl-31257419

ABSTRACT

BACKGROUND: Anthozoa, Endocnidozoa, and Medusozoa are the 3 major clades of Cnidaria. Medusozoa is further divided into 4 clades, Hydrozoa, Staurozoa, Cubozoa, and Scyphozoa; the latter 3 lineages make up the clade Acraspeda. Acraspeda encompasses extraordinary diversity in terms of life history, numerous nuisance species, taxa with complex eyes rivaling those of other animals, and some of the most venomous organisms on the planet. Genomes have recently become available within Scyphozoa and Cubozoa, but there are currently no published genomes within Staurozoa and Cubozoa. FINDINGS: Here we present 3 new draft genomes of Calvadosia cruxmelitensis (Staurozoa), Alatina alata (Cubozoa), and Cassiopea xamachana (Scyphozoa), for which we provide a preliminary orthology analysis that includes an inventory of their respective venom-related genes. Additionally, we identify synteny between POU and Hox genes that had previously been reported in a hydrozoan, suggesting that this linkage is highly conserved, possibly dating back at least to the last common ancestor of Medusozoa, yet likely independent of vertebrate POU-Hox linkages. CONCLUSIONS: These draft genomes provide a valuable resource for studying the evolutionary history and biology of these extraordinary animals, and for identifying genomic features underlying venom, vision, and life history traits in Acraspeda.


Subject(s)
Cnidaria/genetics , Genome , Animals , Cnidaria/classification , Cnidarian Venoms/genetics , Cnidarian Venoms/metabolism , Phylogeny , Synteny , Transcriptome
19.
F1000Res ; 8: 1135, 2019.
Article in English | MEDLINE | ID: mdl-31824661

ABSTRACT

Background: Basic and clinical scientific research at the University of South Florida (USF) has intersected to support a multi-faceted approach around a common focus on rare iron-related diseases. We proposed a modified version of the National Center for Biotechnology Information's (NCBI) hackathon model to take full advantage of local expertise in building "Iron Hack", a rare disease-focused hackathon. Because the collaborative, problem-solving nature of hackathons tends to attract participants from highly diverse backgrounds, organizers facilitated a symposium on rare iron-related diseases, specifically porphyrias and Friedreich's ataxia, pitched at general audiences. Methods: The hackathon was structured to begin each day with presentations by expert clinicians, genetic counselors, researchers focused on molecular and cellular biology, public health/global health, genetics/genomics, computational biology, bioinformatics, biomolecular science, bioengineering, and computer science, as well as guest speakers from the American Porphyria Foundation (APF) and Friedreich's Ataxia Research Alliance (FARA), to inform participants of the human impact of these diseases. Results: As a result of this hackathon, we developed resources that are relevant not only to these specific disease models, but also to other rare diseases and general bioinformatics problems. Within two and a half days, "Iron Hack" participants successfully built collaborative projects to visualize data, build databases, improve rare disease diagnosis, and study rare-disease inheritance. Conclusions: The purpose of this manuscript is to demonstrate the utility of a hackathon model to generate prototypes of generalizable tools for a given disease and to train clinicians and data scientists to interact more effectively.


Subject(s)
Friedreich Ataxia , Porphyrias , Databases, Factual , Humans , Iron , Rare Diseases , United States
20.
Genes (Basel) ; 10(9)2019 09 16.
Article in English | MEDLINE | ID: mdl-31527408

ABSTRACT

A wealth of viral data sits untapped in publicly available metagenomic data sets, although it could be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprising over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset of 2953 SRA data sets (approximately 55 million contigs) was selected, and its contigs were further filtered to a minimum length of 1 kb. This resulted in 4.2 million contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered, and assigned metadata. Of the 4.2 million contigs, 360,000 were labeled with domains, and an additional subset of 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds. Mainly: (i) conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) redesigning existing bioinformatic algorithms for a cloud infrastructure can facilitate their use by a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon.
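The contig length-filtering step mentioned in this abstract (keeping only contigs of at least 1 kb before the BLAST search) can be illustrated with a short Python sketch. It assumes the contigs arrive as plain FASTA text and treats "1 kb" as an inclusive threshold of 1,000 bases; the names `read_fasta`, `filter_contigs`, and `MIN_CONTIG_LENGTH` are hypothetical and are not taken from the hackathon's actual pipelines.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

# Assumption: "1 kb" is read as an inclusive 1,000-base minimum.
MIN_CONTIG_LENGTH = 1000


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(
    records: Iterable[Tuple[str, str]], min_length: int = MIN_CONTIG_LENGTH
) -> Iterator[Tuple[str, str]]:
    """Keep only contigs whose sequence is at least min_length bases long."""
    for header, seq in records:
        if len(seq) >= min_length:
            yield header, seq


# Usage: contig_a spans two wrapped lines (600 + 500 = 1100 bases), contig_b has 999.
demo = io.StringIO(">contig_a\n" + "A" * 600 + "\n" + "C" * 500 + "\n>contig_b\n" + "G" * 999 + "\n")
kept = [header for header, _ in filter_contigs(read_fasta(demo))]
print(kept)  # ['contig_a']
```

Because both functions are generators, records stream through one at a time, which matters when the input is millions of contigs rather than a small demo.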


Subject(s)
Cloud Computing/standards , Genome, Viral , Metagenome , Metagenomics/methods , Big Data , Genome, Human , Humans , Metagenomics/standards , Software