1.
Nat Methods ; 15(8): 591-594, 2018 08.
Article in English | MEDLINE | ID: mdl-30013048

ABSTRACT

We describe Strelka2 ( https://github.com/Illumina/strelka ), an open-source small-variant-calling method for research and clinical germline and somatic sequencing applications. Strelka2 introduces a novel mixture-model-based estimation of insertion/deletion error parameters from each sample, an efficient tiered haplotype-modeling strategy, and a normal sample contamination model to improve liquid tumor analysis. For both germline and somatic calling, Strelka2 substantially outperformed the current leading tools in terms of both variant-calling accuracy and computing cost.


Subject(s)
Genetic Variation , Germ-Line Mutation , Software , Databases, Genetic/statistics & numerical data , Haplotypes , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , INDEL Mutation , Models, Genetic , Neoplasms/genetics , Whole Genome Sequencing/statistics & numerical data
2.
Genome Res ; 27(1): 157-164, 2017 01.
Article in English | MEDLINE | ID: mdl-27903644

ABSTRACT

Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased "Platinum" variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and 11 children of this pedigree. Platinum genotypes are highly concordant with the current catalog of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%) and add a validated truth catalog that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission ("nonplatinum") revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.
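As a minimal illustration of what "consistent with the pattern of inheritance" means at a single site, the Python sketch below checks whether a child's diploid genotype can be formed from one allele of each parent. It assumes fully called, VCF-style genotype strings (e.g. "0/1" or "0|1") and a single parent-child trio; the function names are hypothetical. The study itself used haplotype transmission across the full three-generation pedigree, which is a considerably stronger test than this per-site check.

```python
from itertools import product


def parse_gt(gt: str) -> tuple[str, str]:
    """Split a diploid VCF-style genotype ('0/1' or phased '0|1') into its two alleles."""
    # Phase is ignored: '|' is treated the same as '/'.
    first, second = gt.replace("|", "/").split("/")
    return first, second


def is_mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """True if one maternal allele plus one paternal allele can form the child's genotype."""
    child_alleles = sorted(parse_gt(child))
    # Try every (maternal allele, paternal allele) combination.
    return any(
        sorted((m, f)) == child_alleles
        for m, f in product(parse_gt(mother), parse_gt(father))
    )
```

For example, is_mendelian_consistent("0/1", "0/0", "1/1") is True, while is_mendelian_consistent("1/1", "0/0", "0/1") is False because the child would need an alternate allele the mother does not carry. A site failing this check may reflect a genotyping error, a de novo or cell-line mutation, or an unrecognized duplication or deletion, the same explanations the abstract gives for its "nonplatinum" variants.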


Subject(s)
Genome, Human/genetics , Genomics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Databases, Genetic , Exome/genetics , Genotype , Humans , INDEL Mutation/genetics , Pedigree , Polymorphism, Single Nucleotide , Software
3.
Mol Cell ; 46(2): 226-37, 2012 Apr 27.
Article in English | MEDLINE | ID: mdl-22445486

ABSTRACT

Emerging evidence indicates that membrane lipids regulate protein networking by directly interacting with protein-interaction domains (PIDs). As a pilot study to identify and functionally annotate lipid-binding PIDs on a genomic scale, we performed experimental and computational studies of PDZ domains. Characterization of 70 PDZ domains showed that ~40% had submicromolar membrane affinity. Using a computational model built from these data, we predicted the membrane-binding properties of 2,000 PDZ domains from 20 species. The accuracy of the prediction was experimentally validated for 26 PDZ domains. We also subdivided lipid-binding PDZ domains into three classes based on the interplay between membrane- and protein-binding sites. For different classes of PDZ domains, lipid binding regulates their protein interactions by different mechanisms. Functional studies of a PDZ domain protein, rhophilin 2, suggest that all classes of lipid-binding PDZ domains serve as genuine dual-specificity modules regulating protein interactions at the membrane under physiological conditions.


Subject(s)
Computer Simulation , Lipid Metabolism , Protein Interaction Domains and Motifs , Animals , Genome , Humans , Lipids/chemistry , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Mice , Models, Molecular , Rats , Surface Plasmon Resonance
4.
Bioinformatics ; 32(8): 1220-2, 2016 04 15.
Article in English | MEDLINE | ID: mdl-26647377

ABSTRACT

UNLABELLED: We describe Manta, a method to discover structural variants and indels from next-generation sequencing data. Manta is optimized for rapid germline and somatic analysis, calling structural variants, medium-sized indels and large insertions on standard compute hardware in less than a tenth of the time that comparable methods require to identify only subsets of these variant types: for example, NA12878 at 50× genomic coverage is analyzed in less than 20 min. Manta can discover and score variants based on supporting paired and split-read evidence, with scoring models optimized for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs. Call quality is similar to or better than comparable methods, as determined by pedigree consistency of germline calls and comparison of somatic calls to COSMIC database variants. Manta consistently assembles a higher fraction of its calls to base-pair resolution, allowing for improved downstream annotation and analysis of clinical significance. We provide Manta as a community resource to facilitate practical and routine structural variant analysis in clinical and research sequencing scenarios. AVAILABILITY AND IMPLEMENTATION: Manta is released under the open-source GPLv3 license. Source code, documentation and Linux binaries are available from https://github.com/Illumina/manta. CONTACT: csaunders@illumina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
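The "50× genomic coverage" figure is a mean sequencing depth. Under the Lander-Waterman approximation, mean coverage is C = N × L / G for N reads of length L over a genome of size G. The Python sketch below applies that formula; the 3.1 Gb genome size and 150 bp read length in the example are illustrative assumptions, not values from the abstract, and the helper names are hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman mean coverage: C = N * L / G."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Number of reads needed to reach a target mean coverage, rounded up."""
    # N counts individual reads, so each mate of a read pair counts once.
    return math.ceil(target * genome_size / read_length)
```

Under those assumptions, reads_for_coverage(50, 150, 3_100_000_000) gives 1,033,333,334 reads. In practice, duplicate and unmapped reads lower the effective depth, so more raw reads are needed than this idealized count.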


Subject(s)
High-Throughput Nucleotide Sequencing , INDEL Mutation , Neoplasms/genetics , DNA, Neoplasm , Genome , Genomics , Humans , Software
5.
Bioinformatics ; 29(16): 2041-3, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23736529

ABSTRACT

SUMMARY: An ultrafast DNA sequence aligner (Isaac Genome Alignment Software) that takes advantage of high-memory hardware (>48 GB) and a variant caller (Isaac Variant Caller) have been developed. We demonstrate that our combined pipeline (Isaac) is four to five times faster than BWA + GATK on equivalent hardware, with comparable accuracy as measured by trio conflict rates and sensitivity. We further show that Isaac is effective in the detection of disease-causing variants and can easily and economically be run on commodity hardware. AVAILABILITY: Isaac has an open-source license and can be obtained at https://github.com/sequencing.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Genetic Variation , Genome, Human , Humans
6.
Bioinformatics ; 28(18): i431-i437, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22962463

ABSTRACT

MOTIVATION: Peripheral membrane-targeting domain (MTD) families, such as C1-, C2- and PH domains, play a key role in signal transduction and membrane trafficking by dynamically translocating their parent proteins to specific plasma membranes when changes in lipid composition occur. It is, however, difficult to determine the subset of domains within families displaying this property, as sequence motifs signifying the membrane-binding properties are not well defined. For this reason, procedures based on sequence similarity alone are often insufficient for the computational identification of MTDs within families (yielding less than 65% accuracy even with a sequence identity of 70%). RESULTS: We present a machine learning protocol for determining membrane-targeting properties that achieves 85-90% accuracy in separating binding and non-binding domains within families. Our model is based on features from both sequence and structure, thereby incorporating statistics obtained from the entire domain family and domain-specific physical quantities such as surface electrostatics. In addition, by using the enriched rules in alternating decision tree classifiers, we are able to determine the meaning of the assigned function labels in terms of biological mechanisms. CONCLUSIONS: The high accuracy of the learned models and good agreement between the rules discovered using the ADtree classifier and mechanisms reported in the literature reflect the value of machine learning protocols in both prediction and biological knowledge discovery. Our protocol can thus potentially be used as a general function annotation and knowledge mining tool for other protein domains. AVAILABILITY: metador.bioengr.uic.edu CONTACT: huilu@uic.edu.
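An alternating decision tree (ADTree) scores an example by starting from a root value and adding the value of every branch the example reaches; the sign of the total gives the predicted class and its magnitude a confidence, which is why the learned rules can be read individually. The Python sketch below evaluates such a tree. The feature names, thresholds, and values in example_tree are hypothetical placeholders, not the rules learned in this study.

```python
from dataclasses import dataclass, field


@dataclass
class Splitter:
    """A splitter node: a threshold test with a prediction value on each branch."""
    feature: str
    threshold: float
    yes_value: float  # added to the score when x[feature] < threshold
    no_value: float   # added to the score otherwise
    yes_children: list["Splitter"] = field(default_factory=list)
    no_children: list["Splitter"] = field(default_factory=list)


def adtree_score(x: dict[str, float], root_value: float, splitters: list[Splitter]) -> float:
    """Sum the root value and the value of every branch the example reaches."""
    total = root_value
    pending = list(splitters)  # all splitters under a reached prediction node are evaluated
    while pending:
        node = pending.pop()
        if x[node.feature] < node.threshold:
            total += node.yes_value
            pending.extend(node.yes_children)
        else:
            total += node.no_value
            pending.extend(node.no_children)
    return total


# Hypothetical tree: features and numbers are illustrative only.
example_tree = [
    Splitter("surface_potential", 0.5, yes_value=-0.8, no_value=0.9,
             no_children=[Splitter("hydrophobic_patch", 0.3, yes_value=-0.1, no_value=0.6)]),
]

score = adtree_score({"surface_potential": 1.2, "hydrophobic_patch": 0.4}, -0.2, example_tree)
print("binding" if score > 0 else "non-binding", round(score, 2))
```

Here the example domain reaches the root (-0.2), the "no" branch of the first test (+0.9) and the "no" branch of the nested test (+0.6), for a positive total of about 1.3. Each added term corresponds to one human-readable rule, which is the property the abstract exploits to relate rules to biological mechanisms.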


Subject(s)
Artificial Intelligence , Membrane Proteins/chemistry , Membrane Proteins/classification , Models, Molecular , Protein Kinase C-delta/chemistry , Protein Sorting Signals , Protein Structure, Tertiary , Static Electricity
7.
J Biol Chem ; 285(1): 531-40, 2010 Jan 01.
Article in English | MEDLINE | ID: mdl-19880963

ABSTRACT

The mechanisms by which cytosolic proteins reversibly bind the membrane and induce the curvature for membrane trafficking and remodeling remain elusive. The epsin N-terminal homology (ENTH) domain has potent vesicle tubulation activity despite a lack of intrinsic molecular curvature. EPR spectroscopy revealed that the N-terminal alpha-helix penetrates the phosphatidylinositol 4,5-bisphosphate-containing membrane at a unique oblique angle and concomitantly interacts closely with helices from neighboring molecules in an antiparallel orientation. Quantitative fluorescence microscopy showed that the formation of highly ordered ENTH domain complexes beyond a critical size is essential for its vesicle tubulation activity. Mutations that interfere with the formation of large ENTH domain complexes abrogated the vesicle tubulation activity. Furthermore, the same mutations in intact epsin 1 abolished its endocytic activity in mammalian cells. Collectively, these results show that the ENTH domain facilitates cellular membrane budding and fission by a novel mechanism that is distinct from that proposed for BAR domains.


Subject(s)
Adaptor Proteins, Vesicular Transport/chemistry , Adaptor Proteins, Vesicular Transport/metabolism , Cell Membrane/metabolism , Models, Molecular , Animals , Cell Line , Electron Spin Resonance Spectroscopy , Endocytosis , Mice , Mutant Proteins/chemistry , Mutant Proteins/metabolism , Protein Structure, Quaternary , Protein Structure, Secondary , Protein Structure, Tertiary , Structure-Activity Relationship , Transferrin/metabolism , Unilamellar Liposomes/metabolism
8.
BMC Bioinformatics ; 11: 591, 2010 Dec 07.
Article in English | MEDLINE | ID: mdl-21138573

ABSTRACT

BACKGROUND: Mass spectrometry has become a standard method by which the proteomic profile of cell or tissue samples is characterized. To take full advantage of tandem mass spectrometry (MS/MS) techniques in large-scale protein characterization studies, robust and consistent data analysis procedures are crucial. In this work we present a machine learning based protocol for the identification of correct peptide-spectrum matches from Sequest database search results, improving on previously published protocols. RESULTS: The developed model improves on published machine learning classification procedures by 6% as measured by the area under the ROC curve. Further, we show how the developed model can be presented as an interpretable tree of additive rules, thereby effectively removing the 'black-box' notion often associated with machine learning classifiers and allowing for comparison with expert rules of thumb. Finally, a method for extending the developed peptide identification protocol to give probabilistic estimates of the presence of a given protein is proposed and tested. CONCLUSIONS: We demonstrate the construction of a high-accuracy classification model for Sequest search results from MS/MS spectra obtained using MALDI ionization. The developed model performs well in identifying correct peptide-spectrum matches and is easily extendable to the protein identification problem. The relative ease with which additional experimental parameters can be incorporated into the classification framework, to give additional discriminatory power, allows for future tailoring of the model to take advantage of information from specific instrument set-ups.
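The area under the ROC curve used above as the comparison metric equals the probability that a randomly chosen correct peptide-spectrum match receives a higher classifier score than a randomly chosen incorrect one. The Python sketch below computes it directly from that pairwise definition, with ties counted as one half. It is an illustrative O(n × m) implementation with a hypothetical function name, not the evaluation code used in the study.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]  # e.g. correct matches
    negatives = [s for s, y in zip(scores, labels) if y == 0]  # e.g. incorrect matches
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For instance, roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]) returns 0.75, because three of the four positive-negative pairs are ordered correctly. For large result sets a rank-based formula avoids the quadratic pair loop but yields the same value.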


Subject(s)
Artificial Intelligence , Mass Spectrometry/methods , Proteins/chemistry , Proteomics/methods , Databases, Protein , Peptides/chemistry , Software
9.
Nat Genet ; 51(2): 354-362, 2019 02.
Article in English | MEDLINE | ID: mdl-30643257

ABSTRACT

The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.
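Variant calling recall is the fraction of true variants that a pipeline recovers. The Python sketch below computes recall and precision by exact matching of (chromosome, position, ref, alt) keys against a truth set. That keying is a simplifying assumption: dedicated benchmarking tools also reconcile different representations of the same variant, and the "specificity" reported in the abstract may be defined differently from the precision computed here. The function name and the example variants are hypothetical.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def benchmark(calls: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Compare a call set with a truth set by exact key matching."""
    tp = len(calls & truth)  # called and present in the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # present in the truth set but missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "recall": recall, "precision": precision}


truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "GA"), ("chr2", 80, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "GA"),
         ("chr2", 80, "T", "C"), ("chr3", 10, "A", "C")}
print(benchmark(calls, truth))
```

In this toy example one true variant is missed and one spurious call is made, so recall and precision are both 3/4. A better alignment reference would show up as fewer false negatives (higher recall) without extra false positives.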


Subject(s)
Genome, Human/genetics , Genomics/methods , Humans , Polymorphism, Single Nucleotide/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Sequence Deletion/genetics , Whole Genome Sequencing/methods
10.
Methods Mol Biol ; 1137: 17-27, 2014.
Article in English | MEDLINE | ID: mdl-24573471

ABSTRACT

Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server ( http://raptorx.uchicago.edu ), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling. Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.


Subject(s)
Models, Molecular , Protein Conformation , Proteins/chemistry , Software , Web Browser
11.
Nat Protoc ; 7(8): 1511-22, 2012 Jul 19.
Article in English | MEDLINE | ID: mdl-22814390

ABSTRACT

A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world.


Subject(s)
Models, Molecular , Proteins/chemistry , Software , Computational Biology/methods , Protein Conformation , Protein Structure, Secondary , Protein Structure, Tertiary
12.
Nat Commun ; 3: 1249, 2012.
Article in English | MEDLINE | ID: mdl-23212378

ABSTRACT

Cholesterol is known to modulate the physical properties of cell membranes, but its direct involvement in cellular signaling has not been thoroughly investigated. Here we show that cholesterol specifically binds many PDZ domains found in scaffold proteins, including the N-terminal PDZ domain of NHERF1/EBP50. This modular domain has a cholesterol-binding site topologically distinct from its canonical protein-binding site and serves as a dual-specificity domain that bridges the membrane and juxta-membrane signaling complexes. Disruption of the cholesterol-binding activity of NHERF1 largely abrogates its dynamic co-localization with and activation of cystic fibrosis transmembrane conductance regulator, one of its binding partners in the plasma membrane of mammalian cells. At least seven more PDZ domains from other scaffold proteins also bind cholesterol and have cholesterol-binding sites, suggesting that cholesterol modulates cell signaling through direct interactions with these scaffold proteins. This mechanism may provide an alternative explanation for the formation of signaling platforms in cholesterol-rich membrane domains.


Subject(s)
Cholesterol/physiology , PDZ Domains/physiology , Signal Transduction/physiology , Binding Sites , Chloride Channels/physiology , Fluorescence Polarization , HEK293 Cells/physiology , Humans , Matrix Attachment Regions/physiology , Microscopy, Confocal , Molecular Imaging , Phosphoproteins/physiology , Sodium-Hydrogen Exchangers/physiology
13.
Article in English | MEDLINE | ID: mdl-19963936

ABSTRACT

Machine learning based classification protocols for automated function annotation of protein structures have in many instances proven superior to simpler sequence based procedures. Here we present an automated method for extracting features from protein structures by construction of surface patches to be used in such protocols. The utility of the developed patch-growing procedure is exemplified by its ability to identify reversible membrane binding domains from the C1, C2, and PH families.


Subject(s)
Artificial Intelligence , Cell Membrane/metabolism , Proteins/chemistry , Algorithms , Automation , Databases, Protein , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Protein Binding , Protein Structure, Tertiary , Solvents , Static Electricity
14.
Cell Biochem Biophys ; 55(3): 141-52, 2009.
Article in English | MEDLINE | ID: mdl-19669741

ABSTRACT

Efficient communication between the cell and its external environment is of the utmost importance to the function of multicellular organisms. While signaling events can be generally characterized as information exchange by means of controlled energy conversion, research efforts have hitherto mainly been concerned with mechanisms involving chemical and electrical energy transfer. Here, we review recent computational efforts addressing the function of mechanical force in signal transduction. Specifically, we focus on the role of steered molecular dynamics (SMD) simulations in providing details at the atomic level on a group of protein domains, which play a fundamental role in signal exchange by responding properly to mechanical strain. We start by giving a brief introduction to the SMD technique and general properties of mechanically stable protein folds, followed by specific examples illustrating three general regimes of signal transfer utilizing mechanical energy: purely mechanical, mechanical to chemical, and chemical to mechanical. Whenever possible the physiological importance of the example at hand is stressed to highlight the diversity of the processes in which mechanical signaling plays a key role. We also provide an overview of future challenges and perspectives for this rapidly developing field.
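In constant-velocity SMD, one atom or reaction coordinate is tethered to a harmonic spring whose anchor moves at a fixed speed v, so the applied force is F(t) = k (x0 + v t - x(t)). The toy Python sketch below applies this to a single overdamped particle in the one-dimensional double-well potential U(x) = (x^2 - 1)^2, with the thermal noise term omitted so that runs are reproducible. The function name and all parameter values are arbitrary illustrative assumptions; real SMD simulations use all-atom force fields in dedicated molecular dynamics software.

```python
def smd_pull(k=10.0, v=1.0, gamma=1.0, dt=1e-3, steps=3000, x0=-1.0):
    """Constant-velocity pulling of an overdamped particle in U(x) = (x^2 - 1)^2.

    Returns the particle positions, the spring force at each step, and the
    external work W accumulated as the sum of F * v * dt.
    """
    def dU(x):
        # Derivative of the double-well potential U(x) = (x^2 - 1)^2.
        return 4.0 * x * (x * x - 1.0)

    x = x0  # start in the left-hand minimum
    positions, forces = [], []
    work = 0.0
    for step in range(steps):
        anchor = x0 + v * step * dt   # spring anchor moves at constant speed
        force = k * (anchor - x)      # harmonic pulling force on the particle
        # Overdamped (Brownian dynamics) Euler step without the random force.
        x += dt / gamma * (-dU(x) + force)
        work += force * v * dt        # work done by the moving anchor
        positions.append(x)
        forces.append(force)
    return positions, forces, work


positions, forces, work = smd_pull()
print(f"final x = {positions[-1]:.2f}, max force = {max(forces):.2f}, work = {work:.2f}")
```

With these settings the spring drags the particle out of the left well and over the barrier at x = 0 into the right well. The force-versus-time record and the accumulated work are the quantities SMD analyses typically examine, for example to characterize mechanical unfolding or unbinding events.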


Subject(s)
Energy Transfer , Mechanotransduction, Cellular , Protein Folding , Animals , Humans , Molecular Dynamics Simulation