Results 1 - 14 of 14
1.
Nat Methods ; 15(8): 591-594, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30013048

ABSTRACT

We describe Strelka2 ( https://github.com/Illumina/strelka ), an open-source small-variant-calling method for research and clinical germline and somatic sequencing applications. Strelka2 introduces a novel mixture-model-based estimation of insertion/deletion error parameters from each sample, an efficient tiered haplotype-modeling strategy, and a normal sample contamination model to improve liquid tumor analysis. For both germline and somatic calling, Strelka2 substantially outperformed the current leading tools in terms of both variant-calling accuracy and computing cost.


Subjects
Genetic Variation , Germ-Line Mutation , Software , Databases, Genetic/statistics & numerical data , Haplotypes , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , INDEL Mutation , Models, Genetic , Neoplasms/genetics , Whole Genome Sequencing/statistics & numerical data
2.
Genome Res ; 27(1): 157-164, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27903644

ABSTRACT

Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased "Platinum" variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and 11 children of this pedigree. Platinum genotypes are highly concordant with the current catalog of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%) and add a validated truth catalog that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission ("nonplatinum") revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.


Subjects
Genome, Human/genetics , Genomics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Databases, Genetic , Exome/genetics , Genotype , Humans , INDEL Mutation/genetics , Pedigree , Polymorphism, Single Nucleotide , Software
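
The inheritance-consistency idea in the abstract above can be illustrated with a much simpler per-site check on a single parent-child trio. This is only a sketch under stated assumptions: the function name `is_mendelian_consistent` and the allele encoding (0 for reference, 1 for alternate) are hypothetical, and the study itself uses haplotype transmission across the parents and 11 children of the pedigree, which is a stricter test than the one shown here.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's unphased diploid genotype can be formed
    by taking one allele from the mother and one from the father.

    Each genotype is a tuple of two alleles, e.g. (0, 1).
    Hypothetical helper for illustration; not taken from the paper.
    """
    child_sorted = tuple(sorted(child))
    # Enumerate every (maternal allele, paternal allele) combination.
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


# A heterozygous child of two opposite homozygotes is consistent.
print(is_mendelian_consistent((0, 1), (0, 0), (1, 1)))
# A homozygous-alternate child cannot arise if the mother carries no alternate allele.
print(is_mendelian_consistent((1, 1), (0, 0), (0, 1)))
```

A site that fails this check in a trio is not necessarily a calling error; as the abstract notes, many such sites turn out to be de novo or cell-line mutations.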
3.
Mol Cell ; 46(2): 226-37, 2012 Apr 27.
Article in English | MEDLINE | ID: mdl-22445486

ABSTRACT

Emerging evidence indicates that membrane lipids regulate protein networking by directly interacting with protein-interaction domains (PIDs). As a pilot study to identify and functionally annotate lipid-binding PIDs on a genomic scale, we performed experimental and computational studies of PDZ domains. Characterization of 70 PDZ domains showed that ~40% had submicromolar membrane affinity. Using a computational model built from these data, we predicted the membrane-binding properties of 2,000 PDZ domains from 20 species. The accuracy of the prediction was experimentally validated for 26 PDZ domains. We also subdivided lipid-binding PDZ domains into three classes based on the interplay between membrane- and protein-binding sites. For different classes of PDZ domains, lipid binding regulates their protein interactions by different mechanisms. Functional studies of a PDZ domain protein, rhophilin 2, suggest that all classes of lipid-binding PDZ domains serve as genuine dual-specificity modules regulating protein interactions at the membrane under physiological conditions.


Subjects
Computer Simulation , Lipid Metabolism , Protein Interaction Domains and Motifs , Animals , Genome , Humans , Lipids/chemistry , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Mice , Models, Molecular , Rats , Surface Plasmon Resonance
4.
Bioinformatics ; 32(8): 1220-2, 2016 Apr 15.
Article in English | MEDLINE | ID: mdl-26647377

ABSTRACT

We describe Manta, a method to discover structural variants and indels from next-generation sequencing data. Manta is optimized for rapid germline and somatic analysis, calling structural variants, medium-sized indels and large insertions on standard compute hardware in less than a tenth of the time that comparable methods require to identify only subsets of these variant types: for example, NA12878 at 50× genomic coverage is analyzed in less than 20 min. Manta can discover and score variants based on supporting paired and split-read evidence, with scoring models optimized for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs. Call quality is similar to or better than comparable methods, as determined by pedigree consistency of germline calls and comparison of somatic calls to COSMIC database variants. Manta consistently assembles a higher fraction of its calls to base-pair resolution, allowing for improved downstream annotation and analysis of clinical significance. We provide Manta as a community resource to facilitate practical and routine structural variant analysis in clinical and research sequencing scenarios. AVAILABILITY AND IMPLEMENTATION: Manta is released under the open-source GPLv3 license. Source code, documentation and Linux binaries are available from https://github.com/Illumina/manta. CONTACT: csaunders@illumina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing , INDEL Mutation , Neoplasms/genetics , DNA, Neoplasm , Genome , Genomics , Humans , Software
5.
Bioinformatics ; 29(16): 2041-3, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23736529

ABSTRACT

SUMMARY: An ultrafast DNA sequence aligner (Isaac Genome Alignment Software) that takes advantage of high-memory hardware (>48 GB) and a variant caller (Isaac Variant Caller) have been developed. We demonstrate that our combined pipeline (Isaac) is four to five times faster than BWA + GATK on equivalent hardware, with comparable accuracy as measured by trio conflict rates and sensitivity. We further show that Isaac is effective in the detection of disease-causing variants and can be run easily and economically on commodity hardware. AVAILABILITY: Isaac has an open source license and can be obtained at https://github.com/sequencing.


Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Genetic Variation , Genome, Human , Humans
6.
Bioinformatics ; 28(18): i431-i437, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22962463

ABSTRACT

MOTIVATION: Peripheral membrane-targeting domain (MTD) families, such as C1-, C2- and PH domains, play a key role in signal transduction and membrane trafficking by dynamically translocating their parent proteins to specific plasma membranes when changes in lipid composition occur. It is, however, difficult to determine the subset of domains within families displaying this property, as sequence motifs signifying the membrane binding properties are not well defined. For this reason, procedures based on sequence similarity alone are often insufficient in computational identification of MTDs within families (yielding less than 65% accuracy even with a sequence identity of 70%). RESULTS: We present a machine learning protocol for determining membrane-targeting properties achieving 85-90% accuracy in separating binding and non-binding domains within families. Our model is based on features from both sequence and structure, thereby incorporating statistics obtained from the entire domain family and domain-specific physical quantities such as surface electrostatics. In addition, by using the enriched rules in alternating decision tree classifiers, we are able to determine the meaning of the assigned function labels in terms of biological mechanisms. CONCLUSIONS: The high accuracy of the learned models and good agreement between the rules discovered using the ADtree classifier and mechanisms reported in the literature reflect the value of machine learning protocols in both prediction and biological knowledge discovery. Our protocol can thus potentially be used as a general function annotation and knowledge mining tool for other protein domains. AVAILABILITY: metador.bioengr.uic.edu CONTACT: huilu@uic.edu.


Subjects
Artificial Intelligence , Membrane Proteins/chemistry , Membrane Proteins/classification , Models, Molecular , Protein Kinase C-delta/chemistry , Protein Sorting Signals , Protein Structure, Tertiary , Static Electricity
7.
J Biol Chem ; 285(1): 531-40, 2010 Jan 01.
Article in English | MEDLINE | ID: mdl-19880963

ABSTRACT

The mechanisms by which cytosolic proteins reversibly bind the membrane and induce the curvature for membrane trafficking and remodeling remain elusive. The epsin N-terminal homology (ENTH) domain has potent vesicle tubulation activity despite a lack of intrinsic molecular curvature. EPR revealed that the N-terminal alpha-helix penetrates the phosphatidylinositol 4,5-bisphosphate-containing membrane at a unique oblique angle and concomitantly interacts closely with helices from neighboring molecules in an antiparallel orientation. Quantitative fluorescence microscopy showed that the formation of highly ordered ENTH domain complexes beyond a critical size is essential for its vesicle tubulation activity. Mutations that interfere with the formation of large ENTH domain complexes abrogated the vesicle tubulation activity. Furthermore, the same mutations in intact epsin 1 abolished its endocytic activity in mammalian cells. Collectively, these results show that the ENTH domain facilitates cellular membrane budding and fission by a novel mechanism that is distinct from that proposed for BAR domains.


Subjects
Adaptor Proteins, Vesicular Transport/chemistry , Adaptor Proteins, Vesicular Transport/metabolism , Cell Membrane/metabolism , Models, Molecular , Animals , Cell Line , Electron Spin Resonance Spectroscopy , Endocytosis , Mice , Mutant Proteins/chemistry , Mutant Proteins/metabolism , Protein Structure, Quaternary , Protein Structure, Secondary , Protein Structure, Tertiary , Structure-Activity Relationship , Transferrin/metabolism , Unilamellar Liposomes/metabolism
8.
BMC Bioinformatics ; 11: 591, 2010 Dec 07.
Article in English | MEDLINE | ID: mdl-21138573

ABSTRACT

BACKGROUND: Mass spectrometry has become a standard method by which the proteomic profile of cell or tissue samples is characterized. To take full advantage of tandem mass spectrometry (MS/MS) techniques in large-scale protein characterization studies, robust and consistent data analysis procedures are crucial. In this work we present a machine-learning-based protocol for the identification of correct peptide-spectrum matches from Sequest database search results, improving on previously published protocols. RESULTS: The developed model improves on published machine learning classification procedures by 6% as measured by the area under the ROC curve. Further, we show how the developed model can be presented as an interpretable tree of additive rules, thereby effectively removing the 'black-box' notion often associated with machine learning classifiers and allowing for comparison with expert rules of thumb. Finally, a method for extending the developed peptide identification protocol to give probabilistic estimates of the presence of a given protein is proposed and tested. CONCLUSIONS: We demonstrate the construction of a high-accuracy classification model for Sequest search results from MS/MS spectra obtained using MALDI ionization. The developed model performs well in identifying correct peptide-spectrum matches and is easily extendable to the protein identification problem. The relative ease with which additional experimental parameters can be incorporated into the classification framework, to give additional discriminatory power, allows for future tailoring of the model to take advantage of information from specific instrument set-ups.


Subjects
Artificial Intelligence , Mass Spectrometry/methods , Proteins/chemistry , Proteomics/methods , Databases, Protein , Peptides/chemistry , Software
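
The abstract above reports its improvement as a change in the area under the ROC curve. As a reminder of what that metric measures, the following minimal sketch computes it as the probability that a randomly chosen correct match scores higher than a randomly chosen incorrect one, with ties counted as one half. The function name `roc_auc` and the toy scores are assumptions for illustration; they are not taken from the paper or from Sequest output.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: classifier scores, higher meaning "more likely correct".
    labels: 1 for a correct peptide-spectrum match, 0 for an incorrect one.
    Quadratic in the number of samples; adequate for small illustrations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive-negative pairs are ranked correctly.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of correct from incorrect matches.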
9.
Nat Genet ; 51(2): 354-362, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30643257

ABSTRACT

The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.


Subjects
Genome, Human/genetics , Genomics/methods , Humans , Polymorphism, Single Nucleotide/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Sequence Deletion/genetics , Whole Genome Sequencing/methods
10.
Methods Mol Biol ; 1137: 17-27, 2014.
Article in English | MEDLINE | ID: mdl-24573471

ABSTRACT

Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server (http://raptorx.uchicago.edu), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling. Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here, it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.


Subjects
Models, Molecular , Protein Conformation , Proteins/chemistry , Software , Web Browser
11.
Nat Protoc ; 7(8): 1511-22, 2012 Jul 19.
Article in English | MEDLINE | ID: mdl-22814390

ABSTRACT

A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world.


Subjects
Models, Molecular , Proteins/chemistry , Software , Computational Biology/methods , Protein Conformation , Protein Structure, Secondary , Protein Structure, Tertiary
12.
Nat Commun ; 3: 1249, 2012.
Article in English | MEDLINE | ID: mdl-23212378

ABSTRACT

Cholesterol is known to modulate the physical properties of cell membranes, but its direct involvement in cellular signaling has not been thoroughly investigated. Here we show that cholesterol specifically binds many PDZ domains found in scaffold proteins, including the N-terminal PDZ domain of NHERF1/EBP50. This modular domain has a cholesterol-binding site topologically distinct from its canonical protein-binding site and serves as a dual-specificity domain that bridges the membrane and juxta-membrane signaling complexes. Disruption of the cholesterol-binding activity of NHERF1 largely abrogates its dynamic co-localization with and activation of cystic fibrosis transmembrane conductance regulator, one of its binding partners in the plasma membrane of mammalian cells. At least seven more PDZ domains from other scaffold proteins also bind cholesterol and have cholesterol-binding sites, suggesting that cholesterol modulates cell signaling through direct interactions with these scaffold proteins. This mechanism may provide an alternative explanation for the formation of signaling platforms in cholesterol-rich membrane domains.


Subjects
Cholesterol/physiology , PDZ Domains/physiology , Signal Transduction/physiology , Binding Sites , Chloride Channels/physiology , Fluorescence Polarization , HEK293 Cells/physiology , Humans , Matrix Attachment Regions/physiology , Microscopy, Confocal , Molecular Imaging , Phosphoproteins/physiology , Sodium-Hydrogen Exchangers/physiology
13.
Article in English | MEDLINE | ID: mdl-19963936

ABSTRACT

Machine-learning-based classification protocols for automated function annotation of protein structures have in many instances proven superior to simpler sequence-based procedures. Here we present an automated method for extracting features from protein structures by construction of surface patches to be used in such protocols. The utility of the developed patch-growing procedure is exemplified by its ability to identify reversible membrane binding domains from the C1, C2, and PH families.


Subjects
Artificial Intelligence , Cell Membrane/metabolism , Proteins/chemistry , Algorithms , Automation , Databases, Protein , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Protein Binding , Protein Structure, Tertiary , Solvents , Static Electricity
14.
Cell Biochem Biophys ; 55(3): 141-52, 2009.
Article in English | MEDLINE | ID: mdl-19669741

ABSTRACT

Efficient communication between the cell and its external environment is of the utmost importance to the function of multicellular organisms. While signaling events can be generally characterized as information exchange by means of controlled energy conversion, research efforts have hitherto mainly been concerned with mechanisms involving chemical and electrical energy transfer. Here, we review recent computational efforts addressing the function of mechanical force in signal transduction. Specifically, we focus on the role of steered molecular dynamics (SMD) simulations in providing details at the atomic level on a group of protein domains, which play a fundamental role in signal exchange by responding properly to mechanical strain. We start by giving a brief introduction to the SMD technique and general properties of mechanically stable protein folds, followed by specific examples illustrating three general regimes of signal transfer utilizing mechanical energy: purely mechanical, mechanical to chemical, and chemical to mechanical. Whenever possible the physiological importance of the example at hand is stressed to highlight the diversity of the processes in which mechanical signaling plays a key role. We also provide an overview of future challenges and perspectives for this rapidly developing field.


Subjects
Energy Transfer , Mechanotransduction, Cellular , Protein Folding , Animals , Humans , Molecular Dynamics Simulation