Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 3 de 3
Filtrar
1.
Nucleic Acids Res ; 39(Web Server issue): W557-61, 2011 Jul.
Artigo em Inglês | MEDLINE | ID: mdl-21576217

RESUMO

The massively parallel sequencing technologies have recently flourished and dramatically cut the cost to sequence personal human genomes. Haplotype assembly from personal genomes sequenced using the massively parallel sequencing technologies is becoming a cost-effective and promising tool for human disease study. Computational assembly of haplotypes has been proved to be very accurate, but obviously contains errors. Here we present a tool, HapEdit, to assess the accuracy of assembled haplotypes and edit them manually. Using this tool, a user can break erroneous haplotype segments into smaller segments, or concatenate haplotype segments if the concatenated haplotype segments are sufficiently supported. A user can also edit bases with low-quality scores. HapEdit displays haplotype assemblies so that a user can easily navigate and pinpoint a region of interest. As inputs, HapEdit currently takes reads from the Polonator, Illumina, SOLiD, 454 and Sanger sequencing technologies.


Assuntos
Haplótipos , Sequenciamento de Nucleotídeos em Larga Escala , Análise de Sequência de DNA , Software , Internet
2.
Artigo em Inglês | MEDLINE | ID: mdl-17277416

RESUMO

If the origins of fragments are known in genome sequencing projects, it is straightforward to reconstruct diploid consensus sequences. In reality, however, this is not true. Although there are proposed methods to reconstruct haplotypes from genome sequencing projects, an accuracy assessment is required to evaluate the confidence of the estimated diploid consensus sequences. In this paper, we define the confidence score of diploid consensus sequences. It requires the calculation of the likelihood of an assembly. To calculate the likelihood, we propose a linear time algorithm with respect to the number of polymorphic sites. The likelihood calculation and confidence score are used for further improvements of haplotype estimation in two directions. One direction is that low-scored phases are disconnected. The other direction is that, instead of using nominal frequency 1/2, the haplotype frequency is estimated to reflect the actual contribution of each haplotype. Our method was evaluated on the simulated data whose polymorphism rate (1.2 percent) was based on Ciona intestinalis. As a result, the high accuracy of our algorithm was indicated: The true positive rate of the haplotype estimation was greater than 97 percent.


Assuntos
Sequência Consenso/genética , Diploide , Modelos Estatísticos , Análise de Sequência de DNA/estatística & dados numéricos , Algoritmos , Animais , Ciona intestinalis/genética , Biologia Computacional/métodos , Simulação por Computador , Frequência do Gene , Haplótipos , Funções Verossimilhança , Cadeias de Markov , Polimorfismo Genético , Probabilidade , Análise de Sequência de DNA/métodos
3.
BMC Bioinformatics ; 7: 303, 2006 Jun 15.
Artigo em Inglês | MEDLINE | ID: mdl-16776821

RESUMO

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphisms found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides a machinery for the prediction of tagged SNPs and thereby to assess the performances of tag sets through their ability to predict larger SNP sets. RESULTS: Here, we compute the description code-lengths of SNP data for an array of models and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several aspects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection. CONCLUSION: Our study provides strong evidence that the tag sets selected by our best method, based on Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is more sensitive for assessing the quality of a tagging set than the correct prediction rate of tagged SNPs. Besides, we show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs. This justifies the selection of tag SNPs on the basis of haplotype informativeness, although genotyping studies do not directly assess haplotypes. A software that implements our approach is available.


Assuntos
Haplótipos/genética , Desequilíbrio de Ligação , Modelos Genéticos , Polimorfismo de Nucleotídeo Único/genética , Bases de Dados Genéticas , Marcadores Genéticos , Humanos , Cadeias de Markov , Reprodutibilidade dos Testes , Análise de Sequência de DNA/métodos , Software
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA