Results 1 - 8 of 8
1.
BMC Bioinformatics ; 11: 33, 2010 Jan 15.
Article in English | MEDLINE | ID: mdl-20078885

ABSTRACT

BACKGROUND: With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. RESULTS: We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. CONCLUSIONS: Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
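Two of the quantities mentioned in this abstract, N50 and the percent mismatch within an overlap, can be illustrated with a minimal sketch. The function names `n50` and `percent_mismatch` are hypothetical and are not taken from Minimus or Weka; the mismatch function assumes the overlapping region has already been aligned without gaps, which is a simplification.

```python
def n50(contig_lengths):
    """Return the N50 of a set of contigs.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs


def percent_mismatch(overlap_a, overlap_b):
    """Percent of mismatching positions between two equal-length,
    gap-free overlap regions (an assumed simplification)."""
    if len(overlap_a) != len(overlap_b):
        raise ValueError("overlap regions must have equal length")
    if not overlap_a:
        return 0.0
    mismatches = sum(1 for x, y in zip(overlap_a, overlap_b) if x != y)
    return 100.0 * mismatches / len(overlap_a)
```

A feature such as `percent_mismatch` would be one column of the feature vector passed to a classifier; the choice of classifier and the remaining features are not sketched here.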


Subjects
Artificial Intelligence , Genomics/methods , Sequence Analysis, DNA/methods , Base Sequence , Databases, Nucleic Acid , Genome, Bacterial , Genome, Fungal
2.
Respirar (Ciudad Autón. B. Aires) ; 16(1): 85-92, March 2024.
Article in Spanish | LILACS, UNISALUD, BINACIS | ID: biblio-1551285

ABSTRACT

Follicular dendritic cell sarcoma (SFCD) is a rare malignant neoplasm derived from follicular dendritic cells. Given its immunohistochemical characteristics, it has been classified as part of the group of sarcomas, of which it represents less than 1%. Currently, there are fewer than 1,000 reports in the literature worldwide, which poses a difficulty that is not only diagnostic, since it is frequently confused with lymphoid neoplasms, but also therapeutic, since there is no clear consensus on its definitive management. This clinical case review describes the first reported case of SFCD in Costa Rica.


Subjects
Humans , Female , Adult , Asthma/diagnosis , Cough/diagnosis , Dendritic Cell Sarcoma, Follicular/diagnosis , Mediastinal Neoplasms/diagnosis , Obesity/diagnosis , Biopsy , Case Reports , Diagnostic Imaging , Immunohistochemistry , Thoracotomy , Costa Rica
3.
J Comput Biol ; 11(4): 734-52, 2004.
Article in English | MEDLINE | ID: mdl-15579242

ABSTRACT

The whole-genome shotgun (WGS) assembly technique has been remarkably successful in efforts to determine the sequence of bases that make up a genome. WGS assembly begins with a large collection of short fragments that have been selected at random from a genome. The sequence of bases at each end of the fragment is determined, albeit imprecisely, resulting in a sequence of letters called a "read." Each letter in a read is assigned a quality value, which estimates the probability that a sequencing error occurred in determining that letter. Reads are typically cut off after about 500 letters, where sequencing errors become endemic. We report on a set of procedures that (1) corrects most of the sequencing errors, (2) changes quality values accordingly, and (3) produces a list of "overlaps," i.e., pairs of reads that plausibly come from overlapping parts of the genome. Our procedures, which we call collectively the "UMD Overlapper," can be run iteratively and as a preprocessor for other assemblers. We tested the UMD Overlapper on Celera's Drosophila reads. When we replaced Celera's overlap procedures in the front end of their assembler, it was able to produce a significantly improved genome.
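The notions of a quality value and of an overlap described in this abstract can be made concrete with a small sketch. It assumes the common Phred convention, Q = -10 log10(P), which the abstract does not state explicitly, and it detects only exact suffix-prefix matches, which is far simpler than the procedures of the UMD Overlapper. All function names are hypothetical.

```python
def error_probability(quality):
    """Convert a quality value to an error probability.

    Assumption: Phred scale, Q = -10 * log10(P), so P = 10 ** (-Q / 10).
    """
    return 10 ** (-quality / 10)


def expected_errors(qualities):
    """Expected number of sequencing errors in a read, given its
    per-letter quality values."""
    return sum(error_probability(q) for q in qualities)


def suffix_prefix_overlap(read_a, read_b, min_len):
    """Length of the longest exact match between a suffix of read_a
    and a prefix of read_b, or 0 if it is shorter than min_len.

    Exact matching is an illustrative simplification; real overlappers
    tolerate sequencing errors.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for n in range(min(len(read_a), len(read_b)), min_len - 1, -1):
        if read_a[-n:] == read_b[:n]:
            return n
    return 0
```

Under the Phred assumption, a quality value of 20 corresponds to a 1% chance that the letter is wrong, so a 500-letter read of uniform quality 20 would be expected to contain about five errors.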


Subjects
Genome , Genomics/statistics & numerical data , Animals , Computational Biology , DNA/genetics , Databases, Nucleic Acid , Drosophila/genetics , Sequence Analysis, DNA/statistics & numerical data , Software
4.
J Comput Biol ; 17(11): 1549-60, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20973743

ABSTRACT

Genomic sequencing techniques introduce experimental errors into reads which can mislead sequence assembly efforts and complicate the diagnostic process. Here we present a method for detecting and removing sequencing errors from reads generated in genomic shotgun sequencing projects prior to sequence assembly. For each input read, the set of all length-k substrings (k-mers) it contains is calculated. The read is evaluated based on the frequency with which each k-mer occurs in the complete data set (k-count). For each read, k-mers are clustered using the variable-bandwidth mean-shift algorithm. Based on the k-count of the cluster center, clusters are classified as error regions or non-error regions. For the 23 real and simulated data sets tested (454 and Solexa), our algorithm detected error regions that cover 99% of all errors. A heuristic algorithm is then applied to detect the location of errors in each putative error region. A read is corrected by removing the errors, thereby creating two or more smaller, error-free read fragments. After performing error removal, the error rate for all data sets tested decreased (∼35-fold reduction, on average). EDAR has comparable accuracy to methods that correct rather than remove errors, and when the error rate is greater than 3% for simulated data sets, it performs better. The performance of the Velvet assembler is generally better with error-removed data. However, for short reads, splitting at the location of errors can be problematic. Following error detection with error correction, rather than removal, may improve the assembly results.
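The k-count idea in this abstract can be sketched as follows. This is a minimal illustration, not the published method: the variable-bandwidth mean-shift clustering step is replaced here by a plain fixed threshold, and the function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter


def kmers(read, k):
    """All length-k substrings of a read, in order of position."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def kmer_counts(reads, k):
    """Count how often each k-mer occurs in the complete data set."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def k_count_profile(read, counts, k):
    """The k-count of every k-mer along a read."""
    return [counts[kmer] for kmer in kmers(read, k)]


def low_count_positions(read, counts, k, threshold):
    """Start positions of k-mers whose k-count falls below threshold.

    A fixed threshold stands in for the clustering step described in
    the abstract; it is an assumed simplification.
    """
    profile = k_count_profile(read, counts, k)
    return [i for i, count in enumerate(profile) if count < threshold]
```

For example, with the reads `["ACGTACGT", "ACGTACGT", "ACGTTCGT"]` and `k = 4`, the k-mers that span the substituted base in the third read each occur only once in the data set, so their positions are flagged when `threshold` is 2.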


Subjects
Algorithms , Computational Biology/methods , Sequence Analysis, DNA/methods , Genome , Sequence Alignment/methods
5.
Proc Natl Acad Sci U S A ; 100(22): 12984-8, 2003 Oct 28.
Article in English | MEDLINE | ID: mdl-14566062

ABSTRACT

The hyperthermophile Nanoarchaeum equitans is an obligate symbiont growing in coculture with the crenarchaeon Ignicoccus. Ribosomal protein and rRNA-based phylogenies place its branching point early in the archaeal lineage, representing the new archaeal kingdom Nanoarchaeota. The N. equitans genome (490,885 base pairs) encodes the machinery for information processing and repair, but lacks genes for lipid, cofactor, amino acid, or nucleotide biosyntheses. It is the smallest microbial genome sequenced to date, and also one of the most compact, with 95% of the DNA predicted to encode proteins or stable RNAs. Its limited biosynthetic and catabolic capacity indicates that N. equitans' symbiotic relationship to Ignicoccus is parasitic, making it the only known archaeal parasite. Unlike the small genomes of bacterial parasites that are undergoing reductive evolution, N. equitans has few pseudogenes or extensive regions of noncoding DNA. This organism represents a basal archaeal lineage and has a highly reduced genome.


Subjects
Archaea/genetics , Biological Evolution , Genome, Archaeal , Arabidopsis/microbiology , Archaea/classification , Archaea/pathogenicity , DNA, Archaeal/genetics , Gene Library , Phylogeny
6.
Proc Natl Acad Sci U S A ; 101(7): 1916-21, 2004 Feb 17.
Article in English | MEDLINE | ID: mdl-14769938

ABSTRACT

We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.


Subjects
Computational Biology , Genome, Human , Human Genome Project , Computational Biology/standards , Contig Mapping/standards , Humans , RNA, Messenger/analysis , Software
7.
Science ; 296(5573): 1661-71, 2002 May 31.
Article in English | MEDLINE | ID: mdl-12040188

ABSTRACT

The high degree of similarity between the mouse and human genomes is demonstrated through analysis of the sequence of mouse chromosome 16 (Mmu 16), which was obtained as part of a whole-genome shotgun assembly of the mouse genome. The mouse genome is about 10% smaller than the human genome, owing to a lower repetitive DNA content. Comparison of the structure and protein-coding potential of Mmu 16 with that of the homologous segments of the human genome identifies regions of conserved synteny with human chromosomes (Hsa) 3, 8, 12, 16, 21, and 22. Gene content and order are highly conserved between Mmu 16 and the syntenic blocks of the human genome. Of the 731 predicted genes on Mmu 16, 509 align with orthologs on the corresponding portions of the human genome, 44 are likely paralogous to these genes, and 164 genes have homologs elsewhere in the human genome; there are 14 genes for which we could find no human counterpart.


Subjects
Chromosomes/genetics , Genome, Human , Genome , Mice, Inbred Strains/genetics , Sequence Analysis, DNA , Synteny , Animals , Base Composition , Chromosomes, Human/genetics , Computational Biology , Conserved Sequence , Databases, Nucleic Acid , Evolution, Molecular , Genes , Genetic Markers , Genomics , Humans , Mice , Mice, Inbred A/genetics , Mice, Inbred DBA/genetics , Molecular Sequence Data , Physical Chromosome Mapping , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Species Specificity
8.
Science ; 298(5591): 129-49, 2002 Oct 04.
Article in English | MEDLINE | ID: mdl-12364791

ABSTRACT

Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.


Subjects
Anopheles/genetics , Genes, Insect , Genome , Sequence Analysis, DNA , Animals , Anopheles/classification , Anopheles/parasitology , Anopheles/physiology , Biological Evolution , Blood , Chromosome Inversion , Chromosomes, Artificial, Bacterial , Computational Biology , DNA Transposable Elements , Digestion , Drosophila melanogaster/genetics , Enzymes/chemistry , Enzymes/genetics , Enzymes/metabolism , Expressed Sequence Tags , Feeding Behavior , Gene Expression Regulation , Genetic Variation , Haplotypes , Humans , Insect Proteins/chemistry , Insect Proteins/genetics , Insect Proteins/physiology , Insect Vectors/genetics , Insect Vectors/parasitology , Insect Vectors/physiology , Malaria, Falciparum/transmission , Molecular Sequence Data , Mosquito Control , Physical Chromosome Mapping , Plasmodium falciparum/growth & development , Polymorphism, Single Nucleotide , Proteome , Species Specificity , Transcription Factors/chemistry , Transcription Factors/genetics , Transcription Factors/physiology