1.
Nature ; 475(7356): 348-52, 2011 Jul 20.
Article in English | MEDLINE | ID: mdl-21776081

ABSTRACT

The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.


Subject(s)
Genome, Bacterial/genetics; Genome, Human/genetics; Genomics/instrumentation; Genomics/methods; Semiconductors; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods; Escherichia coli/genetics; Humans; Light; Male; Rhodopseudomonas/genetics; Vibrio/genetics
2.
Genomics ; 98(2): 79-89, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21565264

ABSTRACT

The success of genome-wide association studies has paralleled the development of efficient genotyping technologies. We describe the development of a next-generation microarray based on the new highly-efficient Affymetrix Axiom genotyping technology that we are using to genotype individuals of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH). The array contains 674,517 SNPs, and provides excellent genome-wide as well as gene-based and candidate-SNP coverage. Coverage was calculated using an approach based on imputation and cross validation. Preliminary results for the first 80,301 saliva-derived DNA samples from the RPGEH demonstrate very high quality genotypes, with sample success rates above 94% and over 98% of successful samples having SNP call rates exceeding 98%. At steady state, we have produced 462 million genotypes per week for each Axiom system. The new array provides a valuable addition to the repertoire of tools for large scale genome-wide association studies.


Subject(s)
Genome-Wide Association Study/methods; High-Throughput Screening Assays; Oligonucleotide Array Sequence Analysis/methods; Polymorphism, Single Nucleotide/genetics; White People/genetics; Humans
3.
Genome Biol ; 22(1): 109, 2021 Apr 16.
Article in English | MEDLINE | ID: mdl-33863344

ABSTRACT

BACKGROUND: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. RESULTS: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. CONCLUSION: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.


Subject(s)
Biomarkers, Tumor; Genetic Testing/methods; Genomics/methods; Neoplasms/genetics; Oncogenes; DNA Copy Number Variations; Genetic Testing/standards; Genomics/standards; Humans; Molecular Diagnostic Techniques/methods; Molecular Diagnostic Techniques/standards; Mutation; Neoplasms/diagnosis; Polymorphism, Single Nucleotide; Reproducibility of Results; Sensitivity and Specificity
4.
Genome Biol ; 22(1): 111, 2021 Apr 16.
Article in English | MEDLINE | ID: mdl-33863366

ABSTRACT

BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency with more than 25,000 variants having less than 20% allele frequency with 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.


Subject(s)
Alleles; Biomarkers, Tumor; Gene Frequency; Genetic Testing/methods; Genetic Variation; Genomics/methods; Neoplasms/genetics; Cell Line, Tumor; DNA Copy Number Variations; Genetic Heterogeneity; Genetic Testing/standards; Genomics/standards; Humans; Neoplasms/diagnosis; Workflow
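
The admixture arithmetic mentioned in the abstract of record 4 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that the abstract does not state: Sample B is taken to carry none of the variant alleles, and the mixing fraction is taken to be the share of genome copies contributed by Sample A. The function names are hypothetical.

```python
def admixed_vaf(vaf_a: float, fraction_a: float) -> float:
    """Expected variant allele frequency after diluting Sample A.

    vaf_a:      allele frequency of the variant in undiluted Sample A (0..1)
    fraction_a: share of the mixture contributed by Sample A (0..1);
                the remainder is assumed to be variant-free Sample B.
    """
    if not 0.0 <= vaf_a <= 1.0 or not 0.0 <= fraction_a <= 1.0:
        raise ValueError("vaf_a and fraction_a must lie in [0, 1]")
    return vaf_a * fraction_a


def fraction_needed(vaf_a: float, target_vaf: float) -> float:
    """Share of Sample A required to reach target_vaf in the mixture."""
    if not 0.0 < target_vaf <= vaf_a <= 1.0:
        raise ValueError("need 0 < target_vaf <= vaf_a <= 1")
    return target_vaf / vaf_a
```

Under these assumptions, bringing a variant that sits at 1% in Sample A down to 0.02% would require Sample A to make up about 2% of the mixture.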
5.
Nat Biotechnol ; 39(9): 1115-1128, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33846644

ABSTRACT

Circulating tumor DNA (ctDNA) sequencing is being rapidly adopted in precision oncology, but the accuracy, sensitivity and reproducibility of ctDNA assays are poorly understood. Here we report the findings of a multi-site, cross-platform evaluation of the analytical performance of five industry-leading ctDNA assays. We evaluated each stage of the ctDNA sequencing workflow with simulations, synthetic DNA spike-in experiments and proficiency testing on standardized, cell-line-derived reference samples. Above 0.5% variant allele frequency, ctDNA mutations were detected with high sensitivity, precision and reproducibility by all five assays, whereas, below this limit, detection became unreliable and varied widely between assays, especially when input material was limited. Missed mutations (false negatives) were more common than erroneous candidates (false positives), indicating that the reliable sampling of rare ctDNA fragments is the key challenge for ctDNA assays. This comprehensive evaluation of the analytical performance of ctDNA assays serves to inform best practice guidelines and provides a resource for precision oncology.


Subject(s)
Circulating Tumor DNA/genetics; Medical Oncology; Neoplasms/genetics; Precision Medicine; Sequence Analysis, DNA/standards; High-Throughput Nucleotide Sequencing/methods; Humans; Limit of Detection; Practice Guidelines as Topic; Reproducibility of Results
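
The sampling problem named in the abstract of record 5 can be illustrated with a simple binomial model. The sketch below is not the study's simulation. It assumes that each unique DNA fragment covering a locus independently carries the mutation with probability equal to the variant allele frequency, and that a variant is reported once at least min_mutant mutant fragments are observed. Both the model and the parameter names are assumptions made for illustration.

```python
from math import comb


def prob_at_least(min_mutant: int, n_fragments: int, vaf: float) -> float:
    """P(X >= min_mutant) for X ~ Binomial(n_fragments, vaf).

    n_fragments: unique fragments sampled at the locus
    vaf:         variant allele frequency (0..1)
    min_mutant:  mutant fragments required before a call is made
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must lie in [0, 1]")
    # Sum the probabilities of seeing fewer than min_mutant mutant fragments,
    # then take the complement.
    below = sum(
        comb(n_fragments, i) * vaf**i * (1.0 - vaf) ** (n_fragments - i)
        for i in range(min_mutant)
    )
    return 1.0 - below
```

Evaluating the function at a fixed calling threshold for several fragment counts shows how quickly the chance of sampling enough mutant fragments falls as the allele frequency or the input material drops.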
6.
J Comput Biol ; 13(3): 579-613, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16706714

ABSTRACT

Cawley et al. (2004) have recently mapped the locations of binding sites for three transcription factors along human chromosomes 21 and 22 using ChIP-Chip experiments. ChIP-Chip experiments are a new approach to the genomewide identification of transcription factor binding sites and consist of chromatin (Ch) immunoprecipitation (IP) of transcription factor-bound genomic DNA followed by high density oligonucleotide hybridization (Chip) of the IP-enriched DNA. We investigate the ChIP-Chip data structure and propose methods for inferring the location of transcription factor binding sites from these data. The proposed methods involve testing for each probe whether it is part of a bound sequence using a scan statistic that takes into account the spatial structure of the data. Different multiple testing procedures are considered for controlling the familywise error rate and false discovery rate. A nested-Bonferroni adjustment, which is more powerful than the traditional Bonferroni adjustment when the test statistics are dependent, is discussed. Simulation studies show that taking into account the spatial structure of the data substantially improves the sensitivity of the multiple testing procedures. Application of the proposed methods to ChIP-Chip data for transcription factor p53 identified many potential target binding regions along human chromosomes 21 and 22. Among these identified regions, 18% fall within a 3 kb vicinity of the 5'UTR of a known gene or CpG island and 31% fall between the codon start site and the codon end site of a known gene but not inside an exon. More than half of these potential target sequences contain the p53 consensus binding site or very close matches to it. Moreover, these target segments include the 13 experimentally verified p53 binding regions of Cawley et al. (2004), as well as 49 additional regions that show higher hybridization signal than these 13 experimentally verified regions.


Subject(s)
Chromatin Immunoprecipitation; Chromosome Mapping; Genome, Human/genetics; Models, Genetic; Oligonucleotide Array Sequence Analysis; Tumor Suppressor Protein p53/genetics; 5' Untranslated Regions/genetics; 5' Untranslated Regions/metabolism; Binding Sites/genetics; Chromosomes, Human, Pair 21/genetics; Chromosomes, Human, Pair 21/metabolism; Chromosomes, Human, Pair 22/genetics; Chromosomes, Human, Pair 22/metabolism; CpG Islands/genetics; Gene Expression Profiling; Humans; Protein Binding; Reproducibility of Results; Sensitivity and Specificity; Tumor Suppressor Protein p53/metabolism
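
The two ingredients named in the abstract of record 6, a statistic that pools neighbouring probes and a multiple-testing adjustment, can be sketched generically. The code below is not the authors' scan statistic and does not implement their nested-Bonferroni procedure, neither of which is specified in the abstract. It uses a plain moving average over consecutive probes, the ordinary Bonferroni adjustment for familywise error, and the Benjamini-Hochberg adjustment for false discovery rate. All function names are hypothetical.

```python
def window_means(scores, width):
    """Mean score of every run of `width` consecutive probes."""
    if width < 1 or width > len(scores):
        raise ValueError("width must be between 1 and len(scores)")
    total = sum(scores[:width])
    means = [total / width]
    # Slide the window one probe at a time, updating the running sum.
    for i in range(width, len(scores)):
        total += scores[i] - scores[i - width]
        means.append(total / width)
    return means


def bonferroni(p_values):
    """Bonferroni-adjusted p-values (controls the familywise error rate)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Averaging over a window is one simple way to exploit the spatial structure the abstract refers to: a bound region should raise the signal of several adjacent probes, whereas isolated noise rarely does.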
7.
Nucleic Acids Res ; 31(13): 3507-9, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824355

ABSTRACT

SLAM is a program that simultaneously aligns and annotates pairs of homologous sequences. The SLAM web server integrates SLAM with repeat masking tools and the AVID alignment program to allow for rapid alignment and gene prediction in user submitted sequences. Along with annotations and alignments for the submitted sequences, users obtain a list of predicted conserved non-coding sequences (and their associated alignments). The web site also links to whole genome annotations of the human, mouse and rat genomes produced with the SLAM program. The server can be accessed at http://bio.math.berkeley.edu/slam.


Subject(s)
Genomics/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Amino Acid Sequence; Animals; Base Sequence; Conserved Sequence; Gene Components; Humans; Internet; Markov Chains; Mice; Peptides/chemistry; RNA, Messenger/chemistry; RNA, Untranslated/chemistry; Rats
8.
Bioinformatics ; 19 Suppl 2: ii36-41, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14534169

ABSTRACT

The standard method of applying hidden Markov models to biological problems is to find a Viterbi (maximal weight) path through the HMM graph. The Viterbi algorithm reduces the problem of finding the most likely hidden state sequence that explains given observations, to a dynamic programming problem for corresponding directed acyclic graphs. For example, in the gene finding application, the HMM is used to find the most likely underlying gene structure given a DNA sequence. In this note we discuss the applications of sampling methods for HMMs. The standard sampling algorithm for HMMs is a variant of the common forward-backward and backtrack algorithms, and has already been applied in the context of Gibbs sampling methods. Nevertheless, the practice of sampling state paths from HMMs does not seem to have been widely adopted, and important applications have been overlooked. We show how sampling can be used for finding alternative splicings for genes, including alternative splicings that are conserved between genes from related organisms. We also show how sampling from the posterior distribution is a natural way to compute probabilities for predicted exons and gene structures being correct under the assumed model. Finally, we describe a new memory efficient sampling algorithm for certain classes of HMMs which provides a practical sampling alternative to the Hirschberg algorithm for optimal alignment. The ideas presented have applications not only to gene finding and HMMs but more generally to stochastic context free grammars and RNA structure prediction.


Subject(s)
Algorithms; Alternative Splicing/genetics; Pattern Recognition, Automated/methods; RNA Splice Sites/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Artificial Intelligence; Base Sequence; Conserved Sequence; Markov Chains; Molecular Sequence Data; Sequence Homology, Nucleic Acid
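
The standard sampling algorithm that the abstract of record 8 refers to, a forward pass followed by a randomized backtrack, can be sketched for a small discrete HMM. This is a textbook version, not the memory-efficient algorithm the note introduces. The list-based model layout (init[k], trans[j][k], emit[k][symbol]) is an assumption made for illustration, and the unscaled forward table would underflow on long sequences.

```python
import random


def forward(obs, init, trans, emit):
    """Forward table: f[t][k] = P(obs[0..t], state at t is k)."""
    n_states = len(init)
    f = [[init[k] * emit[k][obs[0]] for k in range(n_states)]]
    for t in range(1, len(obs)):
        prev = f[-1]
        f.append([
            emit[k][obs[t]] * sum(prev[j] * trans[j][k] for j in range(n_states))
            for k in range(n_states)
        ])
    return f


def sample_path(obs, init, trans, emit, rng):
    """Draw one hidden-state path from P(path | obs)."""
    f = forward(obs, init, trans, emit)
    states = list(range(len(init)))
    # Final state: proportional to the last forward column.
    path = [rng.choices(states, weights=f[-1])[0]]
    # Earlier states, right to left:
    # P(state_t = j | state_{t+1} = k, obs) is proportional to f[t][j] * trans[j][k].
    for t in range(len(obs) - 2, -1, -1):
        nxt = path[-1]
        weights = [f[t][j] * trans[j][nxt] for j in states]
        path.append(rng.choices(states, weights=weights)[0])
    path.reverse()
    return path
```

Drawing many paths with sample_path and counting how often a given feature appears gives a Monte Carlo estimate of that feature's posterior probability, which is the use the abstract describes for predicted exons and gene structures.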
9.
BMC Genet ; 6 Suppl 1: S17, 2005 Dec 30.
Article in English | MEDLINE | ID: mdl-16451625

ABSTRACT

The Collaborative Study on the Genetics of Alcoholism (COGA) is a large-scale family study designed to identify genes that affect the risk for alcoholism and alcohol-related phenotypes. We performed genome-wide linkage analyses on the COGA data made available to participants in the Genetic Analysis Workshop 14 (GAW 14). The dataset comprised 1,350 participants from 143 families. The samples were analyzed on three technologies: microsatellites spaced at 10 cM, Affymetrix GeneChip Human Mapping 10 K Array (HMA10K) and Illumina SNP-based Linkage III Panel. We used ALDX1 and ALDX2, the COGA definitions of alcohol dependence, as well as electrophysiological measures TTTH1 and ECB21 to detect alcoholism susceptibility loci. Many chromosomal regions were found to be significant for each of the phenotypes at a p-value of 0.05. The most significant region for ALDX1 is on chromosome 7, with a maximum LOD score of 2.25 for Affymetrix SNPs, 1.97 for Illumina SNPs, and 1.72 for microsatellites. The same regions on chromosome 7 (96-106 cM) and 10 (149-176 cM) were found to be significant for both ALDX1 and ALDX2. A region on chromosome 7 (112-153 cM) and a region on chromosome 6 (169-185 cM) were identified as the most significant regions for TTTH1 and ECB21, respectively. We also performed linkage analysis on denser maps of markers by combining the SNPs datasets from Affymetrix and Illumina. Adding the microsatellite data to the combined SNP dataset improved the results only marginally. The results indicated that SNPs outperform microsatellites with the densest marker sets performing the best.


Subject(s)
Alcoholism/genetics; Alcoholism/physiopathology; Chromosome Mapping; Electroencephalography; Genome-Wide Association Study; Microsatellite Repeats/genetics; Polymorphism, Single Nucleotide/genetics; Chromosomes, Human, Pair 7/genetics; Humans; Phenotype
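
To relate the LOD scores quoted in the abstract of record 9 to a p-value scale, one common large-sample approximation can be sketched as follows. The abstract does not say how significance was assessed, so this may not match the authors' procedure. The sketch assumes a one-degree-of-freedom LOD score, for which 2 ln(10) × LOD is approximately chi-square distributed, and a one-sided test, which halves the chi-square tail. The function names are hypothetical.

```python
from math import erfc, log, sqrt


def lod_to_chi_square(lod: float) -> float:
    """Likelihood-ratio chi-square equivalent of a LOD score."""
    return 2.0 * log(10.0) * lod


def pointwise_p_value(lod: float) -> float:
    """Approximate one-sided pointwise p-value for a 1-df LOD score."""
    if lod < 0.0:
        raise ValueError("lod must be non-negative")
    chi2 = lod_to_chi_square(lod)
    # Upper tail of a 1-df chi-square is erfc(sqrt(x / 2)); halve it
    # because the linkage alternative is one-sided.
    return 0.5 * erfc(sqrt(chi2 / 2.0))
```

These are pointwise values for a single position and do not account for testing many positions across the genome.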
10.
BMC Genet ; 6 Suppl 1: S2, 2005 Dec 30.
Article in English | MEDLINE | ID: mdl-16451628

ABSTRACT

The data provided to the Genetic Analysis Workshop 14 (GAW 14) was the result of a collaboration among several different groups, catalyzed by Elizabeth Pugh from The Center for Inherited Disease Research (CIDR) and the organizers of GAW 14, Jean MacCluer and Laura Almasy. The DNA, phenotypic characterization, and microsatellite genomic survey were provided by the Collaborative Study on the Genetics of Alcoholism (COGA), a nine-site national collaboration funded by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) and the National Institute on Drug Abuse (NIDA) with the overarching goal of identifying and characterizing genes that affect the susceptibility to develop alcohol dependence and related phenotypes. CIDR, Affymetrix, and Illumina provided single-nucleotide polymorphism genotyping of a large subset of the COGA subjects. This article briefly describes the dataset that was provided.


Subject(s)
Alcoholism/genetics; Congresses as Topic; Cooperative Behavior; Databases, Genetic; Polymorphism, Single Nucleotide/genetics; Genotype; Humans; Oligonucleotide Array Sequence Analysis; Quality Control
11.
J Comput Biol ; 9(2): 389-99, 2002.
Article in English | MEDLINE | ID: mdl-12015888

ABSTRACT

Hidden Markov models (HMMs) have been successfully applied to a variety of problems in molecular biology, ranging from alignment problems to gene finding and annotation. Alignment problems can be solved with pair HMMs, while gene finding programs rely on generalized HMMs in order to model exon lengths. In this paper, we introduce the generalized pair HMM (GPHMM), which is an extension of both pair and generalized HMMs. We show how GPHMMs, in conjunction with approximate alignments, can be used for cross-species gene finding and describe applications to DNA-cDNA and DNA-protein alignment. GPHMMs provide a unifying and probabilistically sound theory for modeling these problems.


Subject(s)
Markov Chains; Sequence Alignment/statistics & numerical data; Algorithms; Computational Biology; DNA/genetics; Models, Statistical; Proteins/genetics
12.
Nat Genet ; 40(10): 1253-60, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18776909

ABSTRACT

Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.


Subject(s)
Chromosomes, Human, Pair 4/genetics; Chromosomes, Human/genetics; DNA/genetics; Gene Dosage/genetics; Haplotypes/genetics; Models, Statistical; Polymorphism, Single Nucleotide; Algorithms; Female; Genome, Human; Genotype; Humans; Male; Markov Chains; Oligonucleotide Array Sequence Analysis; Polymerase Chain Reaction; Software
13.
Nat Genet ; 40(10): 1166-74, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18776908

ABSTRACT

Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.


Subject(s)
Chromosomes, Human/genetics; DNA/genetics; Gene Dosage/genetics; Haplotypes/genetics; Polymorphism, Single Nucleotide; Population Groups/genetics; Genetic Variation; Genome, Human; Humans; Oligonucleotide Array Sequence Analysis; Polymerase Chain Reaction
14.
Hum Hered ; 63(3-4): 219-28, 2007.
Article in English | MEDLINE | ID: mdl-17347569

ABSTRACT

BACKGROUND: Current biotechnologies are able to achieve high accuracy and call rates. Concerns are raised on how differential performance on various genotypes may bias association tests. Quantitatively, we define differential dropout rate as the ratio of no-call rate among heterozygotes and homozygotes. METHODS: The hazard of differential dropout is examined for population- and family-based association tests through a simulation study. Also, we investigate detection approaches such as Hardy-Weinberg Equilibrium (HWE) and testing for correlation between sample call rate and sample heterozygosity. Finally, we analyze two public datasets and evaluate the magnitudes of differential dropout. RESULTS: In case-control settings, differential dropout has negligible effect on power and odds ratio (OR) estimation. However, the impact on family-based tests ranges from minor to severe depending on the disease parameters. Such impact is more prominent when disease allele frequency is relatively low (e.g., 5%), where a differential dropout rate of 2.5 can dramatically bias OR estimation and reduce power even at a decent 98% overall call rate and moderate effect size (e.g., OR(true) = 2.11). Both of the two public datasets follow HWE; however, HapMap data carries detectable differential dropout that may endanger family-based studies. CONCLUSIONS: Case-control approach appears to be robust to differential dropout; however, family-based association tests can be heavily biased. Both of the public genotype data show high call rate, but differential dropout is detected in HapMap data. We suggest researchers carefully control this potential confounder even using data of high accuracy and high overall call rate.


Subject(s)
Genotype; Polymorphism, Single Nucleotide; Case-Control Studies; Gene Frequency; Humans; Linear Models; Odds Ratio; Sample Size
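
The differential dropout rate defined in the abstract of record 14, and the Hardy-Weinberg check it mentions as a detection approach, can be sketched as below. The dropout function follows the definition in the abstract and presumes the true genotype class of each no-call is known, as it would be in a simulation. The HWE function is the standard one-degree-of-freedom chi-square goodness-of-fit statistic for a biallelic marker and is not necessarily the exact test the authors used. The function names are hypothetical.

```python
def differential_dropout(het_nocalls, het_total, hom_nocalls, hom_total):
    """No-call rate among heterozygotes divided by that among homozygotes."""
    if het_total <= 0 or hom_total <= 0 or hom_nocalls <= 0:
        raise ValueError("totals and homozygote no-calls must be positive")
    het_rate = het_nocalls / het_total
    hom_rate = hom_nocalls / hom_total
    return het_rate / hom_rate


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("marker is monomorphic; the statistic is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Preferential loss of heterozygote calls lowers the observed heterozygote count relative to Hardy-Weinberg expectation, which is why an HWE test is a natural, if not always sensitive, screen for this artifact.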
15.
Conf Proc IEEE Eng Med Biol Soc ; 2005: 2809-12, 2005.
Article in English | MEDLINE | ID: mdl-17282826

ABSTRACT

Analysis of high density oligonucleotide arrays for resequencing requires methods which are highly robust and accurate. We introduce an alternative base calling method built upon ABACUS with the particular advantage of achieving a very low rate for false positive detection of heterozygotes.

16.
Bioinformatics ; 21 Suppl 1: i107-15, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15961447

ABSTRACT

MOTIVATION: Many or most mammalian genes undergo alternative splicing, generating a variety of transcripts from a single gene. New information on splice variation is becoming available through technology for measuring expression levels of several exons or splice junctions per gene. We have developed a statistical method, ANalysis Of Splice VAriation (ANOSVA) to detect alternative splicing from expression data. Since ANOSVA requires no transcript information, it can be applied when the level of annotation is poor. When validated against spiked clone data, it generated no false positives and few false negatives. We demonstrated ANOSVA with data from a prototype mouse alternative splicing array, run against normal adult tissues, yielding a set of genes with evidence of tissue-specific splice variation. AVAILABILITY: The results are available at the supplementary information site. SUPPLEMENTARY INFORMATION: The results are available at the supplementary information site https://bioinfo.affymetrix.com/Papers/ANOSVA/


Subject(s)
Alternative Splicing; Computational Biology/methods; Gene Expression Profiling; Animals; Databases, Protein; False Positive Reactions; Mice; Models, Statistical; Oligonucleotide Array Sequence Analysis; Reproducibility of Results; Software
17.
Bioinformatics ; 21(9): 1958-63, 2005 May 01.
Article in English | MEDLINE | ID: mdl-15657097

ABSTRACT

MOTIVATION: A high density of single nucleotide polymorphism (SNP) coverage on the genome is desirable and often an essential requirement for population genetics studies. Region-specific or chromosome-specific linkage studies also benefit from the availability of as many high quality SNPs as possible. The availability of millions of SNPs from both Perlegen and the public domain and the development of an efficient microarray-based assay for genotyping SNPs has brought up some interesting analytical challenges. Effective methods for the selection of optimal subsets of SNPs spanning the genome and methods for accurately calling genotypes from probe hybridization patterns have enabled the development of a new microarray-based system for robustly genotyping over 100,000 SNPs per sample. RESULTS: We introduce a new dynamic model-based algorithm (DM) for screening over 3 million SNPs and genotyping over 100,000 SNPs. The model is based on four possible underlying states: Null, A, AB and B for each probe quartet. We calculate a probe-level log likelihood for each model and then select between the four competing models with an SNP-level statistical aggregation across multiple probe quartets to provide a high-quality genotype call along with a quality measure of the call. We assess performance with HapMap reference genotypes, informative Mendelian inheritance relationship in families, and consistency between DM and another genotype classification method. At a call rate of 95.91% the concordance with reference genotypes from the HapMap Project is 99.81% based on over 1.5 million genotypes, the Mendelian error rate is 0.018% based on 10 trios, and the consistency between DM and MPAM is 99.90% at a comparable rate of 97.18%. We also develop methods for SNP selection and optimal probe selection. AVAILABILITY: The DM algorithm is available in Affymetrix's Genotyping Tools software package and in Affymetrix's GDAS software package. 
See http://www.affymetrix.com for further information. 10 K and 100 K mapping array data are available on the Affymetrix website.


Subject(s)
Algorithms; DNA Mutational Analysis/methods; Genetic Testing/methods; Models, Genetic; Oligonucleotide Array Sequence Analysis/methods; Polymorphism, Single Nucleotide/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Computer Simulation; Genotype; Humans; Software
18.
Genome Res ; 13(3): 496-502, 2003 Mar.
Article in English | MEDLINE | ID: mdl-12618381

ABSTRACT

Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and alignment of two syntenic genomic regions. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons. Our probabilistic framework is the generalized pair hidden Markov model, a hybrid of (1) generalized hidden Markov models, which have been used previously for gene finding, and (2) pair hidden Markov models, which have applications to sequence alignment. We have built a gene finding and alignment program called SLAM, which aligns and identifies complete exon/intron structures of genes in two related but unannotated sequences of DNA. SLAM is able to reliably predict gene structures for any suitably related pair of organisms, most notably with fewer false-positive predictions compared to previous methods (examples are provided for Homo sapiens/Mus musculus and Plasmodium falciparum/Plasmodium vivax comparisons). Accuracy is obtained by distinguishing conserved noncoding sequence (CNS) from conserved coding sequence. CNS annotation is a novel feature of SLAM and may be useful for the annotation of UTRs, regulatory elements, and other noncoding features.


Subject(s)
Genes/genetics; Markov Chains; Sequence Alignment/statistics & numerical data; Software; Animals; Computational Biology/methods; Computational Biology/statistics & numerical data; Conserved Sequence/genetics; DNA/genetics; DNA, Protozoan/genetics; Genes, Protozoan/genetics; Humans; Mice; Plasmodium falciparum/genetics; Plasmodium vivax/genetics; Sequence Alignment/methods; Software Design; Species Specificity
19.
Genome Res ; 14(4): 661-4, 2004 Apr.
Article in English | MEDLINE | ID: mdl-15060007

ABSTRACT

We describe a new method for simultaneously identifying novel homologous genes with identical structure in the human, mouse, and rat genomes by combining pairwise predictions made with the SLAM gene-finding program. Using this method, we found 3698 gene triples in the human, mouse, and rat genomes which are predicted with exactly the same gene structure. We show, both computationally and experimentally, that the introns of these triples are predicted accurately as compared with the introns of other ab initio gene prediction sets. Computationally, we compared the introns of these gene triples, as well as those from other ab initio gene finders, with known intron annotations. We show that a unique property of SLAM, namely that it predicts gene structures simultaneously in two organisms, is key to producing sets of predictions that are highly accurate in intron structure when combined with other programs. Experimentally, we performed reverse transcription-polymerase chain reaction (RT-PCR) in both the human and rat to test the exon pairs flanking introns from a subset of the gene triples for which the human gene had not been previously identified. By performing RT-PCR on orthologous introns in both the human and rat genomes, we additionally explore the validity of using RT-PCR as a method for confirming gene predictions.


Subject(s)
Genes/genetics; Animals; Chromosome Mapping/methods; Computational Biology/methods; Databases, Genetic; Exons/genetics; Genome; Genome, Human; Humans; Introns/genetics; Mice; Predictive Value of Tests; Rats; Sequence Homology, Nucleic Acid; Software
20.
Science ; 296(5569): 916-9, 2002 May 03.
Article in English | MEDLINE | ID: mdl-11988577

ABSTRACT

The sequences of the human chromosomes 21 and 22 indicate that there are approximately 770 well-characterized and predicted genes. In this study, empirically derived maps identifying active areas of RNA transcription on these chromosomes have been constructed with the use of cytosolic polyadenylated RNA obtained from 11 human cell lines. Oligonucleotide arrays containing probes spaced on average every 35 base pairs along these chromosomes were used. When compared with the sequence annotations available for these chromosomes, it is noted that as much as an order of magnitude more of the genomic sequence is transcribed than accounted for by the predicted and characterized exons.


Subject(s)
Chromosomes, Human, Pair 21/genetics; Chromosomes, Human, Pair 22/genetics; Physical Chromosome Mapping; RNA, Messenger/genetics; Transcription, Genetic; Cell Line; Cell Nucleus/metabolism; Computational Biology; Contig Mapping; Cytosol/metabolism; DNA, Complementary; DiGeorge Syndrome/genetics; Exons; Humans; Oligonucleotide Array Sequence Analysis; Oligonucleotide Probes; Polymerase Chain Reaction; RNA, Messenger/metabolism; Reverse Transcriptase Polymerase Chain Reaction; Tumor Cells, Cultured