Results 1 - 20 of 24
1.
Genome Biol ; 22(1): 109, 2021 04 16.
Article in English | MEDLINE | ID: mdl-33863344

ABSTRACT

BACKGROUND: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. RESULTS: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. CONCLUSION: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
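The two reporting filters named in the abstract above, region restriction and a VAF threshold, can be illustrated with a minimal sketch. The record type, the function names, the region coordinates and the 2.5% cutoff are all hypothetical choices for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    chrom: str
    pos: int      # 1-based position
    vaf: float    # variant allele frequency as a fraction (0.05 means 5%)


def in_regions(call, regions):
    """True if the call lies inside any (chrom, start, end) interval, ends inclusive."""
    return any(c == call.chrom and start <= call.pos <= end
               for c, start, end in regions)


def filter_calls(calls, regions, min_vaf):
    """Keep calls that meet the VAF threshold and fall inside the allowed regions."""
    return [c for c in calls if c.vaf >= min_vaf and in_regions(c, regions)]


# Hypothetical example data: one high-confidence interval and three calls.
regions = [("chr1", 100, 200)]
calls = [
    VariantCall("chr1", 150, 0.08),  # inside region, above threshold: kept
    VariantCall("chr1", 150, 0.01),  # inside region, below threshold: dropped
    VariantCall("chr2", 150, 0.30),  # outside region: dropped
]
reported = filter_calls(calls, regions, min_vaf=0.025)
```

Raising `min_vaf` removes more low-frequency false positives but also drops true variants whose measured VAF falls below the cutoff, which is the sensitivity trade-off the abstract describes.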


Subjects
Tumor Biomarkers , Genetic Testing/methods , Genomics/methods , Neoplasms/genetics , Oncogenes , DNA Copy Number Variations , Genetic Testing/standards , Genomics/standards , Humans , Molecular Diagnostic Techniques/methods , Molecular Diagnostic Techniques/standards , Mutation , Neoplasms/diagnosis , Single Nucleotide Polymorphism , Reproducibility of Results , Sensitivity and Specificity
2.
Genome Biol ; 22(1): 111, 2021 04 16.
Article in English | MEDLINE | ID: mdl-33863366

ABSTRACT

BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyze ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency with more than 25,000 variants having less than 20% allele frequency with 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.


Subjects
Alleles , Tumor Biomarkers , Gene Frequency , Genetic Testing/methods , Genetic Variation , Genomics/methods , Neoplasms/genetics , Tumor Cell Line , DNA Copy Number Variations , Genetic Heterogeneity , Genetic Testing/standards , Genomics/standards , Humans , Neoplasms/diagnosis , Workflow
3.
Nat Biotechnol ; 39(9): 1115-1128, 2021 09.
Article in English | MEDLINE | ID: mdl-33846644

ABSTRACT

Circulating tumor DNA (ctDNA) sequencing is being rapidly adopted in precision oncology, but the accuracy, sensitivity and reproducibility of ctDNA assays are poorly understood. Here we report the findings of a multi-site, cross-platform evaluation of the analytical performance of five industry-leading ctDNA assays. We evaluated each stage of the ctDNA sequencing workflow with simulations, synthetic DNA spike-in experiments and proficiency testing on standardized, cell-line-derived reference samples. Above 0.5% variant allele frequency, ctDNA mutations were detected with high sensitivity, precision and reproducibility by all five assays, whereas, below this limit, detection became unreliable and varied widely between assays, especially when input material was limited. Missed mutations (false negatives) were more common than erroneous candidates (false positives), indicating that the reliable sampling of rare ctDNA fragments is the key challenge for ctDNA assays. This comprehensive evaluation of the analytical performance of ctDNA assays serves to inform best practice guidelines and provides a resource for precision oncology.


Subjects
Circulating Tumor DNA/genetics , Medical Oncology , Neoplasms/genetics , Precision Medicine , DNA Sequence Analysis/standards , High-Throughput Nucleotide Sequencing/methods , Humans , Limit of Detection , Practice Guidelines as Topic , Reproducibility of Results
4.
Nature ; 475(7356): 348-52, 2011 Jul 20.
Article in English | MEDLINE | ID: mdl-21776081

ABSTRACT

The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.


Subjects
Bacterial Genome/genetics , Human Genome/genetics , Genomics/instrumentation , Genomics/methods , Semiconductors , DNA Sequence Analysis/instrumentation , DNA Sequence Analysis/methods , Escherichia coli/genetics , Humans , Light , Male , Rhodopseudomonas/genetics , Vibrio/genetics
5.
Genomics ; 98(2): 79-89, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21565264

ABSTRACT

The success of genome-wide association studies has paralleled the development of efficient genotyping technologies. We describe the development of a next-generation microarray based on the new highly-efficient Affymetrix Axiom genotyping technology that we are using to genotype individuals of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH). The array contains 674,517 SNPs, and provides excellent genome-wide as well as gene-based and candidate-SNP coverage. Coverage was calculated using an approach based on imputation and cross validation. Preliminary results for the first 80,301 saliva-derived DNA samples from the RPGEH demonstrate very high quality genotypes, with sample success rates above 94% and over 98% of successful samples having SNP call rates exceeding 98%. At steady state, we have produced 462 million genotypes per week for each Axiom system. The new array provides a valuable addition to the repertoire of tools for large scale genome-wide association studies.


Subjects
Genome-Wide Association Study/methods , High-Throughput Screening Assays , Oligonucleotide Array Sequence Analysis/methods , Single Nucleotide Polymorphism/genetics , White People/genetics , Humans
6.
Nat Genet ; 40(10): 1166-74, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18776908

ABSTRACT

Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.


Subjects
Human Chromosomes/genetics , DNA/genetics , Gene Dosage/genetics , Haplotypes/genetics , Single Nucleotide Polymorphism , Population Groups/genetics , Genetic Variation , Human Genome , Humans , Oligonucleotide Array Sequence Analysis , Polymerase Chain Reaction
7.
Nat Genet ; 40(10): 1253-60, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18776909

ABSTRACT

Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.


Subjects
Human Chromosome Pair 4/genetics , Human Chromosomes/genetics , DNA/genetics , Gene Dosage/genetics , Haplotypes/genetics , Statistical Models , Single Nucleotide Polymorphism , Algorithms , Female , Human Genome , Genotype , Humans , Male , Markov Chains , Oligonucleotide Array Sequence Analysis , Polymerase Chain Reaction , Software
8.
Hum Hered ; 63(3-4): 219-28, 2007.
Article in English | MEDLINE | ID: mdl-17347569

ABSTRACT

BACKGROUND: Current biotechnologies are able to achieve high accuracy and call rates. Concerns are raised on how differential performance on various genotypes may bias association tests. Quantitatively, we define differential dropout rate as the ratio of no-call rate among heterozygotes and homozygotes. METHODS: The hazard of differential dropout is examined for population- and family-based association tests through a simulation study. Also, we investigate detection approaches such as Hardy-Weinberg Equilibrium (HWE) and testing for correlation between sample call rate and sample heterozygosity. Finally, we analyze two public datasets and evaluate the magnitudes of differential dropout. RESULTS: In case-control settings, differential dropout has negligible effect on power and odds ratio (OR) estimation. However, the impact on family-based tests ranges from minor to severe depending on the disease parameters. Such impact is more prominent when disease allele frequency is relatively low (e.g., 5%), where a differential dropout rate of 2.5 can dramatically bias OR estimation and reduce power even at a decent 98% overall call rate and moderate effect size (e.g., OR(true) = 2.11). Both public datasets follow HWE; however, HapMap data carries detectable differential dropout that may endanger family-based studies. CONCLUSIONS: The case-control approach appears to be robust to differential dropout; however, family-based association tests can be heavily biased. Both public genotype datasets show high call rates, but differential dropout is detected in HapMap data. We suggest researchers carefully control this potential confounder even when using data of high accuracy and high overall call rate.
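The differential dropout rate defined in the abstract above is a ratio of two no-call rates. The following is a minimal sketch of that arithmetic, assuming the true genotype class of each sample is known (as it would be in a simulation); the function name, argument names and counts are hypothetical.

```python
def differential_dropout_rate(het_nocalls, het_total, hom_nocalls, hom_total):
    """Ratio of the no-call rate among heterozygotes to that among homozygotes.

    All four arguments are counts; the true genotype class of every sample
    is assumed to be known.
    """
    if het_total <= 0 or hom_total <= 0:
        raise ValueError("genotype class totals must be positive")
    if hom_nocalls == 0:
        raise ValueError("ratio is undefined when homozygotes have no no-calls")
    het_rate = het_nocalls / het_total
    hom_rate = hom_nocalls / hom_total
    return het_rate / hom_rate


# Hypothetical counts: 5% no-calls among heterozygotes, 2% among homozygotes.
ratio = differential_dropout_rate(het_nocalls=50, het_total=1000,
                                  hom_nocalls=20, hom_total=1000)
```

With these made-up counts the ratio is about 2.5, the level at which the abstract reports that family-based tests can be strongly biased. A ratio of 1 means no-calls are spread evenly across genotype classes.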


Subjects
Genotype , Single Nucleotide Polymorphism , Case-Control Studies , Gene Frequency , Humans , Linear Models , Odds Ratio , Sample Size
9.
J Comput Biol ; 13(3): 579-613, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16706714

ABSTRACT

Cawley et al. (2004) have recently mapped the locations of binding sites for three transcription factors along human chromosomes 21 and 22 using ChIP-Chip experiments. ChIP-Chip experiments are a new approach to the genomewide identification of transcription factor binding sites and consist of chromatin (Ch) immunoprecipitation (IP) of transcription factor-bound genomic DNA followed by high density oligonucleotide hybridization (Chip) of the IP-enriched DNA. We investigate the ChIP-Chip data structure and propose methods for inferring the location of transcription factor binding sites from these data. The proposed methods involve testing for each probe whether it is part of a bound sequence using a scan statistic that takes into account the spatial structure of the data. Different multiple testing procedures are considered for controlling the familywise error rate and false discovery rate. A nested-Bonferroni adjustment, which is more powerful than the traditional Bonferroni adjustment when the test statistics are dependent, is discussed. Simulation studies show that taking into account the spatial structure of the data substantially improves the sensitivity of the multiple testing procedures. Application of the proposed methods to ChIP-Chip data for transcription factor p53 identified many potential target binding regions along human chromosomes 21 and 22. Among these identified regions, 18% fall within a 3 kb vicinity of the 5'UTR of a known gene or CpG island and 31% fall between the codon start site and the codon end site of a known gene but not inside an exon. More than half of these potential target sequences contain the p53 consensus binding site or very close matches to it. Moreover, these target segments include the 13 experimentally verified p53 binding regions of Cawley et al. (2004), as well as 49 additional regions that show higher hybridization signal than these 13 experimentally verified regions.


Subjects
Chromatin Immunoprecipitation , Chromosome Mapping , Human Genome/genetics , Genetic Models , Oligonucleotide Array Sequence Analysis , Tumor Suppressor Protein p53/genetics , 5' Untranslated Regions/genetics , 5' Untranslated Regions/metabolism , Binding Sites/genetics , Human Chromosome Pair 21/genetics , Human Chromosome Pair 21/metabolism , Human Chromosome Pair 22/genetics , Human Chromosome Pair 22/metabolism , CpG Islands/genetics , Gene Expression Profiling , Humans , Protein Binding , Reproducibility of Results , Sensitivity and Specificity , Tumor Suppressor Protein p53/metabolism
10.
Bioinformatics ; 21 Suppl 1: i107-15, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15961447

ABSTRACT

MOTIVATION: Many or most mammalian genes undergo alternative splicing, generating a variety of transcripts from a single gene. New information on splice variation is becoming available through technology for measuring expression levels of several exons or splice junctions per gene. We have developed a statistical method, ANalysis Of Splice VAriation (ANOSVA) to detect alternative splicing from expression data. Since ANOSVA requires no transcript information, it can be applied when the level of annotation is poor. When validated against spiked clone data, it generated no false positives and few false negatives. We demonstrated ANOSVA with data from a prototype mouse alternative splicing array, run against normal adult tissues, yielding a set of genes with evidence of tissue-specific splice variation. AVAILABILITY: The results are available at the supplementary information site. SUPPLEMENTARY INFORMATION: The results are available at the supplementary information site https://bioinfo.affymetrix.com/Papers/ANOSVA/


Subjects
Alternative Splicing , Computational Biology/methods , Gene Expression Profiling , Animals , Protein Databases , False Positive Reactions , Mice , Statistical Models , Oligonucleotide Array Sequence Analysis , Reproducibility of Results , Software
11.
Bioinformatics ; 21(9): 1958-63, 2005 May 01.
Article in English | MEDLINE | ID: mdl-15657097

ABSTRACT

MOTIVATION: A high density of single nucleotide polymorphism (SNP) coverage on the genome is desirable and often an essential requirement for population genetics studies. Region-specific or chromosome-specific linkage studies also benefit from the availability of as many high quality SNPs as possible. The availability of millions of SNPs from both Perlegen and the public domain and the development of an efficient microarray-based assay for genotyping SNPs has brought up some interesting analytical challenges. Effective methods for the selection of optimal subsets of SNPs spanning the genome and methods for accurately calling genotypes from probe hybridization patterns have enabled the development of a new microarray-based system for robustly genotyping over 100,000 SNPs per sample. RESULTS: We introduce a new dynamic model-based algorithm (DM) for screening over 3 million SNPs and genotyping over 100,000 SNPs. The model is based on four possible underlying states: Null, A, AB and B for each probe quartet. We calculate a probe-level log likelihood for each model and then select between the four competing models with an SNP-level statistical aggregation across multiple probe quartets to provide a high-quality genotype call along with a quality measure of the call. We assess performance with HapMap reference genotypes, informative Mendelian inheritance relationship in families, and consistency between DM and another genotype classification method. At a call rate of 95.91% the concordance with reference genotypes from the HapMap Project is 99.81% based on over 1.5 million genotypes, the Mendelian error rate is 0.018% based on 10 trios, and the consistency between DM and MPAM is 99.90% at a comparable rate of 97.18%. We also develop methods for SNP selection and optimal probe selection. AVAILABILITY: The DM algorithm is available in Affymetrix's Genotyping Tools software package and in Affymetrix's GDAS software package. 
See http://www.affymetrix.com for further information. 10 K and 100 K mapping array data are available on the Affymetrix website.


Subjects
Algorithms , DNA Mutational Analysis/methods , Genetic Testing/methods , Genetic Models , Oligonucleotide Array Sequence Analysis/methods , Single Nucleotide Polymorphism/genetics , Sequence Alignment/methods , DNA Sequence Analysis/methods , Computer Simulation , Genotype , Humans , Software
12.
Conf Proc IEEE Eng Med Biol Soc ; 2005: 2809-12, 2005.
Article in English | MEDLINE | ID: mdl-17282826

ABSTRACT

Analysis of high density oligonucleotide arrays for resequencing requires methods which are highly robust and accurate. We introduce an alternative base calling method built upon ABACUS with the particular advantage of achieving a very low rate for false positive detection of heterozygotes.

13.
BMC Genet ; 6 Suppl 1: S17, 2005 Dec 30.
Article in English | MEDLINE | ID: mdl-16451625

ABSTRACT

The Collaborative Study on the Genetics of Alcoholism (COGA) is a large-scale family study designed to identify genes that affect the risk for alcoholism and alcohol-related phenotypes. We performed genome-wide linkage analyses on the COGA data made available to participants in the Genetic Analysis Workshop 14 (GAW 14). The dataset comprised 1,350 participants from 143 families. The samples were analyzed on three technologies: microsatellites spaced at 10 cM, Affymetrix GeneChip Human Mapping 10 K Array (HMA10K) and Illumina SNP-based Linkage III Panel. We used ALDX1 and ALDX2, the COGA definitions of alcohol dependence, as well as electrophysiological measures TTTH1 and ECB21 to detect alcoholism susceptibility loci. Many chromosomal regions were found to be significant for each of the phenotypes at a p-value of 0.05. The most significant region for ALDX1 is on chromosome 7, with a maximum LOD score of 2.25 for Affymetrix SNPs, 1.97 for Illumina SNPs, and 1.72 for microsatellites. The same regions on chromosome 7 (96-106 cM) and 10 (149-176 cM) were found to be significant for both ALDX1 and ALDX2. A region on chromosome 7 (112-153 cM) and a region on chromosome 6 (169-185 cM) were identified as the most significant regions for TTTH1 and ECB21, respectively. We also performed linkage analysis on denser maps of markers by combining the SNPs datasets from Affymetrix and Illumina. Adding the microsatellite data to the combined SNP dataset improved the results only marginally. The results indicated that SNPs outperform microsatellites with the densest marker sets performing the best.


Subjects
Alcoholism/genetics , Alcoholism/physiopathology , Chromosome Mapping , Electroencephalography , Genome-Wide Association Study , Microsatellite Repeats/genetics , Single Nucleotide Polymorphism/genetics , Human Chromosome Pair 7/genetics , Humans , Phenotype
14.
BMC Genet ; 6 Suppl 1: S2, 2005 Dec 30.
Article in English | MEDLINE | ID: mdl-16451628

ABSTRACT

The data provided to the Genetic Analysis Workshop 14 (GAW 14) were the result of a collaboration among several different groups, catalyzed by Elizabeth Pugh from The Center for Inherited Disease Research (CIDR) and the organizers of GAW 14, Jean MacCluer and Laura Almasy. The DNA, phenotypic characterization, and microsatellite genomic survey were provided by the Collaborative Study on the Genetics of Alcoholism (COGA), a nine-site national collaboration funded by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) and the National Institute on Drug Abuse (NIDA) with the overarching goal of identifying and characterizing genes that affect the susceptibility to develop alcohol dependence and related phenotypes. CIDR, Affymetrix, and Illumina provided single-nucleotide polymorphism genotyping of a large subset of the COGA subjects. This article briefly describes the dataset that was provided.


Subjects
Alcoholism/genetics , Congresses as Topic , Cooperative Behavior , Genetic Databases , Single Nucleotide Polymorphism/genetics , Genotype , Humans , Oligonucleotide Array Sequence Analysis , Quality Control
15.
Genome Res ; 14(4): 661-4, 2004 Apr.
Article in English | MEDLINE | ID: mdl-15060007

ABSTRACT

We describe a new method for simultaneously identifying novel homologous genes with identical structure in the human, mouse, and rat genomes by combining pairwise predictions made with the SLAM gene-finding program. Using this method, we found 3698 gene triples in the human, mouse, and rat genomes which are predicted with exactly the same gene structure. We show, both computationally and experimentally, that the introns of these triples are predicted accurately as compared with the introns of other ab initio gene prediction sets. Computationally, we compared the introns of these gene triples, as well as those from other ab initio gene finders, with known intron annotations. We show that a unique property of SLAM, namely that it predicts gene structures simultaneously in two organisms, is key to producing sets of predictions that are highly accurate in intron structure when combined with other programs. Experimentally, we performed reverse transcription-polymerase chain reaction (RT-PCR) in both the human and rat to test the exon pairs flanking introns from a subset of the gene triples for which the human gene had not been previously identified. By performing RT-PCR on orthologous introns in both the human and rat genomes, we additionally explore the validity of using RT-PCR as a method for confirming gene predictions.


Subjects
Genes/genetics , Animals , Chromosome Mapping/methods , Computational Biology/methods , Genetic Databases , Exons/genetics , Genome , Human Genome , Humans , Introns/genetics , Mice , Predictive Value of Tests , Rats , Nucleic Acid Sequence Homology , Software
16.
Genome Res ; 14(3): 331-42, 2004 Mar.
Article in English | MEDLINE | ID: mdl-14993201

ABSTRACT

In this report, we have achieved a richer view of the transcriptome for Chromosomes 21 and 22 by using high-density oligonucleotide arrays on cytosolic poly(A)(+) RNA. Conservatively, only 31.4% of the observed transcribed nucleotides correspond to well-annotated genes, whereas an additional 4.8% and 14.7% correspond to mRNAs and ESTs, respectively. Approximately 85% of the known exons were detected, and up to 21% of known genes have only a single isoform based on exon-skipping alternative expression. Overall, the expression of the well-characterized exons falls predominately into two categories, uniquely or ubiquitously expressed with an identifiable proportion of antisense transcripts. The remaining observed transcription (49.0%) was outside of any known annotation. These novel transcripts appear to be more cell-line-specific and have lower and less variation in expression than the well-characterized genes. Novel transcripts were further characterized based on their distance to annotations, transcript size, coding capacity, and identification as antisense to intronic sequences. By RT-PCR, 126 novel transcripts were independently verified, resulting in a 65% verification rate. These observations strongly support the argument for a re-evaluation of the total number of human genes and an alternative term for "gene" to encompass these growing, novel classes of RNA transcripts in the human genome.


Subjects
Human Chromosome Pair 21/genetics , Human Chromosome Pair 22/genetics , RNA/genetics , Genetic Transcription/genetics , Cell Line , Tumor Cell Line , Chromosome Mapping/methods , Neoplasm DNA/genetics , Gene Expression Profiling/methods , Genes/genetics , Neoplasm Genes/genetics , Humans , Jurkat Cells/chemistry , Jurkat Cells/metabolism , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Probes/genetics , Messenger RNA/genetics
17.
Cell ; 116(4): 499-509, 2004 Feb 20.
Article in English | MEDLINE | ID: mdl-14980218

ABSTRACT

Using high-density oligonucleotide arrays representing essentially all nonrepetitive sequences on human chromosomes 21 and 22, we map the binding sites in vivo for three DNA binding transcription factors, Sp1, cMyc, and p53, in an unbiased manner. This mapping reveals an unexpectedly large number of transcription factor binding site (TFBS) regions, with a minimal estimate of 12,000 for Sp1, 25,000 for cMyc, and 1600 for p53 when extrapolated to the full genome. Only 22% of these TFBS regions are located at the 5' termini of protein-coding genes while 36% lie within or immediately 3' to well-characterized genes and are significantly correlated with noncoding RNAs. A significant number of these noncoding RNAs are regulated in response to retinoic acid, and overlapping pairs of protein-coding and noncoding RNAs are often coregulated. Thus, the human genome contains roughly comparable numbers of protein-coding and noncoding genes that are bound by common transcription factors and regulated by common environmental signals.


Subjects
Human Chromosome Pair 21 , Human Chromosome Pair 22 , Transcription Factors/metabolism , Amino Acid Motifs , Binding Sites , Cell Line , Chromatin/metabolism , Chromosome Mapping , CpG Islands , Exons , Expressed Sequence Tags , Human Genome , Humans , Jurkat Cells , Genetic Models , Polymerase Chain Reaction , Precipitin Tests , Genetic Promoter Regions , Protein Binding , RNA/chemistry , RNA/metabolism , Messenger RNA/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Tretinoin/metabolism
18.
Nat Methods ; 1(2): 109-11, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15782172

ABSTRACT

We present a genotyping method for simultaneously scoring 116,204 SNPs using oligonucleotide arrays. At call rates >99%, reproducibility is >99.97% and accuracy, as measured by inheritance in trios and concordance with the HapMap Project, is >99.7%. Average intermarker distance is 23.6 kb, and 92% of the genome is within 100 kb of a SNP marker. Average heterozygosity is 0.30, with 105,511 SNPs having minor allele frequencies >5%.


Subjects
Algorithms , Chromosome Mapping/methods , DNA Mutational Analysis/methods , Oligonucleotide Array Sequence Analysis/methods , Single Nucleotide Polymorphism/genetics , Sequence Alignment/methods , DNA Sequence Analysis/methods , Genetic Testing/methods , Human Genome , Genotype , Humans , Oligonucleotide Array Sequence Analysis/instrumentation , Reproducibility of Results , Sensitivity and Specificity , Nucleic Acid Sequence Homology
19.
Bioinformatics ; 19 Suppl 2: ii36-41, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14534169

ABSTRACT

The standard method of applying hidden Markov models to biological problems is to find a Viterbi (maximal weight) path through the HMM graph. The Viterbi algorithm reduces the problem of finding the most likely hidden state sequence that explains given observations to a dynamic programming problem for corresponding directed acyclic graphs. For example, in the gene finding application, the HMM is used to find the most likely underlying gene structure given a DNA sequence. In this note we discuss the applications of sampling methods for HMMs. The standard sampling algorithm for HMMs is a variant of the common forward-backward and backtrack algorithms, and has already been applied in the context of Gibbs sampling methods. Nevertheless, the practice of sampling state paths from HMMs does not seem to have been widely adopted, and important applications have been overlooked. We show how sampling can be used for finding alternative splicings for genes, including alternative splicings that are conserved between genes from related organisms. We also show how sampling from the posterior distribution is a natural way to compute probabilities for predicted exons and gene structures being correct under the assumed model. Finally, we describe a new memory-efficient sampling algorithm for certain classes of HMMs which provides a practical sampling alternative to the Hirschberg algorithm for optimal alignment. The ideas presented have applications not only to gene finding and HMMs but more generally to stochastic context-free grammars and RNA structure prediction.
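The contrast drawn in the abstract above, between one Viterbi path and paths sampled from the posterior, can be sketched on a toy HMM. This is the textbook forward pass followed by a stochastic backtrace, not the memory-efficient algorithm the abstract announces; the two states, the emission alphabet and every probability are made up for illustration.

```python
import random


def forward(obs, states, start, trans, emit):
    """Forward table: f[t][s] = P(obs[0..t], state at t is s)."""
    f = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = f[-1]
        f.append({s: emit[s][o] * sum(prev[r] * trans[r][s] for r in states)
                  for s in states})
    return f


def viterbi(obs, states, start, trans, emit):
    """Single most likely state path (maximal weight path)."""
    v = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        prev, col, ptr = v[-1], {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] * trans[r][s])
            col[s] = prev[best] * trans[best][s] * emit[s][o]
            ptr[s] = best
        v.append(col)
        back.append(ptr)
    path = [max(states, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def sample_path(obs, states, start, trans, emit, rng):
    """Draw one state path from the posterior P(path | obs)."""
    f = forward(obs, states, start, trans, emit)

    def draw(weights):
        return rng.choices(states, weights=[weights[s] for s in states])[0]

    path = [draw(f[-1])]                      # last state, weighted by f[T]
    for t in range(len(obs) - 2, -1, -1):     # then walk backwards
        nxt = path[-1]
        path.append(draw({s: f[t][s] * trans[s][nxt] for s in states}))
    return path[::-1]


# Hypothetical two-state model over a two-letter alphabet.
states = ["exon", "intron"]
start = {"exon": 0.5, "intron": 0.5}
trans = {"exon": {"exon": 0.9, "intron": 0.1},
         "intron": {"exon": 0.2, "intron": 0.8}}
emit = {"exon": {"G": 0.7, "A": 0.3},
        "intron": {"G": 0.4, "A": 0.6}}
obs = list("GGAAG")

best_path = viterbi(obs, states, start, trans, emit)
rng = random.Random(0)
samples = [sample_path(obs, states, start, trans, emit, rng) for _ in range(1000)]
# Fraction of sampled paths labelling position 0 as exon: an estimate of the
# posterior probability of that label under the model.
p_exon_first = sum(p[0] == "exon" for p in samples) / len(samples)
```

Counting how often a label or a whole structure appears among the samples estimates its posterior probability, which a single Viterbi path cannot provide.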


Subjects
Algorithms , Alternative Splicing/genetics , Automated Pattern Recognition/methods , RNA Splice Sites/genetics , Sequence Alignment/methods , DNA Sequence Analysis/methods , Artificial Intelligence , Base Sequence , Conserved Sequence , Markov Chains , Molecular Sequence Data , Nucleic Acid Sequence Homology
20.
Nucleic Acids Res ; 31(13): 3507-9, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824355

ABSTRACT

SLAM is a program that simultaneously aligns and annotates pairs of homologous sequences. The SLAM web server integrates SLAM with repeat masking tools and the AVID alignment program to allow for rapid alignment and gene prediction in user submitted sequences. Along with annotations and alignments for the submitted sequences, users obtain a list of predicted conserved non-coding sequences (and their associated alignments). The web site also links to whole genome annotations of the human, mouse and rat genomes produced with the SLAM program. The server can be accessed at http://bio.math.berkeley.edu/slam.


Subjects
Genomics/methods , Sequence Alignment/methods , DNA Sequence Analysis/methods , Software , Algorithms , Amino Acid Sequence , Animals , Base Sequence , Conserved Sequence , Gene Components , Humans , Internet , Markov Chains , Mice , Peptides/chemistry , Messenger RNA/chemistry , Untranslated RNA/chemistry , Rats