Results 1 - 20 of 29
1.
Nature ; 475(7356): 348-52, 2011 Jul 20.
Article in English | MEDLINE | ID: mdl-21776081

ABSTRACT

The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.
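
As a rough illustration of how a flow-based, non-optical read can be turned into bases, here is a minimal Python sketch. It assumes, as a simplification, that each flow supplies a single nucleotide species in a fixed cyclic order (the `TACG` order is a hypothetical choice), and that the normalized ion signal of a flow is roughly proportional to the number of bases incorporated, so a homopolymer of length n gives a signal near n. Real base callers also correct for phasing, signal droop, and noise; `call_bases` is a hypothetical helper, not the system's software.

```python
def call_bases(signals, flow_order="TACG"):
    """Naive flow-space base caller (illustrative sketch, not a production caller).

    Assumes one nucleotide species per flow, cycled through `flow_order`,
    and a normalized signal roughly equal to the number of bases incorporated.
    """
    calls = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest homopolymer length; negative noise counts as zero.
        length = max(0, round(signal))
        calls.append(nucleotide * length)
    return "".join(calls)


# Hypothetical normalized signals for eight flows (two cycles of TACG)
print(call_bases([1.05, 0.02, 1.97, 0.10, 0.08, 0.95, 0.03, 1.02]))  # TCCAG
```

Rounding to the nearest integer is the simplest possible decision rule; it is exactly where homopolymer-length errors would arise when signals drift between two integers.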


Subjects
Genome, Bacterial/genetics, Genome, Human/genetics, Genomics/instrumentation, Genomics/methods, Semiconductors, Sequence Analysis, DNA/instrumentation, Sequence Analysis, DNA/methods, Escherichia coli/genetics, Humans, Light, Male, Rhodopseudomonas/genetics, Vibrio/genetics
2.
Nature ; 470(7332): 59-65, 2011 Feb 03.
Article in English | MEDLINE | ID: mdl-21293372

ABSTRACT

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.


Subjects
DNA Copy Number Variations/genetics, Genetics, Population, Genome, Human/genetics, Genomics, Gene Duplication/genetics, Genetic Predisposition to Disease/genetics, Genotype, Humans, Mutagenesis, Insertional/genetics, Reproducibility of Results, Sequence Analysis, DNA, Sequence Deletion/genetics
3.
Nat Genet ; 39(3): 311-8, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17277777

ABSTRACT

Eukaryotic gene transcription is accompanied by acetylation and methylation of nucleosomes near promoters, but the locations and roles of histone modifications elsewhere in the genome remain unclear. We determined the chromatin modification states in high resolution along 30 Mb of the human genome and found that active promoters are marked by trimethylation of Lys4 of histone H3 (H3K4), whereas enhancers are marked by monomethylation, but not trimethylation, of H3K4. We developed computational algorithms using these distinct chromatin signatures to identify new regulatory elements, predicting over 200 promoters and 400 enhancers within the 30-Mb region. This approach accurately predicted the location and function of independently identified regulatory elements with high sensitivity and specificity and uncovered a novel functional enhancer for the carnitine transporter SLC22A5 (OCTN2). Our results give insight into the connections between chromatin modifications and transcriptional regulatory activity and provide a new tool for the functional annotation of the human genome.


Subjects
Algorithms, Chromatin/metabolism, Enhancer Elements, Genetic, Genome, Human, Promoter Regions, Genetic, Genomics, Histones/metabolism, Humans, Models, Genetic, Organic Cation Transport Proteins/genetics, Organic Cation Transport Proteins/metabolism, Solute Carrier Family 22 Member 5
4.
Sci Total Environ ; 950: 175266, 2024 Nov 10.
Article in English | MEDLINE | ID: mdl-39102959

ABSTRACT

Coastal heavy-metal contamination poses significant risks to marine ecosystems and human health, necessitating comprehensive research for effective mitigation strategies. This study assessed heavy-metal pollution in sediments, seawater, and organisms in the Pearl River Estuary (PRE), with a focus on Cd, Cu, Pb, Zn, As, Hg, and Cr. A notable reduction in heavy metal concentrations in surface sediments was observed in 2020 compared to 2017 and 2018, likely due to improved pollution management and COVID-19 pandemic restrictions. Spatial analysis revealed a positive correlation between elevated heavy-metal concentrations (Cu, Pb, Zn, Cd, and As) and areas with significant human activity. Source analysis indicated that anthropogenic activities accounted for 63 % of the heavy metals in sediments, originating from industrial effluents, metal processing, vehicular activities, and fossil fuel combustion. Cd presented a high ecological risk due to its significant enrichment in surface sediments. Organisms in the PRE were found to be relatively enriched with Hg and Cu, with average As concentrations slightly exceeding the Chinese food-health criterion. This study identified high-risk ecological zones and highlighted Cd as the primary pollutant in the PRE. The findings demonstrate the effectiveness of recent pollution control measures and emphasize the need for ongoing monitoring and mitigation to safeguard marine ecosystems and human health.
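
Statements that a metal such as Cd poses a "high ecological risk" in sediments are often based on Hakanson's potential ecological risk factor, E_r = T_r × C / C_0, where C is the measured concentration, C_0 a background reference value, and T_r a metal-specific toxic-response factor. The abstract does not say which index was used, so the Python sketch below is only an illustration under that assumption, using the commonly cited Hakanson factors and risk classes; the concentrations and backgrounds in the example are hypothetical.

```python
# Toxic-response factors commonly cited from Hakanson (1980); assumed here for illustration.
TOXIC_RESPONSE = {"Hg": 40, "Cd": 30, "As": 10, "Pb": 5, "Cu": 5, "Cr": 2, "Zn": 1}


def ecological_risk(metal, measured, background):
    """Single-metal potential ecological risk factor E_r = T_r * C / C_0."""
    if background <= 0:
        raise ValueError("background concentration must be positive")
    return TOXIC_RESPONSE[metal] * measured / background


def risk_class(er):
    """Map E_r to the conventional Hakanson risk classes."""
    if er < 40:
        return "low"
    if er < 80:
        return "moderate"
    if er < 160:
        return "considerable"
    if er < 320:
        return "high"
    return "very high"


# Hypothetical sediment concentrations (mg/kg) against hypothetical backgrounds
for metal, measured, background in [("Cd", 0.6, 0.1), ("Zn", 120.0, 100.0)]:
    er = ecological_risk(metal, measured, background)
    print(metal, round(er, 1), risk_class(er))
```

The large Cd factor (30, versus 1 for Zn) is why even modest Cd enrichment over background can push a site into a high-risk class.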


Subjects
Environmental Monitoring, Estuaries, Geologic Sediments, Metals, Heavy, Seawater, Water Pollutants, Chemical, Metals, Heavy/analysis, Water Pollutants, Chemical/analysis, Geologic Sediments/chemistry, China, Environmental Monitoring/methods, Seawater/chemistry, Aquatic Organisms, Animals, Rivers/chemistry
5.
Genome Res ; 20(7): 972-80, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20488932

ABSTRACT

Abnormalities of genomic methylation patterns are lethal or cause disease, but the cues that normally designate CpG dinucleotides for methylation are poorly understood. We have developed a new method of methylation profiling that has single-CpG resolution and can address the methylation status of repeated sequences. We have used this method to determine the methylation status of >275 million CpG sites in human and mouse DNA from breast and brain tissues. Methylation density at most sequences was found to increase linearly with CpG density and to fall sharply at very high CpG densities, but transposons remained densely methylated even at higher CpG densities. The presence of histone H2A.Z and histone H3 di- or trimethylated at lysine 4 correlated strongly with unmethylated DNA and occurred primarily at promoter regions. We conclude that methylation is the default state of most CpG dinucleotides in the mammalian genome and that a combination of local dinucleotide frequencies, the interaction of repeated sequences, and the presence or absence of histone variants or modifications shields a population of CpG sites (most of which are in and around promoters) from DNA methyltransferases that lack intrinsic sequence specificity.


Subjects
Base Sequence/physiology, Chromatin/chemistry, Chromatin/physiology, DNA Methylation, Animals, Brain/metabolism, Breast/metabolism, Chromatin/genetics, Chromosome Mapping, CpG Islands/genetics, Female, Genome, Histones/metabolism, Humans, Mice, Sequence Analysis, DNA, Validation Studies as Topic
6.
Materials (Basel) ; 16(7), 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-37048864

ABSTRACT

Nonlinear unloading plays an important role in predicting springback during the plastic forming process. To improve the accuracy of springback prediction, which can guide precision forming, uniaxial tensile tests and uniaxial loading-unloading-loading tensile tests were carried out on SUS304 stainless steel. The flow stress and chord modulus mathematical models were calibrated against the test results. A constant elastic modulus three-point bending finite element model (E0FEMB) and a constant elastic modulus roll forming finite element model (E0FEMR) were established in MSC.MARC. The chord modulus was output by the PLOTV subroutine to determine the mean modulus of different regions, from which the mean modulus three-point bending finite element model (ĒcFEMB) and the mean modulus roll forming finite element model (ĒcFEMR) were defined. The simulation results of the constant modulus finite element model (E0FEM) and the mean modulus finite element model (ĒcFEM) were compared with the results of the three-point bending tests and roll forming tests. The difference between the simulation results and the test results was small, indicating that the mean modulus is feasible for predicting springback and verifying the suitability of the ĒcFEM.
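
In loading-unloading tests the chord modulus is usually taken as the slope of the secant joining the start and end points of an unloading branch. The Python sketch below assumes that definition and averages it over several branches, loosely mirroring the idea of assigning a mean modulus to a region; the unloading data and the constant modulus E0 are hypothetical, and the helper names are illustrative rather than part of MSC.MARC or the PLOTV subroutine. It also compares the elastic recovery strain σ/E implied by a constant modulus with that implied by the lower mean chord modulus, which is why the modulus choice shifts the predicted springback.

```python
from statistics import mean


def chord_modulus(stress_start, strain_start, stress_end, strain_end):
    """Slope of the secant joining the start and end of one unloading branch."""
    return (stress_start - stress_end) / (strain_start - strain_end)


def recovered_strain(stress, modulus):
    """Elastic strain recovered on full unloading from `stress` with a linear modulus."""
    return stress / modulus


# Hypothetical unloading branches for one region:
# (stress_start MPa, strain_start, stress_end MPa, strain_end)
region_branches = [
    (450.0, 0.0524, 0.0, 0.0500),
    (520.0, 0.0829, 0.0, 0.0800),
    (580.0, 0.1033, 0.0, 0.1000),
]
mean_modulus = mean(chord_modulus(*b) for b in region_branches)

E0 = 193_000.0  # assumed constant elastic modulus in MPa
print(f"mean chord modulus: {mean_modulus:.0f} MPa")
print(f"recovery from 500 MPa, E0: {recovered_strain(500.0, E0):.5f}")
print(f"recovery from 500 MPa, mean Ec: {recovered_strain(500.0, mean_modulus):.5f}")
```

Because the chord modulus in these hypothetical data falls with prior strain, the mean modulus is below E0 and the predicted elastic recovery (and hence springback) is larger.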

7.
Genome Res ; 19(9): 1527-41, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19546169

ABSTRACT

We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding approximately 18x haploid coverage of aligned sequence and close to 300x clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.


Subjects
Base Pairing, Computational Biology/methods, Genetic Variation, Genome, Human, Ligases, Sequence Analysis, DNA/methods, Africa, Base Sequence, Genomics, Genotype, Heterozygote, Homozygote, Humans, Polymorphism, Single Nucleotide, Reference Standards
8.
Bioinformatics ; 27(8): 1152-4, 2011 Apr 15.
Article in English | MEDLINE | ID: mdl-21349863

ABSTRACT

We have implemented the aggregation and correlation toolbox (ACT), an efficient, multifaceted toolbox for analyzing continuous signal and discrete region tracks from high-throughput genomic experiments, such as RNA-seq or ChIP-chip signal profiles from the ENCODE and modENCODE projects, or lists of single nucleotide polymorphisms from the 1000 Genomes Project. It can generate aggregate profiles of a given track around a set of specified anchor points, such as transcription start sites. It can also correlate related tracks and analyze them for saturation, i.e., how much of a certain feature is covered with each new succeeding experiment. The ACT site contains downloadable code in a variety of formats, interactive web servers (for use on small quantities of data), example datasets, documentation and a gallery of outputs. Here, we explain the components of the toolbox in more detail and apply them in various contexts. AVAILABILITY: ACT is available at http://act.gersteinlab.org CONTACT: pi@gersteinlab.org.
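
Generating an aggregate profile around anchor points can be illustrated with a minimal Python sketch: it averages a per-position signal track over a fixed window centred on each anchor (for example, transcription start sites). This is not ACT's code; the function name, the symmetric `flank` window, and the choice to skip anchors whose window runs off the end of the track are assumptions, and strand orientation is ignored for simplicity.

```python
def aggregate_profile(signal, anchors, flank):
    """Average a per-position signal track in a +/- `flank` window around anchors.

    Anchors whose window would fall outside the track are skipped.
    Returns an empty list if no anchor could be used.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for anchor in anchors:
        start, end = anchor - flank, anchor + flank
        if start < 0 or end >= len(signal):
            continue  # window runs off the track
        for offset in range(width):
            totals[offset] += signal[start + offset]
        used += 1
    return [t / used for t in totals] if used else []


# Hypothetical signal track with two peaks, anchored at positions 2 and 7
track = [0, 1, 5, 1, 0, 0, 2, 6, 2, 0]
print(aggregate_profile(track, anchors=[2, 7], flank=1))  # [1.5, 5.5, 1.5]
```

A stranded version would reverse the window for anchors on the minus strand before adding it to the totals.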


Subjects
Genomics/methods, Software, Polymorphism, Single Nucleotide, Transcription Initiation Site
9.
PLoS Genet ; 4(7): e1000138, 2008 Jul 25.
Article in English | MEDLINE | ID: mdl-18654629

ABSTRACT

Chromatin structure plays an important role in modulating the accessibility of genomic DNA to regulatory proteins in eukaryotic cells. We performed an integrative analysis on dozens of recent datasets generated by deep sequencing and high-density tiling arrays, and we discovered an array of well-positioned nucleosomes flanking sites occupied by the insulator binding protein CTCF across the human genome. These nucleosomes are highly enriched for the histone variant H2A.Z and 11 histone modifications. The distances between the center positions of the neighboring nucleosomes are largely invariant, and we estimate them to be 185 bp on average. Surprisingly, subsets of nucleosomes that are enriched in different histone modifications vary greatly in the lengths of DNA protected from micrococcal nuclease cleavage (106-164 bp). The nucleosomes enriched in histone modifications previously reported to correlate with active transcription tend to contain less protected DNA, indicating that these modifications are correlated with greater DNA accessibility. Another striking result obtained from our analysis is that nucleosomes flanking CTCF sites are much better positioned than those downstream of transcription start sites, the only genomic feature previously known to position nucleosomes genome-wide. This nucleosome-positioning phenomenon is not observed for other transcription factors for which we had genome-wide binding data. We suggest that binding of CTCF provides an anchor point for positioning nucleosomes, and that chromatin remodeling is an important component of CTCF function.


Subjects
DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Genome, Human, Nucleosomes/genetics, Nucleosomes/metabolism, Repressor Proteins/genetics, Repressor Proteins/metabolism, Binding Sites, CCCTC-Binding Factor, Chromatin Assembly and Disassembly/physiology, Histones/genetics, Histones/metabolism, Humans, Micrococcal Nuclease/pharmacology, Transcription Factors/metabolism
10.
Genome Biol ; 22(1): 111, 2021 Apr 16.
Article in English | MEDLINE | ID: mdl-33863366

ABSTRACT

BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines, individually and as a pool termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency, more than 25,000 of which have less than 20% allele frequency, with 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
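
The effect of admixing Sample B into Sample A can be estimated with a simple linear mixing model: if a fraction f of the DNA comes from Sample A and the rest from Sample B, the expected allele frequency is f·AF_A + (1 − f)·AF_B. The Python sketch below assumes equal copy number at the locus in both samples and, for a variant absent from Sample B, AF_B = 0; it is an idealized estimate with a hypothetical helper name, not the consortium's dilution protocol.

```python
def mixed_allele_frequency(af_a, fraction_a, af_b=0.0):
    """Expected variant allele frequency in an A/B DNA mixture.

    Assumes a linear mixing model with equal copy number at the locus in both
    samples; `fraction_a` is the fraction of the mixture's DNA from Sample A.
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie in [0, 1]")
    return fraction_a * af_a + (1.0 - fraction_a) * af_b


# A variant at 2% in Sample A, absent from Sample B, with Sample A making up 1% of the mix
af = mixed_allele_frequency(0.02, 0.01)
print(f"{af:.2%}")  # 0.02%
```

In practice, copy-number differences between the two samples at a locus would make the observed frequency deviate from this linear estimate.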


Subjects
Alleles, Biomarkers, Tumor, Gene Frequency, Genetic Testing/methods, Genetic Variation, Genomics/methods, Neoplasms/genetics, Cell Line, Tumor, DNA Copy Number Variations, Genetic Heterogeneity, Genetic Testing/standards, Genomics/standards, Humans, Neoplasms/diagnosis, Workflow
11.
PLoS Genet ; 3(8): e136, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17708682

ABSTRACT

The identification of regulatory elements from different cell types is necessary for understanding the mechanisms controlling cell type-specific and housekeeping gene expression. Mapping DNaseI hypersensitive (HS) sites is an accurate method for identifying the location of functional regulatory elements. We used a high throughput method called DNase-chip to identify 3,904 DNaseI HS sites from six cell types across 1% of the human genome. A significant number (22%) of DNaseI HS sites from each cell type are ubiquitously present among all cell types studied. Surprisingly, nearly all of these ubiquitous DNaseI HS sites correspond to either promoters or insulator elements: 86% of them are located near annotated transcription start sites and 10% are bound by CTCF, a protein with known enhancer-blocking insulator activity. We also identified a large number of DNaseI HS sites that are cell type specific (only present in one cell type); these regions are enriched for enhancer elements and correlate with cell type-specific gene expression as well as cell type-specific histone modifications. Finally, we found that approximately 8% of the genome overlaps a DNaseI HS site in at least one of the six cell lines studied, indicating that a significant percentage of the genome is potentially functional.
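
A genome-coverage figure like the approximately 8% above comes from taking the union of sites across all cell types, so that overlapping sites are counted once, and dividing by the length of the region examined. The Python sketch below assumes half-open [start, end) coordinates on a single sequence; the cell-type names, intervals, and region length are hypothetical.

```python
def covered_fraction(intervals, region_length):
    """Fraction of a region covered by the union of half-open [start, end) intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running block: close it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / region_length


# Hypothetical HS sites per cell type; overlaps across cell types count once
sites_by_cell_type = {
    "cell_type_1": [(100, 250), (900, 1000)],
    "cell_type_2": [(200, 300)],
}
all_sites = [iv for ivs in sites_by_cell_type.values() for iv in ivs]
print(covered_fraction(all_sites, region_length=2000))  # 0.15
```

Merging sorted intervals keeps the computation linear after sorting, which matters when the inputs are genome-wide site lists.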


Subjects
Chromatin/chemistry, Genome, Human, Organ Specificity/genetics, Regulatory Elements, Transcriptional, Base Sequence, Binding Sites, CCCTC-Binding Factor, Cell Lineage/genetics, Cells, Cultured, Chromosome Mapping, Cluster Analysis, CpG Islands/genetics, DNA-Binding Proteins/metabolism, Deoxyribonuclease I/metabolism, HeLa Cells, Humans, Insulator Elements/genetics, K562 Cells, Microarray Analysis, Molecular Sequence Data, Repressor Proteins/metabolism, Research Design, Sequence Analysis, DNA/methods
12.
Physiol Genomics ; 37(3): 199-210, 2009 May 13.
Article in English | MEDLINE | ID: mdl-19258493

ABSTRACT

Caffeine is the most widely consumed psychoactive substance and has complex pharmacological actions in the brain. In this study, we employed a novel drug target validation strategy to uncover the multiple molecular targets of caffeine using combined A2A receptor (A2AR) knockouts (KO) and microarray profiling. Caffeine (10 mg/kg) elicited a distinct profile of striatal gene expression in WT mice compared with that elicited by A2AR gene deletion or by administering caffeine to A2AR KO mice. Thus, A2ARs are required but not sufficient to elicit the striatal gene expression induced by caffeine (10 mg/kg). Caffeine (50 mg/kg) induced complex expression patterns with three distinct sets of striatal genes: 1) one subset overlapped with those elicited by genetic deletion of A2ARs; 2) a second subset was elicited by caffeine in WT as well as A2AR KO mice; and 3) a third subset was elicited by caffeine only in A2AR KO mice. Furthermore, striatal gene sets elicited by the phosphodiesterase (PDE) inhibitor rolipram and the GABAA receptor antagonist bicuculline overlapped with the distinct subsets of striatal genes elicited by caffeine (50 mg/kg) administered to A2AR KO mice. Finally, Gene Set Enrichment Analysis revealed that adipocyte differentiation/insulin signaling is highly enriched in the striatal gene sets elicited by both low and high doses of caffeine. The identification of these distinct striatal gene populations and their corresponding multiple molecular targets, including A2AR, non-A2AR targets (possibly A1Rs and pathways associated with PDE and GABAAR) and their interactions, as well as the cellular pathways affected by low and high doses of caffeine, provides molecular insights into the acute pharmacological effects of caffeine in the brain.


Subjects
Caffeine/pharmacology, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis/methods, Receptor, Adenosine A2A/physiology, Animals, Bicuculline/pharmacology, Central Nervous System Stimulants/pharmacology, Cluster Analysis, Dose-Response Relationship, Drug, Female, GABA Antagonists/pharmacology, Gene Expression Regulation/drug effects, Male, Mice, Mice, Knockout, Neostriatum/drug effects, Neostriatum/metabolism, Phosphodiesterase Inhibitors/pharmacology, Receptor, Adenosine A2A/genetics, Reverse Transcriptase Polymerase Chain Reaction, Rolipram/pharmacology
13.
Nat Biotechnol ; 23(1): 137-44, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15637633

ABSTRACT

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.


Subjects
Computational Biology/methods, Gene Expression, Transcription, Genetic, Amino Acid Motifs, Animals, Binding Sites, Databases, Protein, Drosophila, Fungal Proteins/chemistry, Humans, Internet, Mice, Reproducibility of Results, Software
14.
Nucleic Acids Res ; 32(Web Server issue): W235-41, 2004 Jul 01.
Article in English | MEDLINE | ID: mdl-15215387

ABSTRACT

Transcriptional regulation is one of the most basic regulatory mechanisms in the cell. The accumulation of multiple metazoan genome sequences and the advent of high-throughput experimental techniques have motivated the development of a large number of bioinformatics methods for the detection of regulatory motifs. The regulatory process is extremely complex and individual computational algorithms typically have very limited success in genome-scale studies. Here, we argue the importance of integrating multiple computational algorithms and present an infrastructure that integrates eight web services covering key areas of transcriptional regulation. We have adopted client-side integration technology and built a consistent input and output environment with a versatile visualization tool named SeqVISTA. The infrastructure will allow for easy integration of gene regulation analysis software that is scattered over the Internet. It will also enable bench biologists to perform an arsenal of analyses using cutting-edge methods in a familiar environment, and bioinformatics researchers to focus on developing new algorithms without the need to invest substantial effort in complex pre- or post-processors. SeqVISTA is freely available to academic users and can be launched online at http://zlab.bu.edu/SeqVISTA/web.jnlp, provided that Java Web Start has been installed. In addition, a stand-alone version of the program can be downloaded and run locally. It can be obtained at http://zlab.bu.edu/SeqVISTA.


Subjects
Computational Biology, DNA/chemistry, Regulatory Sequences, Nucleic Acid, Software, Transcription, Genetic, Algorithms, Binding Sites, DNA/metabolism, Gene Expression Regulation, Internet, Systems Integration, Transcription Factors/metabolism
15.
Nucleic Acids Res ; 32(Web Server issue): W420-3, 2004 Jul 01.
Article in English | MEDLINE | ID: mdl-15215422

ABSTRACT

Detecting overrepresented known transcription factor binding motifs in a set of promoter sequences of co-regulated genes has become an important approach to deciphering transcriptional regulatory mechanisms. In this paper, we present an interactive web server, MotifViz, for three motif discovery programs, Clover, Rover and Motifish, covering most available flavors of algorithms for achieving this goal. For comparison, we have also implemented the simple motif-matching program Possum. MotifViz provides uniform and intuitive input and output formats for all four programs. It can be accessed at http://biowulf.bu.edu/MotifViz.


Subjects
Promoter Regions, Genetic, Sequence Analysis, DNA, Software, Algorithms, Binding Sites, Computer Graphics, Internet, Transcription Factors/metabolism, User-Computer Interface
16.
Nucleic Acids Res ; 32(4): 1372-81, 2004.
Article in English | MEDLINE | ID: mdl-14988425

ABSTRACT

The interaction of proteins with DNA recognition motifs regulates a number of fundamental biological processes, including transcription. To understand these processes, we need to know which motifs are present in a sequence and which factors bind to them. We describe a method to screen a set of DNA sequences against a precompiled library of motifs, and assess which, if any, of the motifs are statistically over- or under-represented in the sequences. Over-represented motifs are good candidates for playing a functional role in the sequences, while under-representation hints that if the motif were present, it would have a harmful dysregulatory effect. We apply our method (implemented as a computer program called Clover) to dopamine-responsive promoters, sequences flanking binding sites for the transcription factor LSF, sequences that direct transcription in muscle and liver, and Drosophila segmentation enhancers. In each case Clover successfully detects motifs known to function in the sequences, and intriguing and testable hypotheses are made concerning additional motifs. Clover compares favorably with an ab initio motif discovery algorithm based on sequence alignment, when the motif library includes only a homolog of the factor that actually regulates the sequences. It also demonstrates superior performance over two contingency table based over-representation methods. In conclusion, Clover has the potential to greatly accelerate characterization of signals that regulate transcription.


Subjects
Regulatory Sequences, Nucleic Acid, Sequence Analysis, DNA/methods, Transcription Factors/metabolism, Animals, Binding Sites, DNA-Binding Proteins/metabolism, Data Interpretation, Statistical, Dopamine/physiology, Drosophila/genetics, Humans, Liver/metabolism, Muscle, Skeletal/metabolism, Promoter Regions, Genetic, RNA-Binding Proteins, Software
17.
Sci Data ; 3: 160025, 2016 Jun 07.
Article in English | MEDLINE | ID: mdl-27271295

ABSTRACT

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazi Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and for improving sequencing technologies; SNP, indel, and structural variant calling; and de novo assembly.


Subjects
Benchmarking, Genome, Human, Exome, Genomics, Humans, INDEL Mutation
18.
Genome Inform ; 16(1): 68-72, 2005.
Article in English | MEDLINE | ID: mdl-16362908

ABSTRACT

This paper describes a novel approach to constructing Position-Specific Weight Matrices (PWMs) based on the transcription factor binding site (TFBS) data provided by the TRANSFAC database, and a comparison of the newly generated PWMs with the original TRANSFAC matrices. Multiple local sequence alignment was performed on the TFBSs of each transcription factor. Several different alignment programs were tested and their matrices were compared to the original TRANSFAC matrices. One of the alignment programs, GLAM, produced comparable matrices in terms of the average ranking of true positive sites across the whole test set of sequences.
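
Once the binding sites of a factor are aligned, a position weight matrix is commonly built by counting bases at each aligned position and converting the counts to log-odds scores against a background model. The Python sketch below assumes ungapped, equal-length, uppercase sites, a uniform background of 0.25, and a pseudocount of 0.5; these are illustrative choices, not the TRANSFAC or GLAM procedure, and the example sites are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(aligned_sites, pseudocount=0.5, background=0.25):
    """Log-odds position weight matrix from equal-length aligned binding sites."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("sites must be aligned to the same length")
    n = len(aligned_sites)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in aligned_sites]
        row = {}
        for base in BASES:
            # Pseudocounts keep unseen bases from producing log(0).
            freq = (column.count(base) + pseudocount) / (n + 4 * pseudocount)
            row[base] = math.log2(freq / background)
        pwm.append(row)
    return pwm


def score(pwm, sequence):
    """Sum of per-position log-odds scores for a sequence of the motif's length."""
    return sum(row[base] for row, base in zip(pwm, sequence))


sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
pwm = build_pwm(sites)
print(round(score(pwm, "TGACTCA"), 2), round(score(pwm, "AAAAAAA"), 2))
```

Ranking true sites by this score across a test set is one way such matrices could be compared, in the spirit of the average-ranking criterion mentioned above.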


Subjects
DNA/chemistry, Sequence Alignment/methods, Sequence Analysis, DNA/methods, Software, Transcription Factors/metabolism, Algorithms, Base Sequence, Binding Sites, Computational Biology, DNA/metabolism, Databases, Factual, Regulatory Sequences, Nucleic Acid
19.
Genome Inform ; 15(1): 239-48, 2004.
Article in English | MEDLINE | ID: mdl-15712126

ABSTRACT

Recent advances in high-throughput profiling of gene expression have catalyzed an explosive growth in functional genomics aimed at the elucidation of genes that are differentially expressed in various tissues or cell types across a range of experimental conditions. These studies can lead to the identification of diagnostic genes, classification of genes into functional categories, association of genes with regulatory pathways, and clustering of genes into modules that are potentially co-regulated by a group of transcription factors. Traditional clustering methods such as hierarchical clustering or principal component analysis are difficult to deploy effectively for several of these tasks, since genes rarely exhibit similar expression patterns across a wide range of conditions. Bi-clustering of gene expression data is a promising methodology for identification of gene groups that show a coherent expression profile across a subset of conditions. This methodology can be a first step towards the discovery of co-regulated and co-expressed genes or modules. Although bi-clustering (also called block clustering) was introduced in statistics in 1974, few robust and efficient solutions exist for extracting gene expression modules in microarray data. In this paper, we propose a simple but promising new approach for bi-clustering based on a Gibbs sampling paradigm. Our algorithm is implemented in the program GEMS (Gene Expression Module Sampler). GEMS has been tested on synthetic data generated to evaluate the effect of noise on the performance of the algorithm, as well as on published leukemia datasets. In our preliminary studies comparing GEMS with other bi-clustering software, we show that GEMS is a reliable, flexible and computationally efficient approach for bi-clustering gene expression data.


Subjects
Leukemia/genetics, Models, Genetic, Cluster Analysis, Computational Biology/methods, Databases, Nucleic Acid, Gene Expression, Humans, Oligonucleotide Array Sequence Analysis
20.
PLoS One ; 6(7): e22250, 2011.
Article in English | MEDLINE | ID: mdl-21799804

ABSTRACT

Comprehensive identification of the acquired mutations that cause common cancers will require genomic analyses of large sets of tumor samples. Typically, the tissue material available from tumor specimens is limited, which creates a demand for accurate template amplification. We therefore evaluated whether phi29-mediated whole genome amplification introduces false positive structural mutations by massive mate-pair sequencing of a normal human genome before and after such amplification. Multiple displacement amplification led to a decrease in clone coverage and an increase by two orders of magnitude in the prevalence of inversions, but did not increase the prevalence of translocations. While multiple strand displacement amplification may find uses in translocation analyses, it is likely that alternative amplification strategies need to be developed to meet the demands of cancer genomics.


Subjects
Artifacts, Genome, Human/genetics, Mutation/genetics, Nucleic Acid Amplification Techniques/methods, Sequence Analysis, DNA, False Positive Reactions, Female, Gene Rearrangement/genetics, Humans