Results 1 - 3 of 3
1.
Sci Data ; 7(1): 399, 2020 11 17.
Article in English | MEDLINE | ID: mdl-33203859

ABSTRACT

The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10-25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample datasets both to evaluate the benefits of these long, accurate reads and to support the development of bioinformatic tools, including genome assemblers, variant callers, and haplotyping algorithms. We present deep-coverage HiFi datasets for five complex samples, including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, the octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.


Subjects
High-Throughput Nucleotide Sequencing; Mice/genetics; Zea mays/genetics; Animals; Fragaria/genetics; Genome, Plant; Metagenome; Ranidae/genetics; Sequence Analysis, DNA
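The abstract states accuracy as a percentage (greater than 99.5%) rather than as a Phred quality value. As a worked sketch, the standard Phred conversion Q = -10 log10(1 - accuracy) places 99.5% at about Q23, and at that accuracy a read of 10-25 kb still carries roughly 50-125 erroneous bases on average. The helper names below are illustrative, and treating per-base errors as independent is a simplifying assumption, not a claim from the paper.

```python
import math

def phred_from_accuracy(accuracy: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(1 - accuracy)."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)

def accuracy_from_phred(q: float) -> float:
    """Inverse mapping: accuracy = 1 - 10 ** (-Q / 10)."""
    return 1.0 - 10.0 ** (-q / 10.0)

def expected_errors(read_length_bp: int, accuracy: float) -> float:
    """Mean number of wrong bases in a read, assuming independent per-base errors."""
    return read_length_bp * (1.0 - accuracy)

if __name__ == "__main__":
    print(round(phred_from_accuracy(0.995), 1))  # 23.0
    for length in (10_000, 25_000):
        # Illustrative read lengths taken from the 10-25 kb range in the abstract.
        print(length, round(expected_errors(length, 0.995)))  # 50, then 125
```

The same helpers run in reverse: accuracy_from_phred(20) gives 0.99, the usual Q20 threshold.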
2.
Nat Biotechnol ; 33(6): 623-30, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26006009

ABSTRACT

Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) from SMRT sequencing. The resulting assemblies are highly continuous, include fully resolved chromosome arms and close persistent gaps in these reference genomes. Our assembly of D. melanogaster revealed previously unknown heterochromatic and telomeric transition sequences, and we assembled low-complexity sequences from CHM1 that fill gaps in the human GRCh38 reference. Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.


Subjects
Genome, Fungal; Genome, Human; Genome, Insect; Genome, Plant; Sequence Analysis, DNA; Animals; Arabidopsis/genetics; Base Sequence; Chromosomes/genetics; Drosophila melanogaster/genetics; Heterochromatin; High-Throughput Nucleotide Sequencing/methods; Humans; Saccharomyces cerevisiae/genetics; Sequence Alignment
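As a generic illustration of the MinHash idea named in the abstract, the sketch below reduces each read to the minimum hash value of its k-mers under many random hash functions. The fraction of positions where two reads' minima agree estimates the Jaccard similarity of their k-mer sets, so candidate overlaps can be found without aligning every pair of reads. This is a textbook MinHash sketch, not MHAP's implementation, which adds steps not shown here; the k-mer size, sketch size, hash family, and function names are illustrative assumptions, and reverse-complement overlaps are ignored.

```python
import hashlib
import random

P = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family

def kmer_set(seq: str, k: int) -> set[str]:
    """Distinct length-k substrings of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def base_hash(kmer: str) -> int:
    """Deterministic 64-bit hash; the built-in hash() of str is randomized per process."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def minhash_sketch(seq: str, k: int = 16, num_hashes: int = 128, seed: int = 0) -> list[int]:
    """For each random function h(x) = (a*x + b) mod P, keep the minimum over all k-mers."""
    hashes = [base_hash(km) for km in kmer_set(seq, k)]
    if not hashes:
        raise ValueError("sequence is shorter than k")
    rng = random.Random(seed)  # same seed => same hash functions for every read
    params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(num_hashes)]
    return [min((a * x + b) % P for x in hashes) for a, b in params]

def estimate_jaccard(sketch_a: list[int], sketch_b: list[int]) -> float:
    """Fraction of sketch positions whose minima agree; estimates |A ∩ B| / |A ∪ B|."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)

def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, for checking the estimate."""
    return len(a & b) / len(a | b)

if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    read_a, read_b = genome[:1500], genome[500:]  # two reads sharing a 1,000 bp overlap
    est = estimate_jaccard(minhash_sketch(read_a), minhash_sketch(read_b))
    exact = exact_jaccard(kmer_set(read_a, 16), kmer_set(read_b, 16))
    print(f"estimated {est:.3f} vs exact {exact:.3f}")
```

Once sketches are built, comparing two reads costs num_hashes comparisons regardless of read length, which is what makes all-versus-all candidate screening feasible; pairs above a similarity threshold would then go to a more careful overlap step.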
3.
Genome Res ; 20(7): 890-8, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20501695

ABSTRACT

Promoters are important regulatory elements that contain the necessary sequence features for cells to initiate transcription. To functionally characterize a large set of human promoters, we measured the transcriptional activities of 4575 putative promoters across eight cell lines using transient transfection reporter assays. In parallel, we measured gene expression in the same cell lines and observed a significant correlation between promoter activity and endogenous gene expression (r = 0.43). As transient transfection assays directly measure the promoting effect of a defined fragment of DNA sequence, decoupled from epigenetic, chromatin, or long-range regulatory effects, we sought to predict whether a promoter was active using sequence features alone. CG dinucleotide content was highly predictive of ubiquitous promoter activity, necessitating the separation of promoters into two groups: high CG promoters, mostly ubiquitously active, and low CG promoters, mostly cell line-specific. Computational models trained on the binding potential of transcription factor (TF) binding motifs could predict promoter activities in both high and low CG groups: the average area under the receiver operating characteristic curve (AUC) of the models was 91% and exceeded the AUC of CG content by an average of 23%. Known relationships, for example, between HNF4A and hepatocytes, were recapitulated in the corresponding cell lines, in this case the liver-derived cell line HepG2. Half of the associations between tissue-specific TFs and cell line-specific promoters were new. Our study underscores the importance of collecting functional information from complementary assays and conditions to understand biology in a systematic framework.


Subjects
Base Sequence/physiology; Organ Specificity/genetics; Promoter Regions, Genetic/genetics; Promoter Regions, Genetic/physiology; Base Composition/physiology; Binding Sites/genetics; Cell Line; Computational Biology/methods; Epigenesis, Genetic/physiology; Gene Expression/genetics; Gene Expression/physiology; Hep G2 Cells; Hepatocyte Nuclear Factor 4/genetics; Humans; Protein Binding; Transcription, Genetic; Transfection
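The CG-content baseline in this abstract scores each promoter with a single number and asks how well that number separates active from inactive promoters, which is what AUC measures (an AUC of 91% corresponds to 0.91). A minimal sketch, assuming "CG dinucleotide content" means the fraction of adjacent base pairs that read CG (the abstract does not give the paper's exact definition), and computing AUC as the probability that a randomly chosen active promoter outscores a randomly chosen inactive one, with ties counted as one half. Function names and toy sequences are hypothetical.

```python
def cg_fraction(seq: str) -> float:
    """Fraction of the len(seq) - 1 dinucleotide positions that read 'CG' (case-insensitive)."""
    s = seq.upper()
    n = len(s) - 1
    if n <= 0:
        return 0.0
    return sum(1 for i in range(n) if s[i] == "C" and s[i + 1] == "G") / n

def auc(scores: list[float], labels: list[bool]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Hypothetical toy promoters and activity labels, for illustration only.
    promoters = ["CGCGATCGCG", "ACGTTACGGA", "ATATTTAAAT", "TTACGAATTA"]
    active = [True, True, False, False]
    scores = [cg_fraction(p) for p in promoters]
    print(auc(scores, active))  # 1.0: every active promoter has higher CG content
```

The pairwise loop is quadratic and chosen for readability; a rank-based (Mann-Whitney U) computation gives the same value in O(n log n) for large promoter sets.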