Results 1 - 20 of 91
1.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subjects
Chromosome Mapping, Diploidy, Human Genome, Genomics, Humans, Chromosome Mapping/standards, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards, Reference Standards, Genomics/methods, Genomics/standards, Human Chromosomes/genetics, Genetic Variation/genetics
2.
Genes (Basel) ; 12(12)2021 12 08.
Article in English | MEDLINE | ID: mdl-34946907

ABSTRACT

In recent years, optical genome mapping (OGM) has developed into a highly promising method of detecting large-scale structural variants in human genomes. It is capable of detecting structural variants considered difficult to detect by other current methods. Hence, it promises to be feasible as a first-line diagnostic tool, permitting insight into a new realm of previously unknown variants. However, due to its novelty, little experience with OGM is available to infer best practices for its application or to clarify which features cannot be detected. In this study, we used the Saphyr system (Bionano Genomics, San Diego, CA, USA), to explore its capabilities in human genetic diagnostics. To this end, we tested 14 DNA samples to confirm a total of 14 different structural or numerical chromosomal variants originally detected by other means, namely, deletions, duplications, inversions, trisomies, and a translocation. Overall, 12 variants could be confirmed; one deletion and one inversion could not. The prerequisites for detection of similar variants were explored by reviewing the OGM data of 54 samples analyzed in our laboratory. Limitations, some owing to the novelty of the method and some inherent to it, were described. Finally, we tested the successful application of OGM in routine diagnostics and described some of the challenges that merit consideration when utilizing OGM as a diagnostic tool.


Subjects
Chromosome Aberrations, Chromosome Disorders/diagnosis, Chromosome Mapping/methods, Chromosome Mapping/standards, DNA Copy Number Variations, Human Genome, Karyotyping/methods, Chromosome Disorders/genetics, Female, Humans, Male
3.
PLoS Comput Biol ; 17(10): e1008839, 2021 10.
Article in English | MEDLINE | ID: mdl-34634030

ABSTRACT

Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and, more recently, the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of successful outcomes, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while key post-sequencing quality indicators used have, thus far, relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, where an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is to our knowledge the first to implement a reference-free Hi-C QC tool, and also provides reference-based QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods.
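
The observation that proximity ligation creates sequences unlikely to occur naturally can be illustrated with a deliberately naive sketch. It is not the qc3C algorithm, whose details the abstract does not give. It assumes a DpnII-style digest (recognition site GATC), for which the filled-in ligation junction is GATCGATC, and simply counts reads containing that junction. The function name and the toy reads are hypothetical.

```python
def junction_fraction(reads, junction="GATCGATC"):
    """Return the fraction of reads that contain the ligation junction.

    Assumption: a DpnII-style digest, whose junction after fill-in and
    ligation is GATCGATC. This is a crude proxy for proximity ligation,
    not the statistical estimate described in the abstract.
    """
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if junction in read.upper())
    return hits / len(reads)


# Toy reads (hypothetical): two of the four contain the junction.
reads = [
    "ACGTGATCGATCTTAG",
    "ACGTACGTACGTACGT",
    "ttgatcgatcaa",
    "GATCAAAAGATC",
]
frac = junction_fraction(reads)
print(f"junction-containing reads: {frac:.0%}")
```

A raw count like this would overestimate ligation wherever the junction occurs naturally in the genome, which is presumably why a proper estimator has to account for background k-mer frequencies.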


Subjects
Chromosome Mapping, Genomics, High-Throughput Nucleotide Sequencing, Quality Control, Software, Algorithms, Animals, Chromosome Mapping/methods, Chromosome Mapping/standards, DNA/chemistry, DNA/genetics, Gene Library, Genomics/methods, Genomics/standards, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, Humans, Turtles
4.
PLoS Comput Biol ; 17(1): e1008678, 2021 01.
Article in English | MEDLINE | ID: mdl-33503026

ABSTRACT

Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.


Subjects
Chromosome Mapping/methods, Chromosome Mapping/standards, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, Bacteria/classification, Bacteria/genetics, Bacterial Genome/genetics, Phylogeny, Single Nucleotide Polymorphism/genetics, Sequence Alignment
5.
Genet Sel Evol ; 52(1): 73, 2020 Dec 14.
Article in English | MEDLINE | ID: mdl-33317445

ABSTRACT

BACKGROUND: Recombination is a process by which chromosomes are broken and recombine to generate new combinations of alleles, therefore playing a major role in shaping genome variation. Recombination frequencies between markers are used to construct genetic maps, which have important implications in genomic studies. Here, we report a recombination map for 44,696 autosomal single nucleotide polymorphisms (SNPs) according to the coordinates of the most recent bovine reference assembly. The recombination frequencies were estimated across 876 half-sib families with a minimum of 39 and a maximum of 4236 progeny, comprising over 367 K genotyped German Holstein animals. RESULTS: Genome-wide, over 8.9 million paternal recombination events were identified by investigating adjacent markers. The recombination map spans 24.43 Morgan (M) for a chromosomal length of 2486 Mbp and an average of ~0.98 cM/Mbp, which agrees with the available pedigree-based linkage maps. Furthermore, we identified 971 putative recombination hotspot intervals (defined as intervals whose recombination frequency is more than 2.5 standard deviations above the mean). The hotspot regions were non-uniformly distributed as sharp and narrow peaks, corresponding to ~5.8% of the recombination that has taken place in only ~2.4% of the genome. We verified genetic map length by applying a likelihood-based approach for the estimation of recombination rate between all intra-chromosomal marker pairs. This resulted in a longer autosomal genetic length for male cattle (25.35 M) and in the localization of 51 putatively misplaced SNPs in the genome assembly. CONCLUSIONS: Given that this map is built on the coordinates of the ARS-UCD1.2 assembly, our results provide the most up-to-date genetic map yet available for the cattle genome.
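
The average recombination rate quoted in the abstract follows directly from the reported map length and physical length. The short check below is only a worked calculation using those two published figures; the function name is illustrative.

```python
CM_PER_MORGAN = 100  # 1 Morgan = 100 centiMorgan


def mean_recombination_rate(length_morgan, length_mbp):
    """Return the average recombination rate in cM per Mbp."""
    return length_morgan * CM_PER_MORGAN / length_mbp


# Figures taken from the abstract.
map_length_morgan = 24.43   # genetic length of the autosomal map (M)
physical_length_mbp = 2486  # physical length covered (Mbp)

rate = mean_recombination_rate(map_length_morgan, physical_length_mbp)
print(f"{rate:.2f} cM/Mbp")  # prints "0.98 cM/Mbp"
```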


Subjects
Cattle/genetics, Chromosome Mapping/methods, Chromosomes/genetics, Genetic Recombination, Animals, Chromosome Mapping/standards, Genetic Linkage, Pedigree, Single Nucleotide Polymorphism, Reference Standards
6.
Nat Commun ; 11(1): 5040, 2020 10 07.
Article in English | MEDLINE | ID: mdl-33028839

ABSTRACT

Bringing together cancer genomes from different projects increases power and allows the investigation of pan-cancer molecular mechanisms. However, working with whole genomes sequenced over several years in different sequencing centres requires a framework to compare the quality of these sequences. We used the Pan-Cancer Analysis of Whole Genomes cohort as a test case to construct such a framework. This cohort contains whole cancer genomes of 2832 donors from 18 sequencing centres. We developed a non-redundant set of five quality control (QC) measurements to establish a star rating system. These QC measures reflect known differences in sequencing protocol, provide a guide to downstream analyses, and allow for exclusion of samples of poor quality. We have found that this is an effective framework of quality measures. The implementation of the framework is available at: https://dockstore.org/containers/quay.io/jwerner_dkfz/pancanqc:1.2.2


Subjects
Human Genome/genetics, Genomics/standards, Neoplasms/genetics, Quality Control, Chromosome Mapping/standards, Human Chromosomes/genetics, DNA Mutational Analysis/standards, Female, Genomics/methods, High-Throughput Nucleotide Sequencing/standards, Humans, Male, Mutation, Software, Whole Genome Sequencing/standards
7.
Arch Med Sadowej Kryminol ; 70(1): 1-18, 2020.
Article in English | MEDLINE | ID: mdl-32876419

ABSTRACT

Y chromosome typing has been performed in forensic genetic practice for more than 20 years. The latest recommendations of the DNA Commission of the International Society of Forensic Genetics (ISFG) concerning the application of Y-chromosomal markers in forensic genetics were published in 2006. The aim of this report is to recapitulate, systematise and supplement existing recommendations on the forensic analysis of polymorphism of the Y chromosome with standards already implemented in practice, new capabilities linked to the development of research techniques as well as current solutions used in statistical analysis. The recommendations have been adapted specifically to aspects related to the preparation of expert opinions in the field of forensic genetics in Poland. The Polish Speaking Working Group of the ISFG believes that the presented guidelines should become a standard implemented by all Polish laboratories performing Y chromosome typing for forensic purposes.


Subjects
Human Y Chromosomes, DNA Fingerprinting/standards, Forensic Genetics/standards, Genetic Polymorphism, Chromosome Mapping/standards, Expert Testimony/standards, Guidelines as Topic, Humans, Poland, Scientific Societies/standards
8.
JMIR Public Health Surveill ; 6(2): e15917, 2020 04 30.
Article in English | MEDLINE | ID: mdl-32352389

ABSTRACT

BACKGROUND: Many public health departments use record linkage between surveillance data and external data sources to inform public health interventions. However, little guidance is available to inform these activities, and many health departments rely on deterministic algorithms that may miss many true matches. In the context of public health action, these missed matches lead to missed opportunities to deliver interventions and may exacerbate existing health inequities. OBJECTIVE: This study aimed to compare the performance of record linkage algorithms commonly used in public health practice. METHODS: We compared five deterministic (exact, Stenger, Ocampo 1, Ocampo 2, and Bosh) and two probabilistic record linkage algorithms (fastLink and beta record linkage [BRL]) using simulations and a real-world scenario. We simulated pairs of datasets with varying numbers of errors per record and the number of matching records between the two datasets (ie, overlap). We matched the datasets using each algorithm and calculated their recall (ie, sensitivity, the proportion of true matches identified by the algorithm) and precision (ie, positive predictive value, the proportion of matches identified by the algorithm that were true matches). We estimated the average computation time by performing a match with each algorithm 20 times while varying the size of the datasets being matched. In a real-world scenario, HIV and sexually transmitted disease surveillance data from King County, Washington, were matched to identify people living with HIV who had a syphilis diagnosis in 2017. We calculated the recall and precision of each algorithm compared with a composite standard based on the agreement in matching decisions across all the algorithms and manual review. RESULTS: In simulations, BRL and fastLink maintained a high recall at nearly all data quality levels, while being comparable with deterministic algorithms in terms of precision. Deterministic algorithms typically failed to identify matches in scenarios with low data quality. All the deterministic algorithms had a shorter average computation time than the probabilistic algorithms. BRL had the slowest overall computation time (14 min when both datasets contained 2000 records). In the real-world scenario, BRL had the lowest trade-off between recall (309/309, 100.0%) and precision (309/312, 99.0%). CONCLUSIONS: Probabilistic record linkage algorithms maximize the number of true matches identified, reducing gaps in the coverage of interventions and maximizing the reach of public health action.
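
The recall and precision definitions given in the abstract can be checked against the counts reported for BRL in the real-world scenario. The sketch below uses only those reported counts; the function and variable names are illustrative and not taken from the study's code.

```python
def recall(true_positives, false_negatives):
    """Proportion of true matches that the algorithm identified."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Proportion of identified matches that were true matches."""
    return true_positives / (true_positives + false_positives)


# Counts reported for BRL: 309 of 309 true matches were found,
# and 309 of the 312 links it declared were true matches.
tp = 309
fn = 309 - tp
fp = 312 - tp

brl_recall = recall(tp, fn)
brl_precision = precision(tp, fp)
print(f"recall = {brl_recall:.1%}, precision = {brl_precision:.1%}")
# prints "recall = 100.0%, precision = 99.0%"
```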


Subjects
Algorithms, COVID-19/diagnosis, Chromosome Mapping/standards, Electronic Health Records/instrumentation, Public Health/instrumentation, COVID-19/physiopathology, Chromosome Mapping/methods, Chromosome Mapping/statistics & numerical data, Electronic Health Records/standards, Electronic Health Records/trends, Humans, Pandemics/prevention & control, Public Health/methods, Public Health/trends, Reproducibility of Results, Validation Studies as Topic
9.
Int J Parasitol ; 49(11): 847-858, 2019 10.
Article in English | MEDLINE | ID: mdl-31525371

ABSTRACT

Differential expression analysis between parasitic nematode strains is commonly used to implicate candidate genes in anthelmintic resistance or other biological functions. We have tested the hypothesis that the high genetic diversity of an organism such as Haemonchus contortus could complicate such analyses. First, we investigated the extent to which sequence polymorphism affects the reliability of differential expression analysis between the genetically divergent H. contortus strains MHco3(ISE), MHco4(WRS) and MHco10(CAVR). Using triplicates of 20 adult female worms from each population isolated under parallel experimental conditions, we found that high rates of sequence polymorphism in RNAseq reads were associated with lower efficiency read mapping to gene models under default TopHat2 parameters, leading to biased estimates of inter-strain differential expression. We then showed it is possible to largely compensate for this bias by optimising the read mapping single nucleotide polymorphism (SNP) allowance and filtering out genes with particularly high single nucleotide polymorphism rates. Once the sequence polymorphism biases were removed, we then assessed the genuine transcriptional diversity between the strains, finding ≥824 differentially expressed genes across all three pairwise strain comparisons. This high level of inter-strain transcriptional diversity not only suggests substantive inter-strain phenotypic variation but also highlights the difficulty in reliably associating differential expression of specific genes with phenotypic differences. To provide a practical example, we analysed two gene families of potential relevance to ivermectin drug resistance; the ABC transporters and the ligand-gated ion channels (LGICs). Over half of genes identified as differentially expressed using default TopHat2 parameters were shown to be an artifact of sequence polymorphism differences. This work illustrates the need to account for sequence polymorphism in differential expression analysis. It also demonstrates that a large number of genuine transcriptional differences can occur between H. contortus strains and these must be considered before associating the differential expression of specific genes with phenotypic differences between strains.


Subjects
Gene Expression Profiling/methods, Gene Expression Profiling/standards, Genetic Variation, Haemonchus/genetics, Animals, Anthelmintics/pharmacology, Chromosome Mapping/methods, Chromosome Mapping/standards, Computational Biology/methods, Computational Biology/standards, Drug Resistance, Haemonchus/drug effects, Ivermectin/pharmacology, RNA Sequence Analysis/methods, RNA Sequence Analysis/standards
10.
Chromosome Res ; 26(4): 297-306, 2018 12.
Article in English | MEDLINE | ID: mdl-30225548

ABSTRACT

The chicken genome was the third vertebrate to be sequenced. To date, its sequence and feature annotations are used as the reference for avian models in genome sequencing projects developed on birds and other Sauropsida species, and in genetic studies of domesticated birds of economic and evolutionary biology interest. Therefore, an accurate description of this genome model is important to a wide number of scientists. Here, we review the location and features of a very basic element, the centromeres of chromosomes in the galGal5 genome model. Centromeres are elements that are not determined by their DNA sequence but by their epigenetic status, in particular by the accumulation of the histone-like protein CENP-A. Comparison of data from several public sources (primarily marker probes flanking centromeres using fluorescent in situ hybridization done on giant lampbrush chromosomes and CENP-A ChIP-seq datasets) with galGal5 annotations revealed that centromeres are likely inappropriately mapped in 9 of the 16 galGal5 chromosome models in which they are described. Analysis of karyology data confirmed that the location of the main CENP-A peaks in chromosomes is the best means of locating the centromeres in 25 galGal5 chromosome models, the majority of which (16) are fully sequenced and assembled. This data re-analysis reaffirms that several sources of information should be examined to produce accurate genome annotations, particularly for basic structures such as centromeres that are epigenetically determined.


Subjects
Centromere Protein A/metabolism, Centromere/ultrastructure, Chickens/genetics, Genome/genetics, Animals, Non-Histone Chromosomal Proteins, Chromosome Mapping/standards, Epigenomics
11.
J Gen Intern Med ; 33(6): 877-885, 2018 06.
Article in English | MEDLINE | ID: mdl-29374360

ABSTRACT

BACKGROUND: Genomics will play an increasingly prominent role in clinical medicine. OBJECTIVE: To describe how primary care physicians (PCPs) discuss and make clinical recommendations about genome sequencing results. DESIGN: Qualitative analysis. PARTICIPANTS: PCPs and their generally healthy patients undergoing genome sequencing. APPROACH: Patients received clinical genome reports that included four categories of results: monogenic disease risk variants (if present), carrier status, five pharmacogenetics results, and polygenic risk estimates for eight cardiometabolic traits. Patients' office visits with their PCPs were audio-recorded, and summative content analysis was used to describe how PCPs discussed genomic results. KEY RESULTS: For each genomic result discussed in 48 PCP-patient visits, we identified a "take-home" message (recommendation), categorized as continuing current management, further treatment, further evaluation, behavior change, remembering for future care, or sharing with family members. We analyzed how PCPs came to each recommendation by identifying 1) how they described the risk or importance of the given result and 2) the rationale they gave for translating that risk into a specific recommendation. Quantitative analysis showed that continuing current management was the most commonly coded recommendation across results overall (492/749, 66%) and for each individual result type except monogenic disease risk results. Pharmacogenetics was the most common result type to prompt a recommendation to remember for future care (94/119, 79%); carrier status was the most common type prompting a recommendation to share with family members (45/54, 83%); and polygenic results were the most common type prompting a behavior change recommendation (55/58, 95%). One-quarter of recommendation codes associated with monogenic results were for further evaluation (6/24, 25%). Rationales for these recommendations included patient context, family context, and scientific/clinical limitations of sequencing. CONCLUSIONS: PCPs distinguish substantive differences among categories of genome sequencing results and use clinical judgment to justify continuing current management in generally healthy patients with genomic results.


Subjects
Attitude of Health Personnel, Clinical Decision-Making, Genetic Testing/standards, Physician-Patient Relations, Primary Care Physicians/standards, Primary Health Care/standards, Adult, Chromosome Mapping/methods, Chromosome Mapping/standards, Clinical Decision-Making/methods, Female, Genetic Testing/methods, Humans, Male, Primary Care Physicians/psychology, Pilot Projects, Primary Health Care/methods, Risk Factors
12.
ACS Synth Biol ; 6(9): 1609-1613, 2017 09 15.
Article in English | MEDLINE | ID: mdl-28911233

ABSTRACT

The CRISPR/Cas9 system has accelerated research across many fields since its demonstration for genome editing. CRISPR also offers vast therapeutic potential, but an important hurdle of this technology is the off-target mutations it can induce. In this viewpoint, we will discuss recent strategies for improving CRISPR specificity, emphasizing how a complete mechanistic understanding of CRISPR/Cas9 can benefit such efforts. We also propose that agreeing upon a consensus protocol with the highest specificity could benefit researchers working on CRISPR-based therapies. In addition to improving CRISPR/Cas9 specificity, accurate detection of off-target events is also crucial, and we will discuss various unbiased off-target detection methods in terms of their advantages and disadvantages. We suggest that using a combination of cell-based and cell-free methods can prove more useful. In addition, we point out that improving predictive algorithms for off-target sites would require pooling of the available off-target analysis data and standardization of the protocols used for obtaining the data. Moreover, we highlight the risk of insertional mutagenesis for gene correction applications requiring the use of donor DNA. We conclude by discussing future prospects for the field, as well as steps that can be taken to overcome the aforementioned challenges.


Subjects
CRISPR-Cas Systems/genetics, Chromosome Mapping/standards, Gene Editing/standards, Gene Targeting/methods, Gene Targeting/standards, Genetic Engineering/standards, Chromosome Mapping/methods, Gene Editing/methods, Genetic Engineering/methods, Guidelines as Topic, Reproducibility of Results, Sensitivity and Specificity
13.
Genetics ; 207(3): 873-882, 2017 11.
Article in English | MEDLINE | ID: mdl-28951529

ABSTRACT

Admixed populations result from recent admixture of two or more ancestral populations with divergent allele frequencies. The genome of each admixed individual is a mosaic of haplotypes inherited from the ancestral populations. Despite the substantial work to assess power and sample size requirements for association mapping in genetically homogeneous populations of European ancestry, power and sample size estimation methods for mapping genes in genetically heterogeneous admixed populations such as African Americans are lacking. Admixture mapping is a method that traces the ancestral origin of disease-susceptibility genetic loci in the admixed population. We developed AdmixPower, a freely available tool set based on the open-source R software, to perform power and sample size analysis for genetically heterogeneous admixed populations considering continuous or dichotomous outcomes with a case-only or case-control study design. AdmixPower can be used to compute the sample size required to achieve investigator-specified statistical power under several key parameters including ancestry odds ratio, genotype risk ratio, parental risk ratio, an underlying genetic risk model, trait type, and admixture model (hybrid-isolation or continuous gene flow model). We demonstrate that differences in the key parameters in the admixed population result in substantial differences in the sample size required to achieve adequate power in admixture mapping studies. Our tool provides a resource for researchers to develop a strategy to minimize cost and maximize the success of identifying disease-susceptibility loci in an admixed population. R code used in the sample size and power analysis is freely available from https://research.cchmc.org/mershalab/Tools.html.


Subjects
Chromosome Mapping/methods, Genetic Loci, Population/genetics, Software, Black or African American/genetics, Chromosome Mapping/standards, Gene Frequency, Genotype, Humans, Sample Size
14.
Genetics ; 207(2): 447-463, 2017 10.
Article in English | MEDLINE | ID: mdl-28827289

ABSTRACT

Mutants remain a powerful means for dissecting gene function in model organisms such as Caenorhabditis elegans. Massively parallel sequencing has simplified the detection of variants after mutagenesis, but determining precisely which change is responsible for phenotypic perturbation remains a key step. Genetic mapping paradigms in C. elegans rely on bulk segregant populations produced by crosses with the problematic Hawaiian wild isolate and an excess of redundant information from whole-genome sequencing (WGS). To increase the repertoire of available mutants and to simplify identification of the causal change, we performed WGS on 173 temperature-sensitive (TS) lethal mutants and devised a novel mapping method. The mapping method uses molecular inversion probes (MIP-MAP) in a targeted sequencing approach to genetic mapping, and replaces the Hawaiian strain with a Million Mutation Project strain with high genomic and phenotypic similarity to the laboratory wild-type strain N2. We validated MIP-MAP on a subset of the TS mutants using a competitive selection approach to produce TS candidate mapping intervals with a mean size < 3 Mb. MIP-MAP successfully uses a non-Hawaiian mapping strain and multiplexed libraries are sequenced at a fraction of the cost of WGS mapping approaches. Our mapping results suggest that the collection of TS mutants contains a diverse library of TS alleles for genes essential to development and reproduction. MIP-MAP is a robust method to genetically map mutations in both viable and essential genes and should be adaptable to other organisms. It may also simplify tracking of individual genotypes within population mixtures.


Subjects
Caenorhabditis elegans/genetics, Chromosome Mapping/methods, Chromosomes/genetics, Mutation, Thermotolerance/genetics, Whole Genome Sequencing/methods, Animals, Caenorhabditis elegans/physiology, Caenorhabditis elegans Proteins/genetics, Chromosome Mapping/standards, Whole Genome Sequencing/standards
15.
PLoS Comput Biol ; 13(8): e1005703, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28821014

ABSTRACT

Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis makes it possible to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr.


Subjects
Chromosome Mapping, High-Throughput Nucleotide Sequencing, Transcriptome/genetics, Algorithms, Chromosome Mapping/methods, Chromosome Mapping/standards, Genetic Databases, Genetic Variation, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, Statistical Models
16.
Nat Biotechnol ; 35(7): 676-683, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28604660

ABSTRACT

We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster with potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.


Subjects
Chromosome Mapping/standards , Databases, Genetic , Genome, Archaeal/genetics , Genome, Bacterial/genetics , High-Throughput Nucleotide Sequencing/standards , Knowledge Bases , Database Management Systems , Datasets as Topic , Encyclopedias as Topic , Reference Values
17.
Nat Methods ; 14(6): 587-589, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28481363

ABSTRACT

Model-based molecular phylogenetics plays an important role in comparisons of genomic data, and model selection is a key step in all such analyses. We present ModelFinder, a fast model-selection method that greatly improves the accuracy of phylogenetic estimates by incorporating a model of rate heterogeneity across sites not previously considered in this context and by allowing concurrent searches of model space and tree space.


Subjects
Algorithms , Chromosome Mapping/standards , High-Throughput Nucleotide Sequencing/methods , Models, Genetic , Phylogeny , Animals , Computer Simulation , Evolution, Molecular , Humans , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, DNA
18.
PLoS One ; 12(4): e0175768, 2017.
Article in English | MEDLINE | ID: mdl-28406955

ABSTRACT

Genome-wide association studies (GWASs) have identified a large number of noncoding associations, calling for systematic mapping to causal regulatory variants and their distal target genes. A widely used method, quantitative trait loci (QTL) mapping for chromatin or expression traits, suffers from sample-to-sample experimental variation and trans-acting or environmental effects. Instead, alleles at heterozygous loci can be compared within a sample, thereby controlling for those confounding factors. Here we introduce a method for chromatin structure-based allele-specific pairing of regulatory variants and target transcripts. With phased genotypes, much of allele-specific expression could be explained by paired allelic cis-regulation across a long range. This approach showed approximately two times greater sensitivity than QTL mapping. There are cases in which allele imbalance cannot be tested because heterozygotes are not available among reference samples. Therefore, we employed a machine learning method to predict missing positive cases based on various features shared by observed allele-specific pairs. We showed that only 10 reference samples are sufficient to achieve high prediction accuracy with a low sampling variation. In conclusion, our method enables highly sensitive fine mapping and target identification for trait-associated variants based on a small number of reference samples.


Subjects
Chromatin/genetics , Chromosome Mapping/standards , Polymorphism, Single Nucleotide , RNA, Messenger/genetics , Alleles , Chromosome Mapping/methods , Genome-Wide Association Study , Humans , Machine Learning , Quantitative Trait Loci
19.
Genet Med ; 19(7): 809-818, 2017 07.
Article in English | MEDLINE | ID: mdl-28079900

ABSTRACT

PURPOSE: Genomic sequencing (GS) for newborns may enable detection of conditions for which early knowledge can improve health outcomes. One of the major challenges hindering its broader application is the time it takes to assess the clinical relevance of detected variants and the genes they impact so that disease risk is reported appropriately. METHODS: To facilitate rapid interpretation of GS results in newborns, we curated a catalog of genes with putative pediatric relevance for their validity based on the ClinGen clinical validity classification framework criteria, age of onset, penetrance, and mode of inheritance through systematic evaluation of published evidence. Based on these attributes, we classified genes to guide the return of results in the BabySeq Project, a randomized, controlled trial exploring the use of newborn GS (nGS), and used our curated list for the first 15 newborns sequenced in this project. RESULTS: Here, we present our curated list for 1,514 gene-disease associations. Overall, 954 genes met our criteria for return in nGS. This reference list eliminated manual assessment for 41% of rare variants identified in 15 newborns. CONCLUSION: Our list provides a resource that can assist in guiding the interpretive scope of clinical GS for newborns and potentially other populations. Genet Med advance online publication, 12 January 2017.


Subjects
Genetic Diseases, Inborn/diagnosis , Genetic Testing/methods , High-Throughput Nucleotide Sequencing/standards , Base Sequence , Chromosome Mapping/standards , Databases, Genetic , Exome , Female , Genetic Predisposition to Disease/genetics , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Infant, Newborn , Male
20.
J Am Coll Radiol ; 13(12 Pt A): 1467-1472, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27595197

ABSTRACT

Incidental and secondary findings have become an important by-product of diagnostic testing, and their ramifications affect clinical care, research, and policy. Given parallels in the reporting and management of such findings on diagnostic imaging, radiologists may draw from ongoing discussions in medical genetics to rethink more patient-centered approaches to analogous clinical, ethical, and medicolegal dilemmas. Low-risk incidental findings in particular may be drivers of unnecessary testing, invasive procedures, and overtreatment, with associated financial, psychological, and clinical consequences. As radiologists act in patients' best interests by strengthening standardized guidelines on how each finding merits further diagnostic testing or treatment, perhaps the greatest challenge for producing such guidelines is for low-risk incidental findings, for which adverse consequences are unlikely but associated with substantial uncertainty because of the lack of strong evidence on which to base the recommendations. More uniform recommendations for managing low-risk radiologic incidental findings should therefore aim to provide reasonable options that apply across a spectrum of patient preferences. These will require evaluation through research and will ultimately influence the quality of care. Specific areas for exploration may include (1) better gauging of patient attitudes and preferences regarding low-risk incidental findings, (2) using patient preferences to inform more uniform recommendations for low-risk findings that apply across a spectrum of preferences and help guide shared decision making, and (3) when patients endorse a strong preference not to discover low-risk incidental findings, how it might be possible for professional standards to curtail their generation in specific circumstances.


Subjects
Chromosome Mapping/standards , Genetic Testing/standards , Genome/genetics , Incidental Findings , Patient-Centered Care/standards , Radiology/standards , Exome/genetics , Humans , Practice Guidelines as Topic , United States