Results 1 - 20 of 91
1.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subject(s)
Chromosome Mapping , Diploidy , Genome, Human , Genomics , Humans , Chromosome Mapping/standards , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Reference Standards , Genomics/methods , Genomics/standards , Chromosomes, Human/genetics , Genetic Variation/genetics
2.
Genes (Basel) ; 12(12)2021 12 08.
Article in English | MEDLINE | ID: mdl-34946907

ABSTRACT

In recent years, optical genome mapping (OGM) has developed into a highly promising method of detecting large-scale structural variants in human genomes. It is capable of detecting structural variants considered difficult to detect by other current methods. Hence, it promises to be feasible as a first-line diagnostic tool, permitting insight into a new realm of previously unknown variants. However, due to its novelty, little experience with OGM is available to infer best practices for its application or to clarify which features cannot be detected. In this study, we used the Saphyr system (Bionano Genomics, San Diego, CA, USA), to explore its capabilities in human genetic diagnostics. To this end, we tested 14 DNA samples to confirm a total of 14 different structural or numerical chromosomal variants originally detected by other means, namely, deletions, duplications, inversions, trisomies, and a translocation. Overall, 12 variants could be confirmed; one deletion and one inversion could not. The prerequisites for detection of similar variants were explored by reviewing the OGM data of 54 samples analyzed in our laboratory. Limitations, some owing to the novelty of the method and some inherent to it, were described. Finally, we tested the successful application of OGM in routine diagnostics and described some of the challenges that merit consideration when utilizing OGM as a diagnostic tool.


Subject(s)
Chromosome Aberrations , Chromosome Disorders/diagnosis , Chromosome Mapping/methods , Chromosome Mapping/standards , DNA Copy Number Variations , Genome, Human , Karyotyping/methods , Chromosome Disorders/genetics , Female , Humans , Male
3.
PLoS Comput Biol ; 17(10): e1008839, 2021 10.
Article in English | MEDLINE | ID: mdl-34634030

ABSTRACT

Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and more recently the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of successful outcomes, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while key post-sequencing quality indicators used have thus far relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, where an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is, to our knowledge, the first to implement reference-free Hi-C QC, and also provides reference-based QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods.
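
The k-mer observation in this abstract can be illustrated with a toy sketch. This is not qc3C's implementation: the function names, the tiny `genome` string, and the rule "flag a read if any of its k-mers is absent from the sample" are all assumptions made for illustration, and a real tool would also need to handle sequencing error and coverage.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_novel_kmer(read, known_kmers, k):
    """True if the read contains a k-mer never seen in the sample."""
    return any(km not in known_kmers for km in kmers(read, k))


def flagged_fraction(reads, known_kmers, k):
    """Fraction of reads carrying at least one novel k-mer (toy QC score)."""
    flagged = sum(has_novel_kmer(r, known_kmers, k) for r in reads)
    return flagged / len(reads)


# Hypothetical toy data: a short "sample" sequence and two reads.
genome = "AAACCCGGGTTTACGTACGA"
k = 4
known = kmers(genome, k)

contiguous_read = "CCCGGGTT"            # copied straight from the genome
chimeric_read = "AAACC" + "ACGA"        # two distant pieces joined together

toy_fraction = flagged_fraction([contiguous_read, chimeric_read], known, k)
print(toy_fraction)
```

Here only the chimeric read is flagged, because the join creates k-mers such as `ACCA` that do not occur anywhere in the toy genome.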


Subject(s)
Chromosome Mapping , Genomics , High-Throughput Nucleotide Sequencing , Quality Control , Software , Algorithms , Animals , Chromosome Mapping/methods , Chromosome Mapping/standards , DNA/chemistry , DNA/genetics , Gene Library , Genomics/methods , Genomics/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Turtles
4.
PLoS Comput Biol ; 17(1): e1008678, 2021 01.
Article in English | MEDLINE | ID: mdl-33503026

ABSTRACT

Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.


Subject(s)
Chromosome Mapping/methods , Chromosome Mapping/standards , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Bacteria/classification , Bacteria/genetics , Genome, Bacterial/genetics , Phylogeny , Polymorphism, Single Nucleotide/genetics , Sequence Alignment
5.
Genet Sel Evol ; 52(1): 73, 2020 Dec 14.
Article in English | MEDLINE | ID: mdl-33317445

ABSTRACT

BACKGROUND: Recombination is a process by which chromosomes are broken and recombine to generate new combinations of alleles, therefore playing a major role in shaping genome variation. Recombination frequencies ([Formula: see text]) between markers are used to construct genetic maps, which have important implications in genomic studies. Here, we report a recombination map for 44,696 autosomal single nucleotide polymorphisms (SNPs) according to the coordinates of the most recent bovine reference assembly. The recombination frequencies were estimated across 876 half-sib families with a minimum number of 39 and maximum number of 4236 progeny, comprising over 367 K genotyped German Holstein animals. RESULTS: Genome-wide, over 8.9 million paternal recombination events were identified by investigating adjacent markers. The recombination map spans 24.43 Morgan (M) for a chromosomal length of 2486 Mbp and an average of ~0.98 cM/Mbp, which concords with the available pedigree-based linkage maps. Furthermore, we identified 971 putative recombination hotspot intervals (defined as [Formula: see text] > 2.5 standard deviations greater than the mean). The hotspot regions were non-uniformly distributed as sharp and narrow peaks, corresponding to ~5.8% of the recombination that has taken place in only ~2.4% of the genome. We verified genetic map length by applying a likelihood-based approach for the estimation of recombination rate between all intra-chromosomal marker pairs. This resulted in a longer autosomal genetic length for male cattle (25.35 M) and in the localization of 51 putatively misplaced SNPs in the genome assembly. CONCLUSIONS: Given the fact that this map is built on the coordinates of the ARS-UCD1.2 assembly, our results provide the most updated genetic map yet available for the cattle genome.
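
The arithmetic behind the reported average rate, and the hotspot rule as stated in the abstract, can be sketched as follows. The function names and the toy `rates` list are hypothetical, and the use of the sample standard deviation is an assumption; the abstract does not say which estimator the authors used.

```python
import statistics


def mean_rate_cm_per_mbp(map_length_morgan, physical_length_mbp):
    """Average recombination rate: 1 Morgan = 100 centimorgans."""
    return map_length_morgan * 100.0 / physical_length_mbp


def hotspot_indices(rates, n_sd=2.5):
    """Indices of intervals whose rate exceeds mean + n_sd * SD.

    Assumption: sample standard deviation (statistics.stdev).
    """
    threshold = statistics.mean(rates) + n_sd * statistics.stdev(rates)
    return [i for i, r in enumerate(rates) if r > threshold]


# Figures quoted in the abstract: 24.43 M over 2486 Mbp.
average = mean_rate_cm_per_mbp(24.43, 2486)
print(round(average, 2))  # 0.98

# Hypothetical toy data: 19 background intervals and one sharp peak.
rates = [1.0] * 19 + [10.0]
print(hotspot_indices(rates))  # [19]
```

The first result reproduces the ~0.98 cM/Mbp figure from the map length and chromosomal length given in the abstract.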


Subject(s)
Cattle/genetics , Chromosome Mapping/methods , Chromosomes/genetics , Recombination, Genetic , Animals , Chromosome Mapping/standards , Genetic Linkage , Pedigree , Polymorphism, Single Nucleotide , Reference Standards
6.
Nat Commun ; 11(1): 5040, 2020 10 07.
Article in English | MEDLINE | ID: mdl-33028839

ABSTRACT

Bringing together cancer genomes from different projects increases power and allows the investigation of pan-cancer, molecular mechanisms. However, working with whole genomes sequenced over several years in different sequencing centres requires a framework to compare the quality of these sequences. We used the Pan-Cancer Analysis of Whole Genomes cohort as a test case to construct such a framework. This cohort contains whole cancer genomes of 2832 donors from 18 sequencing centres. We developed a non-redundant set of five quality control (QC) measurements to establish a star rating system. These QC measures reflect known differences in sequencing protocol and provide a guide to downstream analyses and allow for exclusion of samples of poor quality. We have found that this is an effective framework of quality measures. The implementation of the framework is available at: https://dockstore.org/containers/quay.io/jwerner_dkfz/pancanqc:1.2.2 .


Subject(s)
Genome, Human/genetics , Genomics/standards , Neoplasms/genetics , Quality Control , Chromosome Mapping/standards , Chromosomes, Human/genetics , DNA Mutational Analysis/standards , Female , Genomics/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Male , Mutation , Software , Whole Genome Sequencing/standards
7.
Arch Med Sadowej Kryminol ; 70(1): 1-18, 2020.
Article in English | MEDLINE | ID: mdl-32876419

ABSTRACT

Y chromosome typing has been performed in forensic genetic practice for more than 20 years. The latest recommendations of the DNA Commission of the International Society of Forensic Genetics (ISFG) concerning the application of Y-chromosomal markers in forensic genetics were published in 2006. The aim of this report is to recapitulate, systematise and supplement existing recommendations on the forensic analysis of polymorphism of the Y chromosome with standards already implemented in practice, new capabilities linked to the development of research techniques as well as current solutions used in statistical analysis. The recommendations have been adapted specifically to aspects related to the preparation of expert opinions in the field of forensic genetics in Poland. The Polish Speaking Working Group of the ISFG believes that the presented guidelines should become a standard implemented by all Polish laboratories performing Y chromosome typing for forensic purposes.


Subject(s)
Chromosomes, Human, Y , DNA Fingerprinting/standards , Forensic Genetics/standards , Polymorphism, Genetic , Chromosome Mapping/standards , Expert Testimony/standards , Guidelines as Topic , Humans , Poland , Societies, Scientific/standards
8.
JMIR Public Health Surveill ; 6(2): e15917, 2020 04 30.
Article in English | MEDLINE | ID: mdl-32352389

ABSTRACT

BACKGROUND: Many public health departments use record linkage between surveillance data and external data sources to inform public health interventions. However, little guidance is available to inform these activities, and many health departments rely on deterministic algorithms that may miss many true matches. In the context of public health action, these missed matches lead to missed opportunities to deliver interventions and may exacerbate existing health inequities. OBJECTIVE: This study aimed to compare the performance of record linkage algorithms commonly used in public health practice. METHODS: We compared five deterministic (exact, Stenger, Ocampo 1, Ocampo 2, and Bosh) and two probabilistic record linkage algorithms (fastLink and beta record linkage [BRL]) using simulations and a real-world scenario. We simulated pairs of datasets with varying numbers of errors per record and the number of matching records between the two datasets (ie, overlap). We matched the datasets using each algorithm and calculated their recall (ie, sensitivity, the proportion of true matches identified by the algorithm) and precision (ie, positive predictive value, the proportion of matches identified by the algorithm that were true matches). We estimated the average computation time by performing a match with each algorithm 20 times while varying the size of the datasets being matched. In a real-world scenario, HIV and sexually transmitted disease surveillance data from King County, Washington, were matched to identify people living with HIV who had a syphilis diagnosis in 2017. We calculated the recall and precision of each algorithm compared with a composite standard based on the agreement in matching decisions across all the algorithms and manual review. RESULTS: In simulations, BRL and fastLink maintained a high recall at nearly all data quality levels, while being comparable with deterministic algorithms in terms of precision. 
Deterministic algorithms typically failed to identify matches in scenarios with low data quality. All the deterministic algorithms had a shorter average computation time than the probabilistic algorithms. BRL had the slowest overall computation time (14 min when both datasets contained 2000 records). In the real-world scenario, BRL had the lowest trade-off between recall (309/309, 100.0%) and precision (309/312, 99.0%). CONCLUSIONS: Probabilistic record linkage algorithms maximize the number of true matches identified, reducing gaps in the coverage of interventions and maximizing the reach of public health action.
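
The recall and precision definitions given in this abstract can be checked against the BRL counts it reports. This is a minimal sketch; the function and variable names are illustrative and not taken from the study's code.

```python
def recall(true_positives, false_negatives):
    """Proportion of true matches that the algorithm identified."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Proportion of identified matches that were true matches."""
    return true_positives / (true_positives + false_positives)


# BRL, real-world scenario: 309 of 309 true matches found,
# and 312 matches declared in total (so 3 false positives).
brl_recall = recall(true_positives=309, false_negatives=0)
brl_precision = precision(true_positives=309, false_positives=312 - 309)

print(f"recall={brl_recall:.1%} precision={brl_precision:.1%}")
# recall=100.0% precision=99.0%
```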


Subject(s)
Algorithms , COVID-19/diagnosis , Chromosome Mapping/standards , Electronic Health Records/instrumentation , Public Health/instrumentation , COVID-19/physiopathology , Chromosome Mapping/methods , Chromosome Mapping/statistics & numerical data , Electronic Health Records/standards , Electronic Health Records/trends , Humans , Pandemics/prevention & control , Public Health/methods , Public Health/trends , Reproducibility of Results , Validation Studies as Topic
9.
Int J Parasitol ; 49(11): 847-858, 2019 10.
Article in English | MEDLINE | ID: mdl-31525371

ABSTRACT

Differential expression analysis between parasitic nematode strains is commonly used to implicate candidate genes in anthelmintic resistance or other biological functions. We have tested the hypothesis that the high genetic diversity of an organism such as Haemonchus contortus could complicate such analyses. First, we investigated the extent to which sequence polymorphism affects the reliability of differential expression analysis between the genetically divergent H. contortus strains MHco3(ISE), MHco4(WRS) and MHco10(CAVR). Using triplicates of 20 adult female worms from each population isolated under parallel experimental conditions, we found that high rates of sequence polymorphism in RNAseq reads were associated with lower efficiency read mapping to gene models under default TopHat2 parameters, leading to biased estimates of inter-strain differential expression. We then showed it is possible to largely compensate for this bias by optimising the read mapping single nucleotide polymorphism (SNP) allowance and filtering out genes with particularly high single nucleotide polymorphism rates. Once the sequence polymorphism biases were removed, we then assessed the genuine transcriptional diversity between the strains, finding ≥824 differentially expressed genes across all three pairwise strain comparisons. This high level of inter-strain transcriptional diversity not only suggests substantive inter-strain phenotypic variation but also highlights the difficulty in reliably associating differential expression of specific genes with phenotypic differences. To provide a practical example, we analysed two gene families of potential relevance to ivermectin drug resistance; the ABC transporters and the ligand-gated ion channels (LGICs). Over half of genes identified as differentially expressed using default TopHat2 parameters were shown to be an artifact of sequence polymorphism differences. 
This work illustrates the need to account for sequence polymorphism in differential expression analysis. It also demonstrates that a large number of genuine transcriptional differences can occur between H. contortus strains and these must be considered before associating the differential expression of specific genes with phenotypic differences between strains.


Subject(s)
Gene Expression Profiling/methods , Gene Expression Profiling/standards , Genetic Variation , Haemonchus/genetics , Animals , Anthelmintics/pharmacology , Chromosome Mapping/methods , Chromosome Mapping/standards , Computational Biology/methods , Computational Biology/standards , Drug Resistance , Haemonchus/drug effects , Ivermectin/pharmacology , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/standards
10.
Chromosome Res ; 26(4): 297-306, 2018 12.
Article in English | MEDLINE | ID: mdl-30225548

ABSTRACT

The chicken genome was the third vertebrate to be sequenced. To date, its sequence and feature annotations are used as the reference for avian models in genome sequencing projects developed on birds and other Sauropsida species, and in genetic studies of domesticated birds of economic and evolutionary biology interest. Therefore, an accurate description of this genome model is important to a wide number of scientists. Here, we review the location and features of a very basic element, the centromeres of chromosomes in the galGal5 genome model. Centromeres are elements that are not determined by their DNA sequence but by their epigenetic status, in particular by the accumulation of the histone-like protein CENP-A. Comparison of data from several public sources (primarily marker probes flanking centromeres using fluorescent in situ hybridization done on giant lampbrush chromosomes and CENP-A ChIP-seq datasets) with galGal5 annotations revealed that centromeres are likely inappropriately mapped in 9 of the 16 galGal5 chromosome models in which they are described. Analysis of karyology data confirmed that the location of the main CENP-A peaks in chromosomes is the best means of locating the centromeres in 25 galGal5 chromosome models, the majority of which (16) are fully sequenced and assembled. This data re-analysis reaffirms that several sources of information should be examined to produce accurate genome annotations, particularly for basic structures such as centromeres that are epigenetically determined.


Subject(s)
Centromere Protein A/metabolism , Centromere/ultrastructure , Chickens/genetics , Genome/genetics , Animals , Chromosomal Proteins, Non-Histone , Chromosome Mapping/standards , Epigenomics
11.
J Gen Intern Med ; 33(6): 877-885, 2018 06.
Article in English | MEDLINE | ID: mdl-29374360

ABSTRACT

BACKGROUND: Genomics will play an increasingly prominent role in clinical medicine. OBJECTIVE: To describe how primary care physicians (PCPs) discuss and make clinical recommendations about genome sequencing results. DESIGN: Qualitative analysis. PARTICIPANTS: PCPs and their generally healthy patients undergoing genome sequencing. APPROACH: Patients received clinical genome reports that included four categories of results: monogenic disease risk variants (if present), carrier status, five pharmacogenetics results, and polygenic risk estimates for eight cardiometabolic traits. Patients' office visits with their PCPs were audio-recorded, and summative content analysis was used to describe how PCPs discussed genomic results. KEY RESULTS: For each genomic result discussed in 48 PCP-patient visits, we identified a "take-home" message (recommendation), categorized as continuing current management, further treatment, further evaluation, behavior change, remembering for future care, or sharing with family members. We analyzed how PCPs came to each recommendation by identifying 1) how they described the risk or importance of the given result and 2) the rationale they gave for translating that risk into a specific recommendation. Quantitative analysis showed that continuing current management was the most commonly coded recommendation across results overall (492/749, 66%) and for each individual result type except monogenic disease risk results. Pharmacogenetics was the most common result type to prompt a recommendation to remember for future care (94/119, 79%); carrier status was the most common type prompting a recommendation to share with family members (45/54, 83%); and polygenic results were the most common type prompting a behavior change recommendation (55/58, 95%). One-quarter of recommendation codes associated with monogenic results were for further evaluation (6/24, 25%). 
Rationales for these recommendations included patient context, family context, and scientific/clinical limitations of sequencing. CONCLUSIONS: PCPs distinguish substantive differences among categories of genome sequencing results and use clinical judgment to justify continuing current management in generally healthy patients with genomic results.


Subject(s)
Attitude of Health Personnel , Clinical Decision-Making , Genetic Testing/standards , Physician-Patient Relations , Physicians, Primary Care/standards , Primary Health Care/standards , Adult , Chromosome Mapping/methods , Chromosome Mapping/standards , Clinical Decision-Making/methods , Female , Genetic Testing/methods , Humans , Male , Physicians, Primary Care/psychology , Pilot Projects , Primary Health Care/methods , Risk Factors
12.
Genetics ; 207(3): 873-882, 2017 11.
Article in English | MEDLINE | ID: mdl-28951529

ABSTRACT

Admixed populations result from recent admixture of two or more ancestral populations with divergent allele frequencies. The genome of each admixed individual is a mosaic of haplotypes inherited from the ancestral populations. Despite the substantial work to assess power and sample size requirements for association mapping in genetically homogeneous populations of European ancestry, power and sample size estimation methods for mapping genes in genetically heterogeneous admixed populations such as African Americans are lacking. Admixture mapping is a method that traces the ancestral origin of disease-susceptibility genetic loci in the admixed population. We developed AdmixPower, a freely available tool set based on the open-source R software, to perform power and sample size analysis for genetically heterogeneous admixed populations considering continuous or dichotomous outcomes with a case-only or case-control study design. AdmixPower can be used to compute the sample size required to achieve investigator-specified statistical power under several key parameters including ancestry odds ratio, genotype risk ratio, parental risk ratio, an underlying genetic risk model, trait type, and admixture model (hybrid-isolation or continuous gene flow model). We demonstrate that differences in the key parameters in the admixed population result in substantial differences in the sample size required to achieve adequate power in admixture mapping studies. Our tool provides a resource for researchers to develop a strategy to minimize cost and maximize the success of identifying disease-susceptibility loci in an admixed population. R code used in the sample size and power analysis is freely available from https://research.cchmc.org/mershalab/Tools.html.


Subject(s)
Chromosome Mapping/methods , Genetic Loci , Population/genetics , Software , Black or African American/genetics , Chromosome Mapping/standards , Gene Frequency , Genotype , Humans , Sample Size
13.
ACS Synth Biol ; 6(9): 1609-1613, 2017 09 15.
Article in English | MEDLINE | ID: mdl-28911233

ABSTRACT

The CRISPR/Cas9 system has accelerated research across many fields since its demonstration for genome editing. CRISPR also offers vast therapeutic potential, but an important hurdle of this technology is the off-target mutations it can induce. In this viewpoint, we will discuss recent strategies for improving CRISPR specificity, emphasizing how a complete mechanistic understanding of CRISPR/Cas9 can benefit such efforts. We also propose that agreeing upon a consensus protocol with the highest specificity could benefit researchers working on CRISPR-based therapies. In addition to improving CRISPR/Cas9 specificity, accurate detection of off-target events is also crucial, and we will discuss various unbiased off-target detection methods in terms of their advantages and disadvantages. We suggest that using a combination of cell-based and cell-free methods can prove more useful. In addition, we point out that improving predictive algorithms for off-target sites would require pooling of the available off-target analysis data and standardization of the protocols used for obtaining the data. Moreover, we highlight the risk of insertional mutagenesis for gene correction applications requiring the use of donor DNA. We conclude by discussing future prospects for the field, as well as steps that can be taken to overcome the aforementioned challenges.


Subject(s)
CRISPR-Cas Systems/genetics , Chromosome Mapping/standards , Gene Editing/standards , Gene Targeting/methods , Gene Targeting/standards , Genetic Engineering/standards , Chromosome Mapping/methods , Gene Editing/methods , Genetic Engineering/methods , Guidelines as Topic , Reproducibility of Results , Sensitivity and Specificity
14.
PLoS Comput Biol ; 13(8): e1005703, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28821014

ABSTRACT

Mapping gene expression as a quantitative trait using whole genome-sequencing and transcriptome analysis allows the discovery of the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr.


Subject(s)
Chromosome Mapping , High-Throughput Nucleotide Sequencing , Transcriptome/genetics , Algorithms , Chromosome Mapping/methods , Chromosome Mapping/standards , Databases, Genetic , Genetic Variation , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Models, Statistical
15.
Genetics ; 207(2): 447-463, 2017 10.
Article in English | MEDLINE | ID: mdl-28827289

ABSTRACT

Mutants remain a powerful means for dissecting gene function in model organisms such as Caenorhabditis elegans. Massively parallel sequencing has simplified the detection of variants after mutagenesis but determining precisely which change is responsible for phenotypic perturbation remains a key step. Genetic mapping paradigms in C. elegans rely on bulk segregant populations produced by crosses with the problematic Hawaiian wild isolate and an excess of redundant information from whole-genome sequencing (WGS). To increase the repertoire of available mutants and to simplify identification of the causal change, we performed WGS on 173 temperature-sensitive (TS) lethal mutants and devised a novel mapping method. The mapping method uses molecular inversion probes (MIP-MAP) in a targeted sequencing approach to genetic mapping, and replaces the Hawaiian strain with a Million Mutation Project strain with high genomic and phenotypic similarity to the laboratory wild-type strain N2. We validated MIP-MAP on a subset of the TS mutants using a competitive selection approach to produce TS candidate mapping intervals with a mean size < 3 Mb. MIP-MAP successfully uses a non-Hawaiian mapping strain and multiplexed libraries are sequenced at a fraction of the cost of WGS mapping approaches. Our mapping results suggest that the collection of TS mutants contains a diverse library of TS alleles for genes essential to development and reproduction. MIP-MAP is a robust method to genetically map mutations in both viable and essential genes and should be adaptable to other organisms. It may also simplify tracking of individual genotypes within population mixtures.


Subject(s)
Caenorhabditis elegans/genetics , Chromosome Mapping/methods , Chromosomes/genetics , Mutation , Thermotolerance/genetics , Whole Genome Sequencing/methods , Animals , Caenorhabditis elegans/physiology , Caenorhabditis elegans Proteins/genetics , Chromosome Mapping/standards , Whole Genome Sequencing/standards
16.
Nat Biotechnol ; 35(7): 676-683, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28604660

ABSTRACT

We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster with potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.


Subject(s)
Chromosome Mapping/standards , Databases, Genetic , Genome, Archaeal/genetics , Genome, Bacterial/genetics , High-Throughput Nucleotide Sequencing/standards , Knowledge Bases , Database Management Systems , Datasets as Topic , Encyclopedias as Topic , Reference Values
17.
Nat Methods ; 14(6): 587-589, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28481363

ABSTRACT

Model-based molecular phylogenetics plays an important role in comparisons of genomic data, and model selection is a key step in all such analyses. We present ModelFinder, a fast model-selection method that greatly improves the accuracy of phylogenetic estimates by incorporating a model of rate heterogeneity across sites not previously considered in this context and by allowing concurrent searches of model space and tree space.


Subject(s)
Algorithms , Chromosome Mapping/standards , High-Throughput Nucleotide Sequencing/methods , Models, Genetic , Phylogeny , Animals , Computer Simulation , Evolution, Molecular , Humans , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, DNA
18.
PLoS One ; 12(4): e0175768, 2017.
Article in English | MEDLINE | ID: mdl-28406955

ABSTRACT

Genome-wide association studies (GWASs) have identified a large number of noncoding associations, calling for systematic mapping to causal regulatory variants and their distal target genes. A widely used method, quantitative trait loci (QTL) mapping for chromatin or expression traits, suffers from sample-to-sample experimental variation and trans-acting or environmental effects. Instead, alleles at heterozygous loci can be compared within a sample, thereby controlling for those confounding factors. Here we introduce a method for chromatin structure-based allele-specific pairing of regulatory variants and target transcripts. With phased genotypes, much of allele-specific expression could be explained by paired allelic cis-regulation across a long range. This approach showed approximately two times greater sensitivity than QTL mapping. There are cases in which allele imbalance cannot be tested because heterozygotes are not available among reference samples. Therefore, we employed a machine learning method to predict missing positive cases based on various features shared by observed allele-specific pairs. We showed that only 10 reference samples are sufficient to achieve high prediction accuracy with a low sampling variation. In conclusion, our method enables highly sensitive fine mapping and target identification for trait-associated variants based on a small number of reference samples.


Subject(s)
Chromatin/genetics , Chromosome Mapping/standards , Polymorphism, Single Nucleotide , RNA, Messenger/genetics , Alleles , Chromosome Mapping/methods , Genome-Wide Association Study , Humans , Machine Learning , Quantitative Trait Loci
19.
Genet Med ; 19(7): 809-818, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28079900

ABSTRACT

PURPOSE: Genomic sequencing (GS) for newborns may enable detection of conditions for which early knowledge can improve health outcomes. One of the major challenges hindering its broader application is the time it takes to assess the clinical relevance of detected variants and the genes they impact so that disease risk is reported appropriately. METHODS: To facilitate rapid interpretation of GS results in newborns, we curated a catalog of genes with putative pediatric relevance for their validity based on the ClinGen clinical validity classification framework criteria, age of onset, penetrance, and mode of inheritance through systematic evaluation of published evidence. Based on these attributes, we classified genes to guide the return of results in the BabySeq Project, a randomized, controlled trial exploring the use of newborn GS (nGS), and used our curated list for the first 15 newborns sequenced in this project. RESULTS: Here, we present our curated list for 1,514 gene-disease associations. Overall, 954 genes met our criteria for return in nGS. This reference list eliminated manual assessment for 41% of rare variants identified in 15 newborns. CONCLUSION: Our list provides a resource that can assist in guiding the interpretive scope of clinical GS for newborns and potentially other populations. Genet Med advance online publication 12 January 2017.


Subject(s)
Genetic Diseases, Inborn/diagnosis , Genetic Testing/methods , High-Throughput Nucleotide Sequencing/standards , Base Sequence , Chromosome Mapping/standards , Databases, Genetic , Exome , Female , Genetic Predisposition to Disease/genetics , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Infant, Newborn , Male
20.
J Am Coll Radiol ; 13(12 Pt A): 1467-1472, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27595197

ABSTRACT

Incidental and secondary findings have become an important by-product of diagnostic testing, and their ramifications affect clinical care, research, and policy. Given parallels in the reporting and management of such findings on diagnostic imaging, radiologists may draw from ongoing discussions in medical genetics to rethink more patient-centered approaches to analogous clinical, ethical, and medicolegal dilemmas. Low-risk incidental findings in particular may be drivers of unnecessary testing, invasive procedures, and overtreatment, with associated financial, psychological, and clinical consequences. As radiologists act in patients' best interests by strengthening standardized guidelines on how each finding merits further diagnostic testing or treatment, perhaps the greatest challenge for producing such guidelines is for low-risk incidental findings, for which adverse consequences are unlikely but associated with substantial uncertainty because of the lack of strong evidence on which to base the recommendations. More uniform recommendations for managing low-risk radiologic incidental findings should therefore aim to provide reasonable options that apply across a spectrum of patient preferences. These will require evaluation through research and will ultimately influence the quality of care. Specific areas for exploration may include (1) better gauging of patient attitudes and preferences regarding low-risk incidental findings, (2) using patient preferences to inform more uniform recommendations for low-risk findings that apply across a spectrum of preferences and help guide shared decision making, and (3) when patients endorse a strong preference not to discover low-risk incidental findings, how it might be possible for professional standards to curtail their generation in specific circumstances.


Subject(s)
Chromosome Mapping/standards , Genetic Testing/standards , Genome/genetics , Incidental Findings , Patient-Centered Care/standards , Radiology/standards , Exome/genetics , Humans , Practice Guidelines as Topic , United States