Results 1 - 16 of 16
1.
Cell ; 185(16): 3041-3055.e25, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35917817

ABSTRACT

Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.


Subject(s)
DNA Copy Number Variations , Genome, Human , DNA Copy Number Variations/genetics , Gene Dosage , Haploinsufficiency/genetics , Humans
2.
Cell ; 181(6): 1246-1262.e22, 2020 06 11.
Article in English | MEDLINE | ID: mdl-32442405

ABSTRACT

There is considerable inter-individual variability in susceptibility to weight gain despite an equally obesogenic environment in large parts of the world. Whereas many studies have focused on identifying the genetic susceptibility to obesity, we performed a GWAS on metabolically healthy thin individuals (lowest 6th percentile of the population-wide BMI spectrum) in a uniquely phenotyped Estonian cohort. We discovered anaplastic lymphoma kinase (ALK) as a candidate thinness gene. In Drosophila, RNAi mediated knockdown of Alk led to decreased triglyceride levels. In mice, genetic deletion of Alk resulted in thin animals with marked resistance to diet- and leptin-mutation-induced obesity. Mechanistically, we found that ALK expression in hypothalamic neurons controls energy expenditure via sympathetic control of adipose tissue lipolysis. Our genetic and mechanistic experiments identify ALK as a thinness gene, which is involved in the resistance to weight gain.


Subject(s)
Anaplastic Lymphoma Kinase/genetics , Thinness/genetics , Adipose Tissue/metabolism , Adult , Animals , Cell Line , Cohort Studies , Drosophila/genetics , Estonia , Female , Humans , Leptin/genetics , Lipolysis/genetics , Male , Mice , Mice, Inbred C57BL , Mice, Knockout , Obesity/genetics , RNA Interference/physiology , Young Adult
3.
Am J Hum Genet ; 109(4): 647-668, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35240056

ABSTRACT

The impact of copy-number variations (CNVs) on complex human traits remains understudied. We called CNVs in 331,522 UK Biobank participants and performed genome-wide association studies (GWASs) between the copy number of CNV-proxy probes and 57 continuous traits, revealing 131 signals spanning 47 phenotypes. Our analysis recapitulated well-known associations (e.g., 1q21 and height), revealed the pleiotropy of recurrent CNVs (e.g., 26 and 16 traits for 16p11.2-BP4-BP5 and 22q11.21, respectively), and suggested gene functionalities (e.g., MARF1 in female reproduction). Forty-eight CNV signals (38%) overlapped with single-nucleotide polymorphism (SNP)-GWASs signals for the same trait. For instance, deletion of PDZK1, which encodes a urate transporter scaffold protein, decreased serum urate levels, while deletion of RHD, which encodes the Rhesus blood group D antigen, associated with hematological traits. Other signals overlapped Mendelian disorder regions, suggesting variable expressivity and broad impact of these loci, as illustrated by signals mapping to Rotor syndrome (SLCO1B1/3), renal cysts and diabetes syndrome (HNF1B), or Charcot-Marie-Tooth (PMP22) loci. Total CNV burden negatively impacted 35 traits, leading to increased adiposity, liver/kidney damage, and decreased intelligence and physical capacity. Thirty traits remained burden associated after correcting for CNV-GWAS signals, pointing to a polygenic CNV architecture. The burden negatively correlated with socio-economic indicators, parental lifespan, and age (survivorship proxy), suggesting a contribution to decreased longevity. Together, our results showcase how studying CNVs can expand biological insights, emphasizing the critical role of this mutational class in shaping human traits and arguing in favor of a continuum between Mendelian and complex diseases.


Subject(s)
DNA Copy Number Variations , Genome-Wide Association Study , DNA Copy Number Variations/genetics , Female , Humans , Liver-Specific Organic Anion Transporter 1 , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide/genetics
4.
Am J Hum Genet ; 107(4): 612-621, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32888428

ABSTRACT

Hypersensitivity reactions to drugs are often unpredictable and can be life threatening, underscoring a need for understanding their underlying mechanisms and risk factors. The extent to which germline genetic variation influences the risk of commonly reported drug allergies such as penicillin allergy remains largely unknown. We extracted data from the electronic health records of more than 600,000 participants from the UK, Estonian, and Vanderbilt University Medical Center's BioVU biobanks to study the role of genetic variation in the occurrence of self-reported penicillin hypersensitivity reactions. We used imputed SNP to HLA typing data from these cohorts to further fine map the human leukocyte antigen (HLA) association and replicated our results in 23andMe's research cohort involving a total of 1.12 million individuals. Genome-wide meta-analysis of penicillin allergy revealed two loci, including one located in the HLA region on chromosome 6. This signal was further fine-mapped to the HLA-B*55:01 allele (OR 1.41, 95% CI 1.33-1.49, p = 2.04 × 10⁻³¹) and confirmed by independent replication in 23andMe's research cohort (OR 1.30, 95% CI 1.25-1.34, p = 1.00 × 10⁻⁴⁷). The lead SNP was also associated with lower lymphocyte counts and in silico follow-up suggests a potential effect on T-lymphocytes at HLA-B*55:01. We also observed a significant hit in PTPN22 and the GWAS results correlated with the genetics of rheumatoid arthritis and psoriasis. We present robust evidence for the role of an allele of the major histocompatibility complex (MHC) I gene HLA-B in the occurrence of penicillin allergy.


Subject(s)
Arthritis, Rheumatoid/genetics , Drug Hypersensitivity/genetics , HLA-B Antigens/genetics , Polymorphism, Single Nucleotide , Protein Tyrosine Phosphatase, Non-Receptor Type 22/genetics , Psoriasis/genetics , Adult , Alleles , Arthritis, Rheumatoid/complications , Arthritis, Rheumatoid/immunology , Chromosomes, Human, Pair 6/chemistry , Drug Hypersensitivity/complications , Drug Hypersensitivity/etiology , Drug Hypersensitivity/immunology , Electronic Health Records , Europe , Female , Gene Expression , Genetic Loci , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , HLA-B Antigens/immunology , Histocompatibility Testing , Humans , Male , Penicillins/adverse effects , Protein Tyrosine Phosphatase, Non-Receptor Type 22/immunology , Psoriasis/complications , Psoriasis/immunology , Self Report , T-Lymphocytes/immunology , T-Lymphocytes/pathology , United States
5.
Bioinformatics ; 34(11): 1937-1938, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29360956

ABSTRACT

Summary: Designing PCR primers for amplifying regions of eukaryotic genomes is a complicated task because the genomes contain a large number of repeat sequences and other regions unsuitable for amplification by PCR. We have developed a novel k-mer based masking method that uses a statistical model to detect and mask failure-prone regions on the DNA template prior to primer design. We implemented the method as a standalone program, primer3_masker, and integrated it into the primer design program Primer3. Availability and implementation: The standalone version of primer3_masker is implemented in C. The source code is freely available at https://github.com/bioinfo-ut/primer3_masker/ (standalone version for Linux and macOS) and at https://github.com/primer3-org/primer3/ (integrated version). A Primer3 web application that allows masking sequences of 196 animal and plant genomes is available at http://primer3.ut.ee/. Contact: maido.remm@ut.ee. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Primers , Polymerase Chain Reaction/methods , Repetitive Sequences, Nucleic Acid , Software , Animals , Humans , Plants/genetics
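
The general idea of k-mer based masking described in this abstract can be illustrated with a minimal Python sketch. It is not the primer3_masker implementation: a plain frequency cutoff stands in for the statistical model the authors mention, and the function names, the `max_count` parameter, and the choice of lowercase soft-masking are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def mask_repeats(template: str, genome_counts: Counter, k: int, max_count: int) -> str:
    """Lowercase every template base covered by a k-mer seen more than max_count times.

    Hypothetical helper: a fixed count threshold replaces the statistical
    model used by the real tool.
    """
    seq = template.upper()
    masked = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if genome_counts[seq[i:i + k]] > max_count:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(base.lower() if flag else base for base, flag in zip(seq, masked))


# Toy "genome" in which ACGT occurs three times.
genome_counts = count_kmers("ACGTACGTACGTTTGCA", 4)
print(mask_repeats("GGACGTGG", genome_counts, k=4, max_count=2))
```

A primer design step would then avoid placing primers over the lowercase stretch. Real genomes need a disk-backed k-mer index rather than an in-memory `Counter`.
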
6.
Genet Med ; 21(6): 1345-1354, 2019 06.
Article in English | MEDLINE | ID: mdl-30327539

ABSTRACT

PURPOSE: Biomedical databases combining electronic medical records and phenotypic and genomic data constitute a powerful resource for the personalization of treatment. To leverage the wealth of information provided, algorithms are required that systematically translate the contained information into treatment recommendations based on existing genotype-phenotype associations. METHODS: We developed and tested algorithms for translation of preexisting genotype data of over 44,000 participants of the Estonian biobank into pharmacogenetic recommendations. We compared the results obtained by genome sequencing, exome sequencing, and genotyping using microarrays, and evaluated the impact of pharmacogenetic reporting based on drug prescription statistics in the Nordic countries and Estonia. RESULTS: Our most striking result was that the performance of genotyping arrays is similar to that of genome sequencing, whereas exome sequencing is not suitable for pharmacogenetic predictions. Interestingly, 99.8% of all assessed individuals had a genotype associated with increased risks to at least one medication, and thereby the implementation of pharmacogenetic recommendations based on genotyping affects at least 50 daily drug doses per 1000 inhabitants. CONCLUSION: We find that microarrays are a cost-effective solution for creating preemptive pharmacogenetic reports, and with slight modifications, existing databases can be applied for automated pharmacogenetic decision support for clinicians.


Subject(s)
Pharmacogenetics/methods , Pharmacogenomic Variants/genetics , Sequence Analysis, DNA/methods , Algorithms , Biological Specimen Banks , Databases, Factual , Electronic Health Records , Estonia , Genetic Testing/standards , Genotype , Humans , Oligonucleotide Array Sequence Analysis/methods , Pharmacogenomic Testing/methods , Phenotype , Precision Medicine/methods
7.
BMC Cancer ; 19(1): 557, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31182048

ABSTRACT

BACKGROUND: Published genetic risk scores for breast cancer (BC) so far have been based on a relatively small number of markers and are not necessarily using the full potential of large-scale Genome-Wide Association Studies. This study aimed to identify an efficient polygenic predictor for BC based on best available evidence and to assess its potential for personalized risk prediction and screening strategies. METHODS: Four different genetic risk scores (two already published and two newly developed) and their combinations (metaGRS) were compared in the subsets of two population-based biobank cohorts: the UK Biobank (UKBB, 3157 BC cases, 43,827 controls) and Estonian Biobank (EstBB, 317 prevalent and 308 incident BC cases in 32,557 women). In addition, correlations between different genetic risk scores and their associations with BC risk factors were studied in both cohorts. RESULTS: The metaGRS that combines two genetic risk scores (metaGRS2 - based on 75 and 898 Single Nucleotide Polymorphisms, respectively) had the strongest association with prevalent BC status in both cohorts. One standard deviation difference in the metaGRS2 corresponded to an Odds Ratio = 1.6 (95% CI 1.54 to 1.66, p = 9.7 × 10⁻¹³⁵) in the UK Biobank and accounting for family history marginally attenuated the effect (Odds Ratio = 1.58, 95% CI 1.53 to 1.64, p = 7.8 × 10⁻¹²⁹). In the EstBB cohort, the hazard ratio of incident BC for the women in the top 5% of the metaGRS2 compared to women in the lowest 50% was 4.2 (95% CI 2.8 to 6.2, p = 8.1 × 10⁻¹³). The different GRSs were only moderately correlated with each other and were associated with different known predictors of BC. The classification of genetic risk for the same individual varied considerably depending on the chosen GRS. CONCLUSIONS: We have shown that metaGRS2, which combines the effects of more than 900 SNPs, provided the best predictive ability for breast cancer in two different population-based cohorts. The strength of the effect of metaGRS2 indicates that the GRS could potentially be used to develop more efficient strategies for breast cancer screening for genotyped women.


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Genotype , Multifactorial Inheritance , Adult , Aged , Case-Control Studies , Cohort Studies , Early Detection of Cancer , Female , Genetic Predisposition to Disease , Genetic Testing , Genome-Wide Association Study , Humans , Middle Aged , Polymorphism, Single Nucleotide , Population Groups , Predictive Value of Tests , Prognosis , Risk
8.
Nat Genet ; 55(3): 423-436, 2023 03.
Article in English | MEDLINE | ID: mdl-36914876

ABSTRACT

Endometriosis is a common condition associated with debilitating pelvic pain and infertility. A genome-wide association study meta-analysis, including 60,674 cases and 701,926 controls of European and East Asian descent, identified 42 genome-wide significant loci comprising 49 distinct association signals. Effect sizes were largest for stage 3/4 disease, driven by ovarian endometriosis. Identified signals explained up to 5.01% of disease variance and regulated expression or methylation of genes in endometrium and blood, many of which were associated with pain perception/maintenance (SRP14/BMF, GDAP1, MLLT10, BSN and NGF). We observed significant genetic correlations between endometriosis and 11 pain conditions, including migraine, back and multisite chronic pain (MCP), as well as inflammatory conditions, including asthma and osteoarthritis. Multitrait genetic analyses identified substantial sharing of variants associated with endometriosis and MCP/migraine. Targeted investigations of genetically regulated mechanisms shared between endometriosis and other pain conditions are needed to aid the development of new treatments and facilitate early symptomatic intervention.


Subject(s)
Endometriosis , Female , Humans , Endometriosis/genetics , Endometriosis/metabolism , Genetic Predisposition to Disease , Genome-Wide Association Study , Pain , Comorbidity
9.
Nat Commun ; 13(1): 3584, 2022 06 23.
Article in English | MEDLINE | ID: mdl-35739095

ABSTRACT

Pelvic organ prolapse is a common gynecological condition with limited understanding of its genetic background. In this work, we perform a genome-wide association meta-analysis comprising 28,086 cases and 546,291 controls from European ancestry. We identify 19 novel genome-wide significant loci, highlighting connective tissue, urogenital and cardiometabolic as likely affected systems. Here, we prioritize many genes of potential interest and assess shared genetic and phenotypic links. Additionally, we present the first polygenic risk score, which shows similar predictive ability (Harrell C-statistic (C-stat) 0.583, standard deviation (sd) = 0.007) as five established clinical risk factors combined (number of children, body mass index, ever smoked, constipation and asthma) (C-stat = 0.588, sd = 0.007) and demonstrates a substantial incremental value in combination with these (C-stat = 0.630, sd = 0.007). These findings improve our understanding of genetic factors underlying pelvic organ prolapse and provide a solid start evaluating polygenic risk scores as a potential tool to enhance individual risk prediction.


Subject(s)
Genome-Wide Association Study , Pelvic Organ Prolapse , Body Mass Index , Child , Humans , Pelvic Organ Prolapse/genetics , Risk Factors
10.
HGG Adv ; 3(4): 100133, 2022 Oct 13.
Article in English | MEDLINE | ID: mdl-36035246

ABSTRACT

Copy-number variations (CNVs) are believed to play an important role in a wide range of complex traits, but discovering such associations remains challenging. While whole-genome sequencing (WGS) is the gold-standard approach for CNV detection, there are several orders of magnitude more samples with available genotyping microarray data. Such array data can be exploited for CNV detection using dedicated software (e.g., PennCNV); however, these calls suffer from elevated false-positive and -negative rates. In this study, we developed a CNV quality score that weights PennCNV calls (pCNVs) based on their likelihood of being true positives. First, we established a measure of pCNV reliability by leveraging evidence from multiple omics data (WGS, transcriptomics, and methylomics) obtained from the same samples. Next, we built a predictor of omics-confirmed pCNVs, termed omics-informed quality score (OQS), using only PennCNV software output parameters. Promisingly, OQS assigned to pCNVs detected in close family members was up to 35% higher than the OQS of pCNVs not carried by other relatives (p < 3.0 × 10⁻⁹⁰), outperforming other scores. Finally, in an association study of four anthropometric traits in 89,516 Estonian Biobank samples, the use of OQS led to a relative increase in the trait variance explained by CNVs of up to 56% compared with published quality filtering methods or scores. Overall, we put forward a flexible framework to improve any CNV detection method leveraging multi-omics evidence, applied it to improve PennCNV calls, and demonstrated its utility by improving the statistical power for downstream association analyses.

11.
Nat Commun ; 12(1): 3761, 2021 06 18.
Article in English | MEDLINE | ID: mdl-34145262

ABSTRACT

Pernicious anemia is a rare condition characterized by vitamin B12 deficiency anemia due to lack of intrinsic factor, often caused by autoimmune gastritis. Patients with pernicious anemia have a higher incidence of other autoimmune disorders, such as type 1 diabetes, vitiligo, and autoimmune thyroid issues. Therefore, the disease has a clear autoimmune basis, although the genetic susceptibility factors have thus far remained poorly studied. We conduct a genome-wide association study meta-analysis in 2166 cases and 659,516 European controls from population-based biobanks and identify genome-wide significant signals in or near the PTPN22 (rs6679677, p = 1.91 × 10⁻²⁴, OR = 1.63), PNPT1 (rs12616502, p = 3.14 × 10⁻⁸, OR = 1.70), HLA-DQB1 (rs28414666, p = 1.40 × 10⁻¹⁶, OR = 1.38), IL2RA (rs2476491, p = 1.90 × 10⁻⁸, OR = 1.22) and AIRE (rs74203920, p = 2.33 × 10⁻⁹, OR = 1.83) genes, thus providing robust associations between pernicious anemia and genetic risk factors.


Subject(s)
Anemia, Pernicious/genetics , Autoimmune Diseases/genetics , Genetic Predisposition to Disease/genetics , Adult , Aged , Exoribonucleases/genetics , Female , Gastritis/pathology , Genetic Variation/genetics , Genome-Wide Association Study , HLA-DQ beta-Chains/genetics , Humans , Interleukin-2 Receptor alpha Subunit/genetics , Male , Middle Aged , Protein Tyrosine Phosphatase, Non-Receptor Type 22/genetics , Risk Factors , Transcription Factors/genetics , AIRE Protein
12.
Nat Commun ; 11(1): 5980, 2020 11 25.
Article in English | MEDLINE | ID: mdl-33239672

ABSTRACT

Miscarriage is a common, complex trait affecting ~15% of clinically confirmed pregnancies. Here we present the results of large-scale genetic association analyses with 69,054 cases from five different ancestries for sporadic miscarriage, 750 cases of European ancestry for multiple (≥3) consecutive miscarriage, and up to 359,469 female controls. We identify one genome-wide significant association (rs146350366, minor allele frequency (MAF) 1.2%, P = 3.2 × 10⁻⁸, odds ratio (OR) = 1.4) for sporadic miscarriage in our European ancestry meta-analysis and three genome-wide significant associations for multiple consecutive miscarriage (rs7859844, MAF = 6.4%, P = 1.3 × 10⁻⁸, OR = 1.7; rs143445068, MAF = 0.8%, P = 5.2 × 10⁻⁹, OR = 3.4; rs183453668, MAF = 0.5%, P = 2.8 × 10⁻⁸, OR = 3.8). We further investigate the genetic architecture of miscarriage with biobank-scale Mendelian randomization, heritability, and genetic correlation analyses. Our results show that miscarriage etiopathogenesis is partly driven by genetic variation potentially related to placental biology, and illustrate the utility of large-scale biobank data for understanding this pregnancy complication.


Subject(s)
Abortion, Habitual/genetics , Abortion, Spontaneous/genetics , Genetic Predisposition to Disease , Placenta/physiopathology , Abortion, Habitual/epidemiology , Abortion, Habitual/physiopathology , Abortion, Spontaneous/epidemiology , Abortion, Spontaneous/physiopathology , Adult , Aged , Case-Control Studies , Datasets as Topic , Female , Gene Frequency , Genome-Wide Association Study , Humans , Inheritance Patterns , Medical History Taking , Middle Aged , Polymorphism, Single Nucleotide , Pregnancy , White People/genetics , Young Adult
13.
Nat Commun ; 10(1): 4626, 2019 10 11.
Article in English | MEDLINE | ID: mdl-31604923

ABSTRACT

Infertility in men and women is a complex genetic trait with shared biological bases between the sexes. Here, we perform a series of rare variant analyses across 73,185 women and men to identify genes that contribute to primary gonadal dysfunction. We report CSMD1, a complement regulatory protein on chromosome 8p23, as a strong candidate locus in both sexes. We show that CSMD1 is enriched at the germ-cell/somatic-cell interface in both male and female gonads. Csmd1-knockout males show increased rates of infertility with significantly increased complement C3 protein deposition in the testes, accompanied by severe histological degeneration. Knockout females show significant reduction in ovarian quality and breeding success, as well as mammary branching impairment. Double knockout of Csmd1 and C3 causes non-additive reduction in breeding success, suggesting that CSMD1 and the complement pathway play an important role in the normal postnatal development of the gonads in both sexes.


Subject(s)
Infertility/genetics , Membrane Proteins/genetics , Tumor Suppressor Proteins/genetics , Age Factors , Animals , Complement C3/metabolism , Female , Genetic Association Studies , Genotype , Humans , Male , Mammary Glands, Animal/growth & development , Mammary Glands, Animal/pathology , Menopause/genetics , Mice, Knockout , Mutation , Ovary/pathology , Sexual Maturation , Testis/metabolism
14.
Sci Rep ; 7(1): 2537, 2017 05 31.
Article in English | MEDLINE | ID: mdl-28566690

ABSTRACT

We have developed a computational method that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants in a 30x genome in less than 1 hour using ordinary low-cost server hardware. The overall concordance with the genotypes of two Illumina "Platinum" genomes is 99.96%, and the concordance with the genotypes of the Illumina HumanOmniExpress is 99.82%. Our method provides a k-mer database that can be used for the simultaneous genotyping of approximately 30 million single nucleotide variants (SNVs), including >23,000 SNVs from the Y chromosome. The source code of FastGT software is available at GitHub (https://github.com/bioinfo-ut/GenomeTester4/).


Subject(s)
Algorithms , Genome, Human , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , Bayes Theorem , Benchmarking , Genotype , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results , Sequence Analysis, DNA/statistics & numerical data
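
The core idea in this abstract, counting allele-specific k-mers in raw reads and calling genotypes from the counts, can be sketched in a few lines of Python. This is not FastGT's code or its statistical model: the fixed allele-fraction thresholds, the `min_depth` cutoff, and all function names are assumptions chosen for illustration, and reverse-complement k-mers are ignored to keep the sketch short.

```python
from collections import Counter


def count_marker_kmers(reads, markers, k):
    """Count how often each marker k-mer occurs across the reads (forward strand only)."""
    wanted = set(markers)
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in wanted:
                counts[kmer] += 1
    return counts


def call_genotype(ref_count, alt_count, min_depth=4, het_low=0.2, het_high=0.8):
    """Call a diploid genotype from reference and alternate k-mer counts.

    Hypothetical thresholds: a real caller would model sequencing depth and
    error rates instead of using fixed cutoffs.
    """
    total = ref_count + alt_count
    if total < min_depth:
        return "./."  # not enough evidence to call
    alt_fraction = alt_count / total
    if alt_fraction < het_low:
        return "0/0"
    if alt_fraction > het_high:
        return "1/1"
    return "0/1"


# One variant with a reference k-mer and an alternate k-mer (toy data).
reads = ["TTACGTA", "GACGTAC", "CACGTCA", "ACGTCTT"]
counts = count_marker_kmers(reads, ["CGTA", "CGTC"], k=4)
print(call_genotype(counts["CGTA"], counts["CGTC"]))
```

Because only a predefined list of k-mers is looked up, no read alignment is needed, which is the property the abstract credits for the method's speed.
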
15.
PeerJ ; 5: e3353, 2017.
Article in English | MEDLINE | ID: mdl-28533988

ABSTRACT

BACKGROUND: Fast, accurate and high-throughput identification of bacterial isolates is in great demand. The present work was conducted to investigate the possibility of identifying isolates from unassembled next-generation sequencing reads using custom-made guide trees. RESULTS: A tool named StrainSeeker was developed that constructs a list of specific k-mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1-2 min. It uses a novel algorithm, which analyses the observed and expected fractions of node-specific k-mers to test the presence of each node in the sample. This allows StrainSeeker to determine where the isolate branches off the guide tree and assign it to a clade whereas other tools assign each read to a reference genome. Using a dataset of 100 Escherichia coli isolates, we demonstrate that StrainSeeker can predict the clades of E. coli with 92% accuracy and correct tree branch assignment with 98% accuracy. Twenty-five thousand Illumina HiSeq reads are sufficient for identification of the strain. CONCLUSION: StrainSeeker is a software program that identifies bacterial isolates by assigning them to nodes or leaves of a custom-made guide tree. StrainSeeker's web interface and pre-computed guide trees are available at http://bioinfo.ut.ee/strainseeker. Source code is stored at GitHub: https://github.com/bioinfo-ut/StrainSeeker.
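
The guide-tree placement described in this abstract can be approximated with the following Python sketch. It is a simplified stand-in, not StrainSeeker's algorithm: the tool compares observed and expected fractions of node-specific k-mers, whereas this sketch uses a single fixed cutoff (`min_fraction`), and the `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A guide-tree node with the k-mers that are specific to it."""
    name: str
    kmers: set
    children: list = field(default_factory=list)


def observed_fraction(node, sample_kmers):
    """Fraction of the node-specific k-mers that are present in the sample."""
    if not node.kmers:
        return 0.0
    return len(node.kmers & sample_kmers) / len(node.kmers)


def place_sample(root, sample_kmers, min_fraction=0.5):
    """Walk down the tree while exactly one child is supported by the sample.

    Stops at the deepest node reached, so an isolate that matches several
    sibling leaves equally well is assigned to their parent clade.
    """
    current = root
    while True:
        supported = [
            child for child in current.children
            if observed_fraction(child, sample_kmers) >= min_fraction
        ]
        if len(supported) != 1:
            return current.name
        current = supported[0]


# Toy guide tree: root -> (clade AB -> (A, B), C)
leaf_a = Node("A", {"AAAA", "AAAC"})
leaf_b = Node("B", {"CCCC", "CCCG"})
clade_ab = Node("AB", {"GGGG"}, [leaf_a, leaf_b])
leaf_c = Node("C", {"TTTT"})
root = Node("root", set(), [clade_ab, leaf_c])

print(place_sample(root, {"GGGG", "AAAA", "AAAC"}))
print(place_sample(root, {"GGGG", "AAAA", "CCCC"}))
```

Returning an internal node when the evidence is ambiguous mirrors the abstract's point that the isolate is assigned to a clade rather than forced onto a single reference genome.
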

16.
Gigascience ; 4: 58, 2015.
Article in English | MEDLINE | ID: mdl-26640690

ABSTRACT

BACKGROUND: K-mer-based methods of genome analysis have attracted great interest because they do not require genome assembly and can be performed directly on sequencing reads. Many analysis tasks require one to compare k-mer lists from different sequences to find words that are either unique to a specific sequence or common to many sequences. However, no stand-alone k-mer analysis tool currently allows one to perform these algebraic set operations. FINDINGS: We have developed the GenomeTester4 toolkit, which contains a novel tool GListCompare for performing union, intersection and complement (difference) set operations on k-mer lists. We provide examples of how these general operations can be combined to solve a variety of biological analysis tasks. CONCLUSIONS: GenomeTester4 can be used to simplify k-mer list manipulation for many biological analysis tasks.


Subject(s)
Genome , Genomics/methods , Sequence Analysis, DNA/methods , Software , Animals , Humans
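
The union, intersection, and difference operations on k-mer lists named in this abstract map directly onto set algebra. The Python sketch below illustrates the operations only; it does not reproduce GListCompare's command-line interface or file format, and the helper name `kmer_set` is an assumption.

```python
def kmer_set(sequence: str, k: int) -> set:
    """Return the set of distinct overlapping k-mers in a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


a = kmer_set("ACGTAC", 3)  # ACG, CGT, GTA, TAC
b = kmer_set("CGTACC", 3)  # CGT, GTA, TAC, ACC

union = a | b          # k-mers present in either sequence
intersection = a & b   # k-mers shared by both sequences
difference = a - b     # k-mers unique to the first sequence

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

The difference operation is the one used to find words unique to a specific sequence, for example candidate strain-specific markers, while the intersection yields words common to many sequences.
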