1.
Gut ; 67(7): 1306-1316, 2018 07.
Article in English | MEDLINE | ID: mdl-28754778

ABSTRACT

BACKGROUND: Most patients with path_MMR gene variants (Lynch syndrome (LS)) now survive both their first and subsequent cancers, resulting in a growing number of older patients with LS for whom limited information exists with respect to cancer risk and survival. OBJECTIVE AND DESIGN: This observational, international, multicentre study aimed to determine prospectively observed incidences of cancers and survival in path_MMR carriers up to 75 years of age. RESULTS: 3119 patients were followed for a total of 24 475 years. Cumulative incidences at 75 years (risks) for colorectal cancer were 46%, 43% and 15% in path_MLH1, path_MSH2 and path_MSH6 carriers; for endometrial cancer 43%, 57% and 46%; for ovarian cancer 10%, 17% and 13%; for upper gastrointestinal (gastric, duodenal, bile duct or pancreatic) cancers 21%, 10% and 7%; for urinary tract cancers 8%, 25% and 11%; for prostate cancer 17%, 32% and 18%; and for brain tumours 1%, 5% and 1%, respectively. Ovarian cancer occurred mainly premenopausally. By contrast, upper gastrointestinal, urinary tract and prostate cancers occurred predominantly at older ages. Overall 5-year survival for prostate cancer was 100%, urinary bladder 93%, ureter 85%, duodenum 67%, stomach 61%, bile duct 29%, brain 22% and pancreas 0%. Path_PMS2 carriers had lower risk for cancer. CONCLUSION: Carriers of different path_MMR variants exhibit distinct patterns of cancer risk and survival as they age. Risk estimates for counselling and planning of surveillance and treatment should be tailored to each patient's age, gender and path_MMR variant. We have updated our open-access website www.lscarisk.org to facilitate this.


Subject(s)
Colonic Neoplasms/epidemiology , Colorectal Neoplasms, Hereditary Nonpolyposis/complications , Colorectal Neoplasms, Hereditary Nonpolyposis/mortality , Pancreatic Neoplasms/epidemiology , Urogenital Neoplasms/epidemiology , Age Factors , Aged , Colorectal Neoplasms, Hereditary Nonpolyposis/pathology , Databases, Factual , Female , Humans , Incidence , Male , Prospective Studies
2.
Gut ; 66(3): 464-472, 2017 03.
Article in English | MEDLINE | ID: mdl-26657901

ABSTRACT

OBJECTIVE: Estimates of cancer risk and the effects of surveillance in Lynch syndrome have been subject to bias, partly through reliance on retrospective studies. We sought to establish more robust estimates in patients undergoing prospective cancer surveillance. DESIGN: We undertook a multicentre study of patients carrying Lynch syndrome-associated mutations affecting MLH1, MSH2, MSH6 or PMS2. Standardised information on surveillance, cancers and outcomes were collated in an Oracle relational database and analysed by age, sex and mutated gene. RESULTS: 1942 mutation carriers without previous cancer had follow-up including colonoscopic surveillance for 13 782 observation years. 314 patients developed cancer, mostly colorectal (n=151), endometrial (n=72) and ovarian (n=19). Cancers were detected from 25 years onwards in MLH1 and MSH2 mutation carriers, and from about 40 years in MSH6 and PMS2 carriers. Among first cancer detected in each patient the colorectal cancer cumulative incidences at 70 years by gene were 46%, 35%, 20% and 10% for MLH1, MSH2, MSH6 and PMS2 mutation carriers, respectively. The equivalent cumulative incidences for endometrial cancer were 34%, 51%, 49% and 24%; and for ovarian cancer 11%, 15%, 0% and 0%. Ten-year crude survival was 87% after any cancer, 91% if the first cancer was colorectal, 98% if endometrial and 89% if ovarian. CONCLUSIONS: The four Lynch syndrome-associated genes had different penetrance and expression. Colorectal cancer occurred frequently despite colonoscopic surveillance but resulted in few deaths. Using our data, a website has been established at http://LScarisk.org enabling calculation of cumulative cancer risks as an aid to genetic counselling in Lynch syndrome.


Subject(s)
Colorectal Neoplasms, Hereditary Nonpolyposis/epidemiology , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Endometrial Neoplasms/epidemiology , Ovarian Neoplasms/epidemiology , Population Surveillance , Adolescent , Adult , Age Factors , Aged , Aged, 80 and over , Child , Colonoscopy , Colorectal Neoplasms, Hereditary Nonpolyposis/diagnostic imaging , Colorectal Neoplasms, Hereditary Nonpolyposis/mortality , DNA-Binding Proteins/genetics , Databases, Factual , Endometrial Neoplasms/mortality , Female , Gene Expression , Heterozygote , Humans , Incidence , Male , Middle Aged , Mismatch Repair Endonuclease PMS2/genetics , MutL Protein Homolog 1/genetics , MutS Homolog 2 Protein/genetics , Ovarian Neoplasms/mortality , Prospective Studies , Survival Rate , Young Adult
3.
Gut ; 66(9): 1657-1664, 2017 09.
Article in English | MEDLINE | ID: mdl-27261338

ABSTRACT

OBJECTIVE: Today most patients with Lynch syndrome (LS) survive their first cancer. There is limited information on the incidences and outcome of subsequent cancers. The present study addresses three questions: (i) what is the cumulative incidence of a subsequent cancer; (ii) in which organs do subsequent cancers occur; and (iii) what is the survival following these cancers? DESIGN: Information was collated on prospectively organised surveillance and prospectively observed outcomes in patients with LS who had cancer prior to inclusion and analysed by age, gender and genetic variants. RESULTS: 1273 patients with LS from 10 countries were followed up for 7753 observation years. 318 patients (25.7%) developed 341 first subsequent cancers, including colorectal (n=147, 43%), upper GI, pancreas or bile duct (n=37, 11%) and urinary tract (n=32, 10%). The cumulative incidences for any subsequent cancer from age 40 to age 70 years were 73% for pathogenic MLH1 (path_MLH1), 76% for path_MSH2 carriers and 52% for path_MSH6 carriers, and for colorectal cancer (CRC) the cumulative incidences were 46%, 48% and 23%, respectively. Crude survival after any subsequent cancer was 82% (95% CI 76% to 87%) and 10-year crude survival after CRC was 91% (95% CI 83% to 95%). CONCLUSIONS: Relative incidence of subsequent cancer compared with incidence of first cancer was slightly but insignificantly higher than cancer incidence in patients with LS without previous cancer (range 0.94-1.49). The favourable survival after subsequent cancers validated continued follow-up to prevent death from cancer. The interactive website http://lscarisk.org was expanded to calculate the risks by gender, genetic variant and age for subsequent cancer for any patient with LS with previous cancer.


Subject(s)
Colonic Neoplasms , Colorectal Neoplasms, Hereditary Nonpolyposis , DNA-Binding Proteins/genetics , MutL Protein Homolog 1/genetics , MutS Homolog 2 Protein/genetics , Adult , Aged , Colonic Neoplasms/genetics , Colonic Neoplasms/pathology , Colorectal Neoplasms, Hereditary Nonpolyposis/epidemiology , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Colorectal Neoplasms, Hereditary Nonpolyposis/pathology , DNA Mismatch Repair/genetics , Disease Progression , Europe/epidemiology , Female , Genetic Variation , Germ-Line Mutation , Humans , Incidence , Male , Middle Aged , Neoplasm Staging , Risk Assessment/methods , Risk Assessment/statistics & numerical data , Survival Analysis
4.
Breast Cancer Res ; 19(1): 44, 2017 03 29.
Article in English | MEDLINE | ID: mdl-28356166

ABSTRACT

BACKGROUND: Breast cancer is a heterogeneous disease at the clinical and molecular level. In this study we integrate classifications extracted from five different molecular levels in order to identify integrated subtypes. METHODS: Tumor tissue from 425 patients with primary breast cancer from the Oslo2 study was cut and blended, and divided into fractions for DNA, RNA and protein isolation and metabolomics, allowing the acquisition of representative and comparable molecular data. Patients were stratified into groups based on their tumor characteristics from five different molecular levels, using various clustering methods. Finally, all previously identified and newly determined subgroups were combined in a multilevel classification using a "cluster-of-clusters" approach with consensus clustering. RESULTS: Based on DNA copy number data, tumors were categorized into three groups according to the complex arm aberration index. mRNA expression profiles divided tumors into five molecular subgroups according to PAM50 subtyping, and clustering based on microRNA expression revealed four subgroups. Reverse-phase protein array data divided tumors into five subgroups. Hierarchical clustering of tumor metabolic profiles revealed three clusters. Combining DNA copy number and mRNA expression classified tumors into seven clusters based on pathway activity levels, and tumors were classified into ten subtypes using integrative clustering. The final consensus clustering that incorporated all aforementioned subtypes revealed six major groups. Five corresponded well with the mRNA subtypes, while a sixth group resulted from a split of the luminal A subtype; these tumors belonged to distinct microRNA clusters. Gain-of-function studies using MCF-7 cells showed that microRNAs differentially expressed between the luminal A clusters were important for cancer cell survival. These microRNAs were used to validate the split in luminal A tumors in four independent breast cancer cohorts. 
In two cohorts the microRNAs divided tumors into subgroups with significantly different outcomes, and in another a trend was observed. CONCLUSIONS: The six integrated subtypes identified confirm the heterogeneity of breast cancer and show that finer subdivisions of subtypes are evident. Increasing knowledge of the heterogeneity of the luminal A subtype may add pivotal information to guide therapeutic choices, evidently bringing us closer to improved treatment for this largest subgroup of breast cancer.


Subject(s)
Biomarkers, Tumor , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cluster Analysis , Breast Neoplasms/epidemiology , Breast Neoplasms/mortality , DNA Copy Number Variations , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Metabolic Networks and Pathways , Metabolomics/methods , MicroRNAs/genetics , Norway/epidemiology , Prognosis , RNA, Messenger/genetics
5.
Biostatistics ; 17(1): 29-39, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26272994

ABSTRACT

Removal of, or adjustment for, batch effects or center differences is generally required when such effects are present in data. In particular, when preparing microarray gene expression data from multiple cohorts, array platforms, or batches for later analyses, batch effects can have confounding effects, inducing spurious differences between study groups. Many methods and tools exist for removing batch effects from data. However, when study groups are not evenly distributed across batches, actual group differences may induce apparent batch differences, in which case batch adjustments may bias, usually deflate, group differences. Some tools therefore have the option of preserving the difference between study groups, e.g. using a two-way ANOVA model to simultaneously estimate both group and batch effects. Unfortunately, this approach may systematically induce incorrect group differences in downstream analyses when groups are distributed between the batches in an unbalanced manner. The scientific community seems to be largely unaware of how this approach may lead to false discoveries.
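The deflation described above can be reproduced with a toy calculation. The sketch below is not the authors' analysis: it uses made-up, noise-free values with no batch effect at all, and the function names are hypothetical. It shows only the simpler case, where plain per-batch mean-centering shrinks a real group difference when groups are unevenly spread across batches; it does not reproduce the two-way ANOVA problem the abstract goes on to describe.

```python
from statistics import mean

def center_by_batch(values, batches):
    """Subtract each batch's mean from its samples (naive batch removal)."""
    batch_means = {
        b: mean(v for v, bb in zip(values, batches) if bb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] for v, b in zip(values, batches)]

def group_difference(values, groups, low="A", high="B"):
    """Mean of group `high` minus mean of group `low`."""
    high_mean = mean(v for v, g in zip(values, groups) if g == high)
    low_mean = mean(v for v, g in zip(values, groups) if g == low)
    return high_mean - low_mean

# Hypothetical unbalanced design: batch 1 is mostly group A, batch 2 mostly group B.
groups = ["A", "A", "A", "B", "A", "B", "B", "B"]
batches = [1, 1, 1, 1, 2, 2, 2, 2]
# True group difference is 2.0, and there is no batch effect in these values.
values = [0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 2.0]

before = group_difference(values, groups)
after = group_difference(center_by_batch(values, batches), groups)
print(before, after)
```

Because the batch means absorb part of the group signal, the difference after centering is smaller than the true difference even though nothing needed correcting.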


Subject(s)
Data Interpretation, Statistical , Microarray Analysis/standards , Humans , Microarray Analysis/methods , Reproducibility of Results
6.
Article in English | MEDLINE | ID: mdl-29046738

ABSTRACT

BACKGROUND: We have previously reported a high incidence of colorectal cancer (CRC) in carriers of pathogenic MLH1 variants (path_MLH1) despite follow-up with colonoscopy including polypectomy. METHODS: The cohort included Finnish carriers enrolled in 3-yearly colonoscopy (n = 505; 4625 observation years) and carriers from other countries enrolled in colonoscopy 2-yearly or more frequently (n = 439; 3299 observation years). We examined whether the longer interval between colonoscopies in Finland could explain the high incidence of CRC and whether disease expression correlated with differences in population CRC incidence. RESULTS: Cumulative CRC incidences in carriers of path_MLH1 at 70-years of age were 41% for males and 36% for females in the Finnish series and 58% and 55% in the non-Finnish series, respectively (p > 0.05). Mean time from last colonoscopy to CRC was 32.7 months in the Finnish compared to 31.0 months in the non-Finnish (p > 0.05) and was therefore unaffected by the recommended colonoscopy interval. Differences in population incidence of CRC could not explain the lower point estimates for CRC in the Finnish series. Ten-year overall survival after CRC was similar for the Finnish and non-Finnish series (88% and 91%, respectively; p > 0.05). CONCLUSIONS: The hypothesis that the high incidence of CRC in path_MLH1 carriers was caused by a higher incidence in the Finnish series was not valid. We discuss whether the results were influenced by methodological shortcomings in our study or whether the assumption that a shorter interval between colonoscopies leads to a lower CRC incidence may be wrong. This second possibility is intriguing, because it suggests the dogma that CRC in path_MLH1 carriers develops from polyps that can be detected at colonoscopy and removed to prevent CRC may be erroneous. 
In view of the excellent 10-year overall survival in the Finnish and non-Finnish series we remain strong advocates of current surveillance practices for those with LS pending studies that will inform new recommendations on the best surveillance interval.

7.
Nucleic Acids Res ; 42(18): e143, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25114054

ABSTRACT

Identification of three-dimensional (3D) interactions between regulatory elements across the genome is crucial to unravel the complex regulatory machinery that orchestrates proliferation and differentiation of cells. ChIA-PET is a novel method to identify such interactions, where physical contacts between regions bound by a specific protein are quantified using next-generation sequencing. However, determining the significance of the observed interaction frequencies in such datasets is challenging, and few methods have been proposed. Despite the fact that regions that are close in linear genomic distance have a much higher tendency to interact by chance, no methods to date are capable of taking such dependency into account. Here, we propose a statistical model taking into account the genomic distance relationship, as well as the general propensity of anchors to be involved in contacts overall. Using both real and simulated data, we show that the previously proposed statistical test, based on Fisher's exact test, leads to invalid results when data are dependent on genomic distance. We also evaluate our method on previously validated cell-line specific and constitutive 3D interactions, and show that relevant interactions are significant, while avoiding over-estimating the significance of short nearby interactions.
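For orientation, a one-sided Fisher's exact test on a 2x2 table can be sketched as follows. This is a generic illustration of the kind of baseline test the abstract criticises, not the published procedure: the table layout for an anchor pair (counts of paired-end tags linking both anchors, one anchor only, or neither) is an assumption, and the function name is hypothetical. Nothing in this test sees the genomic distance between the anchors, which is the dependency the abstract says must be modelled.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the table [[a, b], [c, d]].

    Assumed layout for an anchor pair (A, B):
      a: tags linking A and B      b: tags touching A but not B
      c: tags touching B but not A d: all remaining tags
    """
    row1 = a + b          # tags touching A
    col1 = a + c          # tags touching B
    n = a + b + c + d     # all tags
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, col1)

# Small hypothetical table; the counts are illustrative only.
p_value = fisher_one_sided(3, 1, 1, 3)
print(p_value)
```

Two anchor pairs with identical counts get the same p-value here whether they lie 5 kb or 50 Mb apart.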


Subject(s)
Chromatin/chemistry , Genomics/methods , Models, Statistical , Core Binding Factor Alpha 2 Subunit/genetics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA
8.
Breast Cancer Res ; 17: 29, 2015 Feb 26.
Article in English | MEDLINE | ID: mdl-25849221

ABSTRACT

INTRODUCTION: Breast cancer is commonly classified into intrinsic molecular subtypes. Standard gene centering is routinely done prior to molecular subtyping, but it can produce inaccurate classifications when the distribution of clinicopathological characteristics in the study cohort differs from that of the training cohort used to derive the classifier. METHODS: We propose a subgroup-specific gene-centering method to perform molecular subtyping on a study cohort that has a skewed distribution of clinicopathological characteristics relative to the training cohort. On such a study cohort, we center each gene on a specified percentile, where the percentile is determined from a subgroup of the training cohort with clinicopathological characteristics similar to the study cohort. We demonstrate our method using the PAM50 classifier and its associated University of North Carolina (UNC) training cohort. We considered study cohorts with skewed clinicopathological characteristics, including subgroups composed of a single prototypic subtype of the UNC-PAM50 training cohort (n = 139), an external estrogen receptor (ER)-positive cohort (n = 48) and an external triple-negative cohort (n = 77). RESULTS: Subgroup-specific gene centering improved prediction performance with the accuracies between 77% and 100%, compared to accuracies between 17% and 33% from standard gene centering, when applied to the prototypic tumor subsets of the PAM50 training cohort. It reduced classification error rates on the ER-positive (11% versus 28%; P = 0.0389), the ER-negative (5% versus 41%; P < 0.0001) and the triple-negative (11% versus 56%; P = 0.1336) subgroups of the PAM50 training cohort. In addition, it produced higher accuracy for subtyping study cohorts composed of varying proportions of ER-positive versus ER-negative cases. Finally, it increased the percentage of assigned luminal subtypes on the external ER-positive cohort and basal-like subtype on the external triple-negative cohort. 
CONCLUSIONS: Gene centering is often necessary to accurately apply a molecular subtype classifier. Compared with standard gene centering, our proposed subgroup-specific gene centering produced more accurate molecular subtype assignments in a study cohort with skewed clinicopathological characteristics relative to the training cohort.
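One possible reading of subgroup-specific centering, sketched for a single gene. The percentile rule used here (the rank of the overall training-cohort median within the matching training subgroup) is an assumption made for illustration, as are the function names and the toy expression values; the abstract does not spell out the exact rule.

```python
from bisect import bisect_left
from statistics import median

def fraction_below(values, point):
    """Fraction of `values` strictly below `point`."""
    ordered = sorted(values)
    return bisect_left(ordered, point) / len(ordered)

def value_at_fraction(values, frac):
    """Quantile of `values` at `frac` (0..1), with linear interpolation."""
    ordered = sorted(values)
    pos = frac * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def subgroup_center(study, train_all, train_subgroup):
    """Center one gene in a skewed study cohort.

    1. Take the centering point of the full training cohort (its median).
    2. Find where that point falls within the matching training subgroup.
    3. Subtract the study-cohort value at that same percentile.
    """
    frac = fraction_below(train_subgroup, median(train_all))
    shift = value_at_fraction(study, frac)
    return [x - shift for x in study]

# Hypothetical gene that is high in the subgroup (e.g. an ER-positive-only cohort).
train_all = [1, 2, 3, 4, 5, 6, 7, 8]
train_subgroup = [5, 6, 7, 8]
study = [5.5, 6.5, 7.5]
centered = subgroup_center(study, train_all, train_subgroup)
print(centered)
```

Standard median centering of the same study cohort would push a third of these uniformly high samples below zero, which is the kind of distortion the abstract attributes to a skewed cohort.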


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Gene Expression Profiling , Molecular Typing , Cohort Studies , Datasets as Topic , Female , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Humans , Molecular Typing/methods , Prognosis , Receptors, Estrogen/genetics
9.
Mol Cell Proteomics ; 12(6): 1723-34, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23438732

ABSTRACT

Protein complexes enact most biochemical functions in the cell. Dynamic interactions between protein complexes are frequent in many cellular processes. As they are often of a transient nature, they may be difficult to detect using current genome-wide screens. Here, we describe a method to computationally predict physical interactions between protein complexes, applied to both humans and yeast. We integrated manually curated protein complexes and physical protein interaction networks, and we designed a statistical method to identify pairs of protein complexes where the number of protein interactions between a complex pair is due to an actual physical interaction between the complexes. An evaluation against manually curated physical complex-complex interactions in yeast revealed that 50% of these interactions could be predicted in this manner. A community network analysis of the highest scoring pairs revealed a biologically sensible organization of physical complex-complex interactions in the cell. Such analyses of proteomes may serve as a guide to the discovery of novel functional cellular relationships.


Subject(s)
Algorithms , Protein Interaction Mapping/statistics & numerical data , Protein Interaction Maps , Proteome/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Databases, Protein , Humans , Likelihood Functions , Protein Binding , Protein Multimerization , Saccharomyces cerevisiae/chemistry
10.
Breast Cancer Res ; 16(1): R5, 2014 Jan 21.
Article in English | MEDLINE | ID: mdl-24447408

ABSTRACT

INTRODUCTION: Dysregulated choline metabolism is a well-known feature of breast cancer, but the underlying mechanisms are not fully understood. In this study, the metabolomic and transcriptomic characteristics of a large panel of human breast cancer xenograft models were mapped, with focus on choline metabolism. METHODS: Tumor specimens from 34 patient-derived xenograft models were collected and divided in two. One part was examined using high-resolution magic angle spinning (HR-MAS) MR spectroscopy while another part was analyzed using gene expression microarrays. Expression data of genes encoding proteins in the choline metabolism pathway were analyzed and correlated to the levels of choline (Cho), phosphocholine (PCho) and glycerophosphocholine (GPC) using Pearson's correlation analysis. For comparison purposes, metabolic and gene expression data were collected from human breast tumors belonging to corresponding molecular subgroups. RESULTS: Most of the xenograft models were classified as basal-like (N = 19) or luminal B (N = 7). These two subgroups showed significantly different choline metabolic and gene expression profiles. The luminal B xenografts were characterized by a high PCho/GPC ratio while the basal-like xenografts were characterized by highly variable PCho/GPC ratio. Also, Cho, PCho and GPC levels were correlated to expression of several genes encoding proteins in the choline metabolism pathway, including choline kinase alpha (CHKA) and glycerophosphodiester phosphodiesterase domain containing 5 (GDPD5). These characteristics were similar to those found in human tumor samples. CONCLUSION: The higher PCho/GPC ratio found in luminal B compared with most basal-like breast cancer xenograft models and human tissue samples do not correspond to results observed from in vitro studies. It is likely that microenvironmental factors play a role in the in vivo regulation of choline metabolism. 
Cho, PCho and GPC were correlated to different choline pathway-encoding genes in luminal B compared with basal-like xenografts, suggesting that regulation of choline metabolism may vary between different breast cancer subgroups. The concordance between the metabolic and gene expression profiles from xenograft models with breast cancer tissue samples from patients indicates that these xenografts are representative models of human breast cancer and represent relevant models to study tumor metabolism in vivo.
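The Pearson correlation used to relate metabolite levels to gene expression can be written out in a few lines. This is a minimal sketch with hypothetical values, not data from the study, and the function name is illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-tumor values: a metabolite level and one gene's expression.
metabolite_level = [1.0, 2.0, 3.0, 4.0]
gene_expression = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(metabolite_level, gene_expression))
```

In a study like this one the calculation would be repeated per gene and per metabolite, so the resulting p-values would normally need multiple-testing correction.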


Subject(s)
Breast Neoplasms/metabolism , Choline/metabolism , Glycerylphosphorylcholine/metabolism , Phosphorylcholine/metabolism , Animals , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Choline Kinase/biosynthesis , Choline Kinase/genetics , Female , Gene Expression , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Metabolomics , Mice , Neoplasm Transplantation , Phosphoric Diester Hydrolases/biosynthesis , Phosphoric Diester Hydrolases/genetics , Phosphoric Diester Hydrolases/metabolism , Tissue Array Analysis , Transcriptome , Transplantation, Heterologous
11.
BMC Cancer ; 14: 211, 2014 Mar 19.
Article in English | MEDLINE | ID: mdl-24645668

ABSTRACT

BACKGROUND: The aim was to assess and compare the prognostic power of nine breast cancer gene signatures (Intrinsic, PAM50, 70-gene, 76-gene, Genomic-Grade-Index, 21-gene-Recurrence-Score, EndoPredict, Wound-Response and Hypoxia) in relation to ER status and follow-up time. METHODS: A gene expression dataset from 947 breast tumors was used to evaluate the signatures for prediction of Distant Metastasis Free Survival (DMFS). A total of 912 patients had available DMFS status. The recently published METABRIC cohort was used as an additional validation set. RESULTS: Survival predictions were fairly concordant across most signatures. Prognostic power declined with follow-up time. During the first 5 years of follow-up, all signatures except for Hypoxia were predictive for DMFS in ER-positive disease, and 76-gene, Hypoxia and Wound-Response were prognostic in ER-negative disease. After 5 years, the signatures had little prognostic power. Gene signatures provide significant prognostic information beyond tumor size, node status and histological grade. CONCLUSIONS: Generally, these signatures performed better for ER-positive disease, indicating that risk within each ER stratum is driven by distinct underlying biology. Most of the signatures were strong risk predictors for DMFS during the first 5 years of follow-up. Combining gene signatures with histological grade or tumor size could improve the prognostic power, perhaps also of long-term survival.


Subject(s)
Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Databases, Genetic , Gene Expression Profiling/methods , Receptors, Estrogen/genetics , Breast Neoplasms/mortality , Cohort Studies , Female , Follow-Up Studies , Humans , Prognosis , Receptors, Estrogen/biosynthesis , Reproducibility of Results , Survival Rate/trends , Time Factors
12.
BMC Bioinformatics ; 14: 313, 2013 Oct 23.
Article in English | MEDLINE | ID: mdl-24152242

ABSTRACT

BACKGROUND: Processing of reads from high throughput sequencing is often done in terms of edges in the de Bruijn graph representing all k-mers from the reads. The memory requirements for storing all k-mers in a lookup table can be demanding, even after removal of read errors, but can be alleviated by using a memory efficient data structure. RESULTS: The FM-index, which is based on the Burrows-Wheeler transform, provides an efficient data structure providing a searchable index of all substrings from a set of strings, and is used to compactly represent full genomes for use in mapping reads to a genome: the memory required to store this is in the same order of magnitude as the strings themselves. However, reads from high throughput sequences mostly have high coverage and so contain the same substrings multiple times from different reads. I here present a modification of the FM-index, which I call the kFM-index, for indexing the set of k-mers from the reads. For DNA sequences, this requires 5 bit of information for each vertex of the corresponding de Bruijn subgraph, i.e. for each different k-1-mer, plus some additional overhead, typically 0.5 to 1 bit per vertex, for storing the equivalent of the FM-index for walking the underlying de Bruijn graph and reproducing the actual k-mers efficiently. CONCLUSIONS: The kFM-index could replace more memory demanding data structures for storing the de Bruijn k-mer graph representation of sequence reads. A Java implementation with additional technical documentation is provided which demonstrates the applicability of the data structure (http://folk.uio.no/einarro/Projects/KFM-index/).
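As background for the FM-index mentioned above, here is a naive sketch of the Burrows-Wheeler transform and of FM-index style backward search for counting pattern occurrences. It illustrates the standard FM-index idea only, not the kFM-index of the paper or its Java implementation; the function names are hypothetical, and the rotation sort and repeated `str.count` calls are chosen for clarity rather than memory or speed.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (quadratic memory)."""
    s = text + sentinel  # the sentinel must sort before every other character
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def fm_count(bwt_str, pattern):
    """Count occurrences of `pattern` using backward search on the BWT."""
    # first_row[c]: number of characters in the text that sort before c
    first_row, total = {}, 0
    for ch in sorted(set(bwt_str)):
        first_row[ch] = total
        total += bwt_str.count(ch)

    def occ(ch, i):
        """Occurrences of ch in bwt_str[:i] (a real index precomputes this)."""
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ(ch, lo)
        hi = first_row[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo

transformed = bwt("banana")
print(transformed, fm_count(transformed, "ana"))
```

A production index replaces `occ` with sampled rank tables; the compactness claims in the abstract concern that kind of compressed representation, which this sketch does not attempt.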


Subject(s)
Algorithms , Genomics/methods , Sequence Analysis, DNA/methods
13.
Biostatistics ; 18(3): 586-587, 2017 07 01.
Article in English | MEDLINE | ID: mdl-28334081
14.
Hum Mutat ; 31(12): 1316-25, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20886615

ABSTRACT

In selectively neutral regions of the human genome, nucleotide substitutions do not occur at random with respect to the local DNA sequence neighborhood. However, apart from the hypermutability of methylated CpG dinucleotides, which can explain the overrepresentation of nucleotide transitions in this context, the sequence-specific factors underlying point mutation bias remain largely to be determined, both in nature and in quantitative impact. One hypothesis suggests that the physical characteristics of a DNA context could have a modulating effect on its mutability, adjusting the impact of damage or the efficiency of repair. Here, we report a genome-wide computational test of this hypothesis, in which we utilize a constrained set of human non-CpG SNPs as the source of selectively neutral germline mutations. Interestingly, we observe that the quantitative context-dependencies of some substitution types display significant associations to measures of local structural topography and helix stability in DNA. Most prominently, we find that the local sequence bias of transition mutations is significantly associated with the sequence-dependent level of helix instability imposed by the potentially underlying DNA mismatches. The results of our work indicate the extent to which DNA physical properties could have shaped the recent point mutational spectrum in the human genome.


Subject(s)
Biophysical Phenomena , DNA/chemistry , Mutation/genetics , Base Pair Mismatch/genetics , Base Sequence , Humans , Hydroxyl Radical/metabolism , Nucleic Acid Conformation , Phylogeny , Polymorphism, Single Nucleotide/genetics , Reproducibility of Results
15.
J Mol Evol ; 70(3): 266-74, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20213140

ABSTRACT

The question of whether natural selection favors genetic stability or genetic variability is a fundamental problem in evolutionary biology. Bioinformatic analyses demonstrate that selection favors genetic stability by avoiding unstable nucleotide sequences in protein encoding DNA. Yet, such unstable sequences are maintained in several DNA repair genes, thereby promoting breakdown of repair and destabilizing the genome. Several studies have therefore argued that selection favors genetic variability at the expense of stability. Here we propose a new evolutionary mechanism, with supporting bioinformatic evidence, that resolves this paradox. Combining the concepts of gene-dependent mutation biases and meiotic recombination, we argue that unstable sequences in the DNA mismatch repair (MMR) genes are maintained by their own phenotype. In particular, we predict that human MMR maintains an overrepresentation of mononucleotide repeats (monorepeats) within and around the MMR genes. In support of this hypothesis, we report a 31% excess in monorepeats in 250 kb regions surrounding the seven MMR genes compared to all other RefSeq genes (1.75 vs. 1.34%, P = 0.0047), with a particularly high content in PMS2 (2.41%, P = 0.0047) and MSH6 (2.07%, P = 0.043). Based on a mathematical model of monorepeat frequency, we argue that the proposed mechanism may suffice to explain the observed excess of repeats around MMR genes. Our findings thus indicate that unstable sequences in MMR genes are maintained through evolution by the MMR mechanism. The evolutionary paradox of genetically unstable DNA repair genes may thus be explained by an equilibrium in which the phenotype acts back on its own genotype.
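The monorepeat content and the reported 31% excess can be illustrated with a short sketch. The minimum run length that counts as a monorepeat is not given in the abstract, so `min_len=6` is an assumption, as are the function names and the toy sequence; only the two percentages (1.75 and 1.34) come from the text above.

```python
from itertools import groupby

def monorepeat_fraction(seq, min_len=6):
    """Fraction of bases lying in single-nucleotide runs of at least min_len."""
    seq = seq.upper()
    if not seq:
        return 0.0
    run_lengths = (len(list(run)) for _, run in groupby(seq))
    in_repeats = sum(n for n in run_lengths if n >= min_len)
    return in_repeats / len(seq)

def relative_excess(observed, reference):
    """Relative excess of `observed` over `reference`, e.g. 0.31 for 31%."""
    return observed / reference - 1.0

# Toy sequence: one run of six A's in nine bases.
print(monorepeat_fraction("AAAAAACGT"))

# Percentages quoted in the abstract: 1.75% around MMR genes vs 1.34% elsewhere.
print(round(relative_excess(1.75, 1.34) * 100))
```

The second call confirms that the quoted percentages are consistent with the stated excess: 1.75 / 1.34 is about 1.306, which rounds to 31%.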


Subject(s)
Base Sequence/physiology , DNA Repair/genetics , Genetic Variation/physiology , Genomic Instability/physiology , Evolution, Molecular , Gene Frequency , Genes/physiology , Humans , Models, Biological , Models, Genetic , Models, Theoretical , Phenotype , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA
16.
Bioinformatics ; 25(8): 996-1003, 2009 Apr 15.
Article in English | MEDLINE | ID: mdl-19244388

ABSTRACT

MOTIVATION: Helix-helix interactions play a critical role in the structure assembly, stability and function of membrane proteins. On the molecular level, the interactions are mediated by one or more residue contacts. Although previous studies focused on helix-packing patterns and sequence motifs, few of them developed methods specifically for contact prediction. RESULTS: We present a new hierarchical framework for contact prediction, with an application in membrane proteins. The hierarchical scheme consists of two levels: in the first level, contact residues are predicted from the sequence and their pairing relationships are further predicted in the second level. Statistical analyses on contact propensities are combined with other sequence and structural information for training the support vector machine classifiers. Evaluated on 52 protein chains using leave-one-out cross validation (LOOCV) and an independent test set of 14 protein chains, the two-level approach consistently improves the conventional direct approach in prediction accuracy, with 80% reduction of input for prediction. Furthermore, the predicted contacts are then used to infer interactions between pairs of helices. When at least three predicted contacts are required for an inferred interaction, the accuracy, sensitivity and specificity are 56%, 40% and 89%, respectively. Our results demonstrate that a hierarchical framework can be applied to eliminate false positives (FP) while reducing computational complexity in predicting contacts. Together with the estimated contact propensities, this method can be used to gain insights into helix-packing in membrane proteins.
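The evaluation measures quoted above can be computed from a confusion matrix as sketched below. The abstract does not define "accuracy"; in contact prediction it sometimes means precision, TP / (TP + FP), rather than overall accuracy, so the sketch returns both. The counts are hypothetical and are not taken from the paper, and the function name is illustrative.

```python
def contact_metrics(tp, fp, tn, fn):
    """Common evaluation measures for predicted helix-helix interactions."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),          # true interactions recovered
        "specificity": tn / (tn + fp),          # non-interactions rejected
        "overall_accuracy": (tp + tn) / total,  # all correct calls
        "precision": tp / (tp + fp),            # predicted interactions that are real
    }

# Hypothetical counts for 20 helix pairs.
metrics = contact_metrics(tp=4, fp=1, tn=9, fn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Requiring more predicted contacts per inferred interaction would typically trade sensitivity for specificity.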


Subject(s)
Computational Biology/methods , Membrane Proteins/chemistry , Protein Databases , Membrane Proteins/metabolism , Biological Models , Protein Secondary Structure , Reproducibility of Results
17.
Cancers (Basel) ; 12(2)2020 Feb 10.
Article in English | MEDLINE | ID: mdl-32050665

ABSTRACT

The authors wish to make the following corrections to this paper [1]: The authors would like to replace Table 3 in [1]. The corrections fix typographical errors introduced when translating our database from BIC format to HGVS nomenclature, and remove four carriers who had zero follow-up time. [...].

18.
BMC Genomics ; 10: 43, 2009 Jan 22.
Article in English | MEDLINE | ID: mdl-19161616

ABSTRACT

BACKGROUND: Recent segmental duplications are relatively large (≥ 1 kb) genomic regions of high sequence identity (≥ 90%). They cover approximately 4-5% of the human genome and play important roles in gene evolution and genomic disease. The DNA sequence differences between copies of a segmental duplication represent the result of various mutational events over time, since any two duplication copies originated from the same ancestral DNA sequence. Based on this fact, we have developed a computational scheme for inference of point mutational events in human segmental duplications, which we collectively term duplication-inferred mutations (DIMs). We have characterized these nucleotide substitutions by comparing them with high-quality SNPs from dbSNP, both in terms of sequence context and frequency of substitution types. RESULTS: Overall, DIMs show a lower ratio of transitions relative to transversions than SNPs, although this ratio approaches that of SNPs when considering DIMs within the most recent duplications. Our findings indicate that DIMs and SNPs in general are caused by similar mutational mechanisms, with some deviations at the CpG dinucleotide. Furthermore, we discover a large number of reference SNPs that coincide with computationally inferred DIMs. The latter reflects how sequence variation in duplicated sequences can be misinterpreted as ordinary allelic variation. CONCLUSION: In summary, we show how DNA sequence analysis of segmental duplications can provide a genome-wide mutational spectrum that mirrors recent genome evolution. The inferred set of nucleotide substitutions represents a valuable complement to SNPs for the analysis of genetic variation and point mutagenesis.


Subject(s)
Gene Duplication , Point Mutation , Base Sequence , CpG Islands , Human Genome , Humans , Molecular Sequence Data , Single Nucleotide Polymorphism , DNA Sequence Analysis
19.
Nucleic Acids Res ; 35(9): 3100-8, 2007.
Article in English | MEDLINE | ID: mdl-17452365

ABSTRACT

The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein-coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software and the genome analysis results are available at the CBS web server.


Subject(s)
rRNA Genes , Software , Computational Biology/methods , Bacterial Genome , Genomics/methods , Markov Chains
20.
Cancers (Basel) ; 11(2)2019 Jan 23.
Article in English | MEDLINE | ID: mdl-30678073

ABSTRACT

Background: We have previously demonstrated that the Norwegian frequent pathogenic BRCA1 (path_BRCA1) variants are caused by genetic drift and recurrent de novo mutations. We here examined the penetrance of frequent path_BRCA1 variants in fertile ages as a surrogate marker for fitness. Material and methods: We conducted an observational prospective study of penetrance for cancer in Norwegian female carriers of frequent path_BRCA1 variants, and compared our observed results to penetrance of infrequent path_BRCA1 variants and to average penetrance of path_BRCA1 variants reported by others. Results: The cumulative risk for breast cancer at 45 years in carriers of frequent path_BRCA1 variants was 20% (95% confidence interval 10–30%), compared to 35% (95% confidence interval 22–48%) in carriers of infrequent path_BRCA1 variants (p = 0.02), and to the 35% (confidence interval 32–39%) average for path_BRCA1 carriers reported by others (p = 0.0001). Discussion and conclusion: Carriers of the most frequent Norwegian path_BRCA1 variants had low incidence of cancer in fertile ages, indicating a low selective disadvantage. This, together with the variant locations being hotspots for de novo mutations and subject to genetic drift, as previously described, may have caused their high prevalence today. Besides being of theoretical interest to explain the phenomenon that a few path_BRCA1 variants are frequent, the later onset of breast cancer associated with the most frequent path_BRCA1 variants may be of interest for carriers who have to decide if and when to select prophylactic mastectomy.
