Results 1 - 20 of 99
1.
Genome Biol ; 24(1): 263, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37974217

ABSTRACT

Differential analysis of bulk RNA-seq data often suffers from lack of good controls. Here, we present a generative model that replaces controls, trained solely on healthy tissues. The unsupervised model learns a low-dimensional representation and can identify the closest normal representation for a given disease sample. This enables control-free, single-sample differential expression analysis. In breast cancer, we demonstrate how our approach selects marker genes and outperforms a state-of-the-art method. Furthermore, significant genes identified by the model are enriched in driver genes across cancers. Our results show that the in silico closest normal provides a more favorable comparison than control samples.


Subjects
Learning; Machine Learning; RNA-Seq/methods; Gene Expression
2.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37572301

ABSTRACT

MOTIVATION: Learning low-dimensional representations of single-cell transcriptomics has become instrumental to its downstream analysis. The state of the art is currently represented by neural network models, such as variational autoencoders, which use a variational approximation of the likelihood for inference. RESULTS: Here we present the Deep Generative Decoder (DGD), a simple generative model that computes model parameters and representations directly via maximum a posteriori estimation. The DGD handles complex parameterized latent distributions naturally, unlike variational autoencoders, which typically use a fixed Gaussian distribution because of the complexity of adding other types. We first show its general functionality on a commonly used benchmark set, Fashion-MNIST. Secondly, we apply the model to multiple single-cell datasets. Here, the DGD learns low-dimensional, meaningful, and well-structured latent representations with sub-clustering beyond the provided labels. The advantages of this approach are its simplicity and its capability to provide representations of much smaller dimensionality than a comparable variational autoencoder. AVAILABILITY AND IMPLEMENTATION: scDGD is available as a Python package at https://github.com/Center-for-Health-Data-Science/scDGD. The remaining code is made available here: https://github.com/Center-for-Health-Data-Science/dgd.


Subjects
Neural Networks, Computer; RNA; Gene Expression Profiling; Probability; Normal Distribution; Single-Cell Analysis
3.
JCO Clin Cancer Inform ; 7: e2300021, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37390377

ABSTRACT

PURPOSE: Synthetic data are artificial data generated without including any real patient information by an algorithm trained to learn the characteristics of a real source data set, and they have become widely used to accelerate research in life sciences. We aimed to (1) apply generative artificial intelligence to build synthetic data in different hematologic neoplasms; (2) develop a synthetic validation framework to assess data fidelity and privacy preservability; and (3) test the capability of synthetic data to accelerate clinical/translational research in hematology. METHODS: A conditional generative adversarial network architecture was implemented to generate synthetic data. Use cases were myelodysplastic syndromes (MDS) and AML: 7,133 patients were included. A fully explainable validation framework was created to assess fidelity and privacy preservability of synthetic data. RESULTS: We generated MDS/AML synthetic cohorts (including information on clinical features, genomics, treatment, and outcomes) with high fidelity and privacy performances. This technology allowed resolution of missing/incomplete information and data augmentation. We then assessed the potential value of synthetic data on accelerating research in hematology. Starting from 944 patients with MDS available since 2014, we generated a 300% augmented synthetic cohort and anticipated the development of the molecular classification and molecular scoring system obtained many years later from 2,043 and 2,957 real patients, respectively. Moreover, starting from 187 patients with MDS treated with luspatercept in a clinical trial, we generated a synthetic cohort that recapitulated all the clinical end points of the study. Finally, we developed a website to enable clinicians to generate high-quality synthetic data from an existing biobank of real patients. CONCLUSION: Synthetic data mimic real clinical-genomic features and outcomes, and anonymize patient information. The implementation of this technology makes it possible to increase the scientific use and value of real data, thus accelerating precision medicine in hematology and the conduct of clinical trials.


Subjects
Hematology; Leukemia, Myeloid, Acute; Humans; Precision Medicine; Artificial Intelligence; Algorithms
4.
PeerJ ; 10: e13666, 2022.
Article in English | MEDLINE | ID: mdl-36157058

ABSTRACT

One way to better understand the structure in DNA is by learning to predict the sequence. Here, we trained a model to predict the missing base at any given position, given its left and right flanking contexts. Our best-performing model was a neural network that obtained an accuracy close to 54% on the human genome, which is 2 percentage points better than modelling the data using a Markov model. In likelihood-ratio tests, the neural network performed significantly better than any of the alternative models by a large margin. We report on where the accuracy was obtained, first observing that the performance appeared to be uniform over the chromosomes. The models performed best in repetitive sequences, as expected, although their performance was far from random in the more difficult coding sections, the proportions being ~70:40%. We further explored the sources of the accuracy; Fourier transforming the predictions revealed weak but clear periodic signals. In the human genome the characteristic periods hinted at connections to nucleosome positioning. We found similar periodic signals in GC/AT content in the human genome, which to the best of our knowledge have not been reported before. On other large genomes similarly high accuracy was found, while lower predictive accuracy was observed on smaller genomes. Only in the mouse genome did we see periodic signals in the same range as in the human genome, though weaker and of a different type. This indicates that the sources of these signals are other or more than nucleosome arrangement. Interestingly, applying a model trained on the mouse genome to the human genome resulted in a performance far below that of the human model, except in the difficult coding regions. Despite the clear outcomes of the likelihood-ratio tests, there is currently a limited superiority of the neural network methods over the Markov model. We expect, however, that there is great potential for better modelling DNA using different neural network architectures.
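
The prediction task in this abstract (guess the central base from its left and right flanks) can be illustrated with a small count-based sketch. This is not the authors' neural network, nor their Markov model; it is a simple lookup model assumed here for illustration. The function names, the flank length `k`, and the fallback base are all hypothetical.

```python
from collections import Counter, defaultdict


def train_context_model(sequence, k=2):
    """Count which base occurs between each (left flank, right flank) pair."""
    counts = defaultdict(Counter)
    for i in range(k, len(sequence) - k):
        left = sequence[i - k:i]
        right = sequence[i + 1:i + 1 + k]
        counts[(left, right)][sequence[i]] += 1
    return counts


def predict_base(counts, left, right, default="A"):
    """Return the most frequent central base for this context.

    Falls back to `default` (an arbitrary choice) for unseen contexts.
    """
    seen = counts.get((left, right))
    if not seen:
        return default
    return seen.most_common(1)[0][0]


def accuracy(counts, sequence, k=2):
    """Fraction of positions whose central base is predicted correctly."""
    positions = range(k, len(sequence) - k)
    hits = sum(
        predict_base(counts, sequence[i - k:i], sequence[i + 1:i + 1 + k])
        == sequence[i]
        for i in positions
    )
    return hits / len(positions)
```

Evaluating `accuracy` on the same sequence used for training gives only a training accuracy; a held-out sequence would be needed for an estimate comparable in spirit to the figures quoted in the abstract.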


Subjects
Neural Networks, Computer; Nucleosomes; Humans; Animals; Mice; Base Sequence; DNA/genetics; Genome, Human
6.
BMC Genomics ; 23(1): 87, 2022 Jan 31.
Article in English | MEDLINE | ID: mdl-35100973

ABSTRACT

BACKGROUND: Genomic DNA has been shaped by mutational processes through evolution. The cellular machinery for error correction and repair has left its marks in the nucleotide composition along with structural and functional constraints. Therefore, the probability of observing a base in a certain position in the human genome is highly context-dependent. RESULTS: Here we develop context-dependent nucleotide models. We first investigate models of nucleotides conditioned on sequence context. We develop a bidirectional Markov model that uses an average of the probability from a Markov model applied to both strands of the sequence and thus depends on up to 14 bases to each side of the nucleotide. We show how the genome predictability varies across different types of genomic regions. Surprisingly, this model can predict a base from its context with an average of more than 50% accuracy. For somatic variants we show a tendency towards higher probability for the variant base than for the reference base. Inspired by DNA substitution models, we develop a model of mutability that estimates a mutation matrix (called the alpha matrix) on top of the nucleotide distribution. The alpha matrix can be estimated from a much smaller context than the nucleotide model, but the final model will still depend on the full context of the nucleotide model. With the bidirectional Markov model of order 14 and an alpha matrix dependent on just one base to each side, we obtain a model that compares well with a model of mutability that estimates mutation probabilities directly conditioned on three nucleotides to each side. For somatic variants in particular, our model fits better than the simpler model. Interestingly, the model is not very sensitive to the size of the context for the alpha matrix. CONCLUSIONS: Our study found strong context dependencies of nucleotides in the human genome. The best model uses a context of 14 nucleotides to each side. Based on these models, a substitution model was constructed that separates into the context model and a matrix dependent on a small context. The model fit somatic variants particularly well.
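
The bidirectional Markov idea described in this abstract (average the probability a Markov model assigns on each strand) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the add-one pseudocount, training one shared model on both strands, and all function names are choices made here. The abstract uses order 14; the sketch works for any small order.

```python
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse-complement strand of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def train_markov(sequences, order):
    """Count next-base occurrences after each context of length `order`."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    return counts


def markov_prob(counts, context, base, pseudocount=1.0):
    """P(base | context), smoothed with a pseudocount (an assumption)."""
    seen = counts.get(context, Counter())
    total = sum(seen.values()) + 4 * pseudocount
    return (seen[base] + pseudocount) / total


def bidirectional_prob(counts, seq, i, base, order):
    """Average the forward- and reverse-strand probability of `base` at i.

    Requires order <= i < len(seq) - order so both flanks are complete.
    """
    forward = markov_prob(counts, seq[i - order:i], base)
    # On the reverse strand the preceding context is the reverse
    # complement of the right flank, and the base is complemented.
    right_flank = seq[i + 1:i + 1 + order]
    reverse = markov_prob(
        counts, reverse_complement(right_flank), base.translate(COMPLEMENT)
    )
    return 0.5 * (forward + reverse)
```

Because each strand's probabilities sum to one over the four bases, the averaged values also form a valid distribution, which is why the result depends on `order` bases to each side of the position.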


Subjects
DNA; Nucleotides; DNA/genetics; Genome, Human; Genomics; Humans; Nucleotides/genetics; Probability
7.
Entropy (Basel) ; 23(11)2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34828101

ABSTRACT

Autoencoders are commonly used in representation learning. They consist of an encoder and a decoder, which provide a straightforward method to map n-dimensional data in input space to a lower m-dimensional representation space and back. The decoder itself defines an m-dimensional manifold in input space. Inspired by manifold learning, we showed that the decoder can be trained on its own by learning the representations of the training samples along with the decoder weights using gradient descent. A sum-of-squares loss then corresponds to optimizing the manifold to have the smallest Euclidean distance to the training samples, and similarly for other loss functions. We derived expressions for the number of samples needed to specify the encoder and decoder and showed that the decoder generally requires far fewer training samples to be well-specified compared to the encoder. We discuss the training of autoencoders in this perspective and relate it to previous work in the field that uses noisy training examples and other types of regularization. On the natural image data sets MNIST and CIFAR10, we demonstrated that the decoder is much better suited to learn a low-dimensional representation, especially when trained on small data sets. Using simulated gene regulatory data, we further showed that the decoder alone leads to better generalization and meaningful representations. Our approach of training the decoder alone facilitates representation learning even on small data sets and can lead to improved training of autoencoders. We hope that the simple analyses presented will also contribute to an improved conceptual understanding of representation learning.
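
The decoder-only training described in this abstract (learn one representation per training sample together with the decoder weights by gradient descent on a sum-of-squares loss) can be sketched with a linear decoder. The linear form, the learning rate, the initialisation scale, and the name `train_decoder_only` are assumptions made for illustration; the paper's decoders are neural networks.

```python
import numpy as np


def train_decoder_only(X, m, steps=2000, lr=0.001, seed=0):
    """Jointly fit representations Z and linear decoder weights W.

    X is (n_samples, n_features); the reconstruction is Z @ W.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    Z = rng.normal(scale=0.1, size=(n_samples, m))   # one row per sample
    W = rng.normal(scale=0.1, size=(m, n_features))  # decoder weights
    losses = []
    for _ in range(steps):
        residual = Z @ W - X
        losses.append(float(np.sum(residual ** 2)))  # sum-of-squares loss
        # Gradients of the loss with respect to Z and W.
        grad_Z = 2.0 * residual @ W.T
        grad_W = 2.0 * Z.T @ residual
        Z -= lr * grad_Z
        W -= lr * grad_W
    return Z, W, losses
```

There is no encoder here: each row of `Z` is a free parameter updated alongside `W`, so the decoder's m-dimensional manifold is pulled toward the training samples directly.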

8.
Mol Oncol ; 15(2): 429-461, 2021 02.
Article in English | MEDLINE | ID: mdl-33176066

ABSTRACT

Despite significant advancements in breast cancer (BC) research, clinicians lack robust serological protein markers for accurate diagnostics and tumor stratification. Tumor interstitial fluid (TIF) accumulates aberrantly externalized proteins within the local tumor space, which can potentially gain access to the circulatory system. As such, TIF may represent a valuable starting point for identifying relevant tumor-specific serological biomarkers. The aim of the study was to perform comprehensive proteomic profiling of TIF to identify proteins associated with BC tumor status and subtype. A liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis of 35 TIFs of three main subtypes: luminal (19), Her2 (4), and triple-negative (TNBC) (12) resulted in the identification of >8800 proteins. Unsupervised hierarchical clustering segregated the TIF proteome into two major clusters, luminal and TNBC/Her2 subgroups. High-grade tumors enriched with tumor infiltrating lymphocytes (TILs) were also stratified from low-grade tumors. A consensus analysis approach, including differential abundance analysis, selection operator regression, and random forest returned a minimal set of 24 proteins associated with BC subtypes, receptor status, and TIL scoring. Among them, a panel of 10 proteins, AGR3, BCAM, CELSR1, MIEN1, NAT1, PIP4K2B, SEC23B, THTPA, TMEM51, and ULBP2, was found to stratify the tumor subtype-specific TIFs. In particular, upregulation of BCAM and CELSR1 differentiates luminal subtypes, while upregulation of MIEN1 differentiates Her2 subtypes. Immunohistochemistry analysis showed a direct correlation between protein abundance in TIFs and intratumor expression levels for all 10 proteins. Sensitivity and specificity were estimated for this protein panel by using an independent, comprehensive breast tumor proteome dataset. The results of this analysis strongly support our data, with eight of the proteins potentially representing biomarkers for stratification of BC subtypes. Five of the most representative proteomics databases currently available were also used to estimate the potential for these selected proteins to serve as putative serological markers.


Subjects
Biomarkers, Tumor/metabolism; Breast Neoplasms/metabolism; Extracellular Fluid/metabolism; Neoplasm Proteins/metabolism; Proteomics; Chromatography, Liquid; Female; Humans; Lymphocytes, Tumor-Infiltrating/metabolism; Tandem Mass Spectrometry
9.
Breast Cancer Res ; 22(1): 73, 2020 06 30.
Article in English | MEDLINE | ID: mdl-32605588

ABSTRACT

BACKGROUND: Studies on tumor-secreted microRNAs point to a functional role of these in cellular communication and reprogramming of the tumor microenvironment. Uptake of tumor-secreted microRNAs by neighboring cells may result in the silencing of mRNA targets and, in turn, modulation of the transcriptome. Studying miRNAs externalized from tumors could improve cancer patient diagnosis and disease monitoring and help to pinpoint which miRNA-gene interactions are central for tumor properties such as invasiveness and metastasis. METHODS: Using a bioinformatics approach, we analyzed the profiles of secreted tumor and normal interstitial fluid (IF) microRNAs from women with breast cancer (BC). We carried out differential abundance analysis (DAA) to obtain miRNAs that were enriched or depleted in IFs from patients with different clinical traits. Subsequently, miRNA family enrichment analysis was performed to assess whether any families were over-represented in the specific sets. We identified dysregulated genes in tumor tissues from the same cohort of patients and constructed weighted gene co-expression networks to extract sets of co-expressed genes and co-abundant miRNAs. Lastly, we integrated miRNAs and mRNAs to obtain interaction networks and supported our findings using prediction tools and cancer gene databases. RESULTS: Network analysis showed co-expressed genes and miRNA regulators associated with tumor lymphocyte infiltration. All of the genes were involved in immune system processes, and many had previously been associated with cancer immunity. A subset of these, BTLA, CXCL13, IL7R, LAMP3, and LTB, was linked to the presence of tertiary lymphoid structures and high endothelial venules within tumors. Co-abundant tumor interstitial fluid miRNAs within this network, including miR-146a and miR-494, were annotated as negative regulators of immune-stimulatory responses. One co-expression network encompassed differences between BC subtypes. Genes differentially co-expressed between luminal B and triple-negative breast cancer (TNBC) were connected with sphingolipid metabolism and predicted to be co-regulated by miR-23a. Co-expressed genes and TIF miRNAs associated with tumor grade were BTRC, CHST1, miR-10a/b, miR-107, miR-301a, and miR-454. CONCLUSION: Integration of IF miRNAs and mRNAs unveiled networks associated with patient clinicopathological traits and underlined molecular mechanisms specific to BC sub-groups. Our results highlight the benefits of an integrative approach to biomarker discovery, placing secreted miRNAs within a biological context.


Subjects
Lymphocytes, Tumor-Infiltrating/immunology; MicroRNAs/genetics; Triple Negative Breast Neoplasms/genetics; Biomarkers, Tumor/genetics; Biomarkers, Tumor/immunology; Extracellular Fluid/metabolism; Female; Follow-Up Studies; Gene Expression Profiling; Gene Regulatory Networks; Humans; Lymphocytes, Tumor-Infiltrating/metabolism; MicroRNAs/metabolism; Neoplasm Grading; Receptor, ErbB-2/metabolism; Receptors, Estrogen/metabolism; Receptors, Progesterone/metabolism; Triple Negative Breast Neoplasms/immunology; Triple Negative Breast Neoplasms/pathology; Tumor Microenvironment/genetics; Tumor Microenvironment/immunology
10.
Nucleic Acids Res ; 48(16): e93, 2020 09 18.
Article in English | MEDLINE | ID: mdl-32633756

ABSTRACT

Characterizing species diversity and composition of bacteria hosted by biota is revolutionizing our understanding of the role of symbiotic interactions in ecosystems. Determining microbiome diversity implies the assignment of individual reads to taxa by comparison to reference databases. Although computational methods aimed at identifying microbial taxa are available, it is well known that inferences using different methods can vary widely depending on various biases. In this study, we first apply and compare different bioinformatics methods based on 16S ribosomal RNA gene and shotgun sequencing to three mock communities of bacteria, of which the compositions are known. We show that none of these methods can infer both the true number of taxa and their abundances. We thus propose a novel approach, named Core-Kaiju, which combines the power of shotgun metagenomics data with a more focused marker gene classification method similar to 16S, but based on emergent statistics of core protein domain families. We then test the proposed method on various mock communities and show that Core-Kaiju reliably predicts both number of taxa and abundances. Finally, we apply our method on human gut samples, showing how Core-Kaiju may give more accurate ecological characterization and a fresh view on real microbiomes.


Subjects
Bacteria/classification; Gastrointestinal Microbiome/genetics; Metagenome; Metagenomics/methods; Phylogeny; RNA, Ribosomal, 16S/genetics; Bacteria/genetics; Computational Biology; DNA, Bacterial/genetics; Databases, Protein; Genetic Markers; Humans; Sequence Analysis, DNA
11.
PLoS Comput Biol ; 16(3): e1007665, 2020 03.
Article in English | MEDLINE | ID: mdl-32176694

ABSTRACT

With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variances has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline-CAMPP) intended to aid bioinformatic software-users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R-package updates, a renv.lock file is provided to ensure R-package stability. Data-management includes missing value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.


Subjects
Biomarkers, Tumor/analysis; Computational Biology/methods; Neoplasms; Software; Cluster Analysis; Databases, Factual; Humans; Neoplasms/chemistry; Neoplasms/genetics; Neoplasms/mortality; User-Computer Interface
12.
BMC Bioinformatics ; 20(1): 663, 2019 Dec 12.
Article in English | MEDLINE | ID: mdl-31830908

ABSTRACT

BACKGROUND: Circular DNA has recently been identified across different species including human normal and cancerous tissue, but short-read mappers are unable to align many of the reads crossing circle junctions, hence limiting their detection from short-read sequencing data. RESULTS: Here, we propose a new method, Circle-Map, that guides the realignment of partially aligned reads using information from discordantly mapped reads to map the short unaligned portions using a probabilistic model. We compared Circle-Map to similar up-to-date methods for circular DNA and RNA detection and we demonstrate how the approach implemented in Circle-Map dramatically increases sensitivity for detection of circular DNA on both simulated and real data while retaining high precision. CONCLUSION: Circle-Map is an easy-to-use command line tool that implements the required pipeline to accurately detect circular DNA from circle-enriched next-generation sequencing experiments. Circle-Map is implemented in Python 3.6 and is freely available at https://github.com/iprada/Circle-Map.


Subjects
DNA, Circular/genetics; Nucleotides/genetics; Sequence Alignment/methods; Databases, Genetic; Humans; Software
13.
Cell ; 175(2): 347-359.e14, 2018 10 04.
Article in English | MEDLINE | ID: mdl-30290141

ABSTRACT

We analyze whole-genome sequencing data from 141,431 Chinese women generated for non-invasive prenatal testing (NIPT). We use these data to characterize the population genetic structure and to investigate genetic associations with maternal and infectious traits. We show that the present day distribution of alleles is a function of both ancient migration and very recent population movements. We reveal novel phenotype-genotype associations, including several replicated associations with height and BMI, an association between maternal age and EMB, and between twin pregnancy and NRG1. Finally, we identify a unique pattern of circulating viral DNA in plasma with high prevalence of hepatitis B and other clinically relevant maternal infections. A GWAS for viral infections identifies an exceptionally strong association between integrated herpesvirus 6 and MOV10L1, which affects piwi-interacting RNA (piRNA) processing and PIWI protein function. These findings demonstrate the great value and potential of accumulating NIPT data for worldwide medical and genetic analyses.


Subjects
Asian People/genetics; Prenatal Diagnosis/methods; Adult; Alleles; China; DNA/genetics; Ethnicity/genetics; Female; Gene Frequency/genetics; Genetic Testing; Genetic Variation/genetics; Genetics, Population/methods; Genome-Wide Association Study/methods; Genomics/methods; Human Migration; Humans; Pregnancy; Sequence Analysis, DNA
14.
Nat Genet ; 50(7): 1054-1059, 2018 07.
Article in English | MEDLINE | ID: mdl-29915429

ABSTRACT

Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a 'variation-prior' database containing already known variants significantly improves sensitivity.


Subjects
Genetic Variation/genetics; Genome, Human/genetics; Genotype; High-Throughput Nucleotide Sequencing/methods; Humans; Sequence Analysis, DNA/methods
15.
Front Microbiol ; 8: 2140, 2017.
Article in English | MEDLINE | ID: mdl-29163426

ABSTRACT

Xanthan gum, a complex polysaccharide comprising glucose, mannose and glucuronic acid residues, is involved in numerous biotechnological applications in the cosmetics, agriculture, pharmaceutical, food and petroleum industries. Additionally, its oligosaccharides were shown to possess antimicrobial, antioxidant, and a few other properties. Yet, despite its extensive usage, little is known about xanthan gum degradation pathways and mechanisms. Thermogutta terrifontis, isolated from a sample of microbial mat developed in a terrestrial hot spring of Kunashir island (Far-East of Russia), was described as the first thermophilic representative of the Planctomycetes phylum. It grows well on xanthan gum under either aerobic or anaerobic conditions. Genomic analysis unraveled the pathways of oligo- and polysaccharide utilization, as well as the mechanisms of aerobic and anaerobic respiration. The combination of genomic and transcriptomic approaches suggested a novel xanthan gum degradation pathway which involves novel glycosidase(s) of the DUF1080 family, hydrolyzing xanthan gum backbone beta-glucosidic linkages, and beta-mannosidases instead of xanthan lyases, catalyzing cleavage of terminal beta-mannosidic linkages. Surprisingly, the genes coding DUF1080 proteins were abundant in T. terrifontis and in many other Planctomycetes genomes, which, together with our observation that xanthan gum is a selective substrate for many planctomycetes, suggests a crucial role of DUF1080 in xanthan gum degradation. Our findings shed light on the metabolism of the first thermophilic planctomycete, capable of degrading a number of polysaccharides, either aerobically or anaerobically, including the biotechnologically important bacterial polysaccharide xanthan gum.

17.
Eur Heart J Qual Care Clin Outcomes ; 3(2): 114-122, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28927171

ABSTRACT

Aims: Registries have the potential to capture treatment practices and outcomes in populations beyond the constraints of clinical trial settings. The value of data obtained depends critically upon robust quality standards (including source data verification [SDV] and training), features that are commonly absent from registries. This article outlines the quality standards developed for the Global Anticoagulant Registry in the FIELD-Atrial Fibrillation (GARFIELD-AF). Methods and Results: GARFIELD-AF comprises ∼57 000 patients prospectively recruited over 6.5 years in 35 countries in five successive cohorts. The registry employs a combination of remote and onsite monitoring to ascertain completeness and accuracy of records and, by design, SDV is performed on 20% of cases (i.e. ∼11 400 patients). Four performance measures for ranking sites according to data quality and other performance indicators were evaluated (including data quality for 13 quantifiable variables, late data locking, number of missing critical variables, and history of poor data quality from the previous monitoring phase). These criteria facilitated the identification of sites with potentially suboptimal data quality for onsite monitoring. During early phases of the registry, critical variables for data checking were also identified. SDV using these variables (partial SDV in 902 patients) showed similar concordance to SDV of all fields (110 patients): 94.4% vs. 93.1%, respectively. This standard formed the baseline against which ongoing quality improvements were assessed, facilitating corrective action on data quality issues. In consequence, concordance was improved in the next monitoring phase (95.6%; n = 1172). Conclusion: The quality standards in GARFIELD-AF have the potential to inform a future 'reference' for registries.


Subjects
Anticoagulants/therapeutic use; Atrial Fibrillation/drug therapy; Data Accuracy; Registries/standards; Stroke/prevention & control; Humans; Prospective Studies; Risk Factors
18.
Nature ; 548(7665): 87-91, 2017 08 03.
Article in English | MEDLINE | ID: mdl-28746312

ABSTRACT

Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.


Subjects
Genetic Variation/genetics; Genetics, Population/standards; Genome, Human/genetics; Genomics/standards; Sequence Analysis, DNA/standards; Adult; Alleles; Child; Chromosomes, Human, Y/genetics; Denmark; Female; Haplotypes/genetics; Humans; Major Histocompatibility Complex/genetics; Male; Maternal Age; Mutation Rate; Paternal Age; Point Mutation/genetics; Reference Standards
19.
PLoS Comput Biol ; 13(4): e1005460, 2017 04.
Article in English | MEDLINE | ID: mdl-28410363

ABSTRACT

Post-transcriptional regulation is regarded as one of the major processes involved in the regulation of gene expression. It is mainly performed by RNA binding proteins and microRNAs, which target RNAs and typically affect their stability. Recent efforts from the scientific community have aimed at understanding post-transcriptional regulation at a global scale by using high-throughput sequencing techniques such as cross-linking and immunoprecipitation (CLIP), which facilitates identification of binding sites of these regulatory factors. However, the diversity in the experimental procedures and bioinformatics analyses has hindered the integration of multiple datasets and thus limited the development of an integrated view of post-transcriptional regulation. In this work, we have performed a comprehensive analysis of 107 CLIP datasets from 49 different RBPs in HEK293 cells to shed light on the complex interactions that govern post-transcriptional regulation. By developing a more stringent CLIP analysis pipeline we have discovered the existence of conserved regulatory AU-rich regions in the 3'UTRs where miRNAs and RBPs that regulate several processes such as polyadenylation or mRNA stability bind. Analogous to promoters, many factors have binding sites overlapping or in close proximity in these hotspots and hence the regulation of the mRNA may depend on their relative concentrations. This hypothesis is supported by RBP knockdown experiments that alter the relative concentration of RBPs in the cell. Upon AGO2 knockdown (KD), transcripts containing "free" target sites show increased expression levels compared to those containing target sites in hotspots, which suggests that target sites within hotspots are less available for miRNAs to bind. Interestingly, these hotspots appear enriched in genes with regulatory functions such as DNA binding and RNA binding. Taken together, our results suggest that hotspots are functional regulatory elements that define an extra layer of regulation of post-transcriptional regulatory networks.


Subjects
3' Untranslated Regions/genetics; Binding Sites/genetics; MicroRNAs/genetics; RNA-Binding Proteins/genetics; Computational Biology; HEK293 Cells; Humans; Immunoprecipitation; MicroRNAs/metabolism; Polyadenylation/genetics; RNA-Binding Proteins/metabolism
20.
PLoS One ; 11(5): e0155039, 2016.
Article in English | MEDLINE | ID: mdl-27213950

ABSTRACT

INTRODUCTION: Infusion of glyceryl trinitrate (GTN), a donor of nitric oxide, induces immediate headache in humans that in migraineurs is followed by a delayed migraine attack. To increase knowledge of the mechanisms activated during GTN infusion, the present study investigates transcriptional responses to GTN infusion in the rat trigeminal ganglia. METHODS: Rats were infused with GTN or vehicle and trigeminal ganglia were isolated either 30 or 90 minutes post infusion. RNA sequencing was used to investigate transcriptomic changes in response to the treatment. Furthermore, we developed a novel method for Gene Set Analysis Of Variance (GSANOVA) to identify gene sets associated with transcriptional changes across time. RESULTS: Fifteen genes displayed significant changes in transcription levels in response to GTN infusion. Ten of these genes showed either sustained up- or down-regulation in the 90-minute period after infusion. The GSANOVA analysis demonstrates enrichment of pathways pointing towards an increase in immune response, signal transduction, and neuroplasticity in response to GTN infusion. Future functional in-depth studies of these mechanisms are expected to increase our understanding of migraine pathogenesis.


Subjects
Migraine Disorders/chemically induced; Migraine Disorders/genetics; Nitroglycerin/adverse effects; Trigeminal Ganglion/drug effects; Trigeminal Ganglion/metabolism; Vasodilator Agents/adverse effects; Animals; Disease Models, Animal; Gene Expression Profiling; Gene Expression Regulation/drug effects; Infusions, Intraventricular; Male; Migraine Disorders/metabolism; Nitroglycerin/administration & dosage; Rats; Rats, Sprague-Dawley; Sequence Analysis, RNA; Vasodilator Agents/administration & dosage