1.
Cell ; 175(2): 347-359.e14, 2018 10 04.
Article in English | MEDLINE | ID: mdl-30290141

ABSTRACT

We analyze whole-genome sequencing data from 141,431 Chinese women generated for non-invasive prenatal testing (NIPT). We use these data to characterize the population genetic structure and to investigate genetic associations with maternal and infectious traits. We show that the present day distribution of alleles is a function of both ancient migration and very recent population movements. We reveal novel phenotype-genotype associations, including several replicated associations with height and BMI, an association between maternal age and EMB, and between twin pregnancy and NRG1. Finally, we identify a unique pattern of circulating viral DNA in plasma with high prevalence of hepatitis B and other clinically relevant maternal infections. A GWAS for viral infections identifies an exceptionally strong association between integrated herpesvirus 6 and MOV10L1, which affects piwi-interacting RNA (piRNA) processing and PIWI protein function. These findings demonstrate the great value and potential of accumulating NIPT data for worldwide medical and genetic analyses.


Subject(s)
Asian People/genetics , Prenatal Diagnosis/methods , Adult , Alleles , China , DNA/genetics , Ethnicity/genetics , Female , Gene Frequency/genetics , Genetic Testing , Genetic Variation/genetics , Genetics, Population/methods , Genome-Wide Association Study/methods , Genomics/methods , Human Migration , Humans , Pregnancy , Sequence Analysis, DNA
2.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37572301

ABSTRACT

MOTIVATION: Learning low-dimensional representations of single-cell transcriptomics has become instrumental to its downstream analysis. The state of the art is currently represented by neural network models, such as variational autoencoders, which use a variational approximation of the likelihood for inference. RESULTS: We here present the Deep Generative Decoder (DGD), a simple generative model that computes model parameters and representations directly via maximum a posteriori estimation. The DGD handles complex parameterized latent distributions naturally, unlike variational autoencoders, which typically use a fixed Gaussian distribution because of the complexity of adding other types. We first show its general functionality on a commonly used benchmark set, Fashion-MNIST. Secondly, we apply the model to multiple single-cell datasets. Here, the DGD learns low-dimensional, meaningful, and well-structured latent representations with sub-clustering beyond the provided labels. The advantages of this approach are its simplicity and its capability to provide representations of much smaller dimensionality than a comparable variational autoencoder. AVAILABILITY AND IMPLEMENTATION: scDGD is available as a Python package at https://github.com/Center-for-Health-Data-Science/scDGD. The remaining code is made available here: https://github.com/Center-for-Health-Data-Science/dgd.
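
The maximum a posteriori estimation described in this abstract can be written as a generic objective. What follows is a sketch under assumed notation, not the paper's own formulation: x_i is a sample, z_i its learned representation, θ the decoder parameters, and φ the parameters of the latent distribution. None of these symbols appear in the abstract.

```latex
\[
(\hat{\theta}, \hat{\phi}, \hat{z}_{1:N})
= \arg\max_{\theta,\,\phi,\,z_{1:N}}
\sum_{i=1}^{N} \Big[ \log p(x_i \mid z_i, \theta) + \log p(z_i \mid \phi) \Big]
+ \log p(\theta) + \log p(\phi)
\]
```

Under this reading there is no encoder: each z_i is a free parameter optimized together with θ and φ, which is why the form of p(z | φ) is not restricted to a fixed Gaussian.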


Subject(s)
Neural Networks, Computer , RNA , Gene Expression Profiling , Probability , Normal Distribution , Single-Cell Analysis
3.
Nature ; 548(7665): 87-91, 2017 08 03.
Article in English | MEDLINE | ID: mdl-28746312

ABSTRACT

Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.


Subject(s)
Genetic Variation/genetics , Genetics, Population/standards , Genome, Human/genetics , Genomics/standards , Sequence Analysis, DNA/standards , Adult , Alleles , Child , Chromosomes, Human, Y/genetics , Denmark , Female , Haplotypes/genetics , Humans , Major Histocompatibility Complex/genetics , Male , Maternal Age , Mutation Rate , Paternal Age , Point Mutation/genetics , Reference Standards
4.
BMC Genomics ; 23(1): 87, 2022 Jan 31.
Article in English | MEDLINE | ID: mdl-35100973

ABSTRACT

BACKGROUND: Genomic DNA has been shaped by mutational processes through evolution. The cellular machinery for error correction and repair has left its marks in the nucleotide composition along with structural and functional constraints. Therefore, the probability of observing a base in a certain position in the human genome is highly context-dependent. RESULTS: Here we develop context-dependent nucleotide models. We first investigate models of nucleotides conditioned on sequence context. We develop a bidirectional Markov model that uses an average of the probability from a Markov model applied to both strands of the sequence and thus depends on up to 14 bases to each side of the nucleotide. We show how the genome predictability varies across different types of genomic regions. Surprisingly, this model can predict a base from its context with an average of more than 50% accuracy. For somatic variants we show a tendency towards higher probability for the variant base than for the reference base. Inspired by DNA substitution models, we develop a model of mutability that estimates a mutation matrix (called the alpha matrix) on top of the nucleotide distribution. The alpha matrix can be estimated from a much smaller context than the nucleotide model, but the final model will still depend on the full context of the nucleotide model. With the bidirectional Markov model of order 14 and an alpha matrix dependent on just one base to each side, we obtain a model that compares well with a model of mutability that estimates mutation probabilities directly conditioned on three nucleotides to each side. For somatic variants in particular, our model fits better than the simpler model. Interestingly, the model is not very sensitive to the size of the context for the alpha matrix. CONCLUSIONS: Our study found strong context dependencies of nucleotides in the human genome. The best model uses a context of 14 nucleotides to each side. Based on these models, a substitution model was constructed that separates into the context model and a matrix dependent on a small context. The model fit somatic variants particularly well.
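
The idea of averaging a Markov model over both strands can be illustrated with a small sketch. This is a simplified illustration, not the authors' implementation: it assumes a fixed order k, add-one smoothing, and training counts pooled over both strands, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"

def revcomp(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return "".join(COMP[b] for b in reversed(seq))

def fit_markov(seqs, k):
    """Count next-base occurrences for every length-k context on both strands."""
    counts = defaultdict(Counter)
    for s in seqs:
        for strand in (s, revcomp(s)):
            for i in range(k, len(strand)):
                counts[strand[i - k:i]][strand[i]] += 1
    return counts

def cond_prob(counts, context, pseudo=1.0):
    """P(base | context); the pseudocount smoothing is an assumption."""
    c = counts.get(context, Counter())
    total = sum(c.values()) + pseudo * len(BASES)
    return {b: (c[b] + pseudo) / total for b in BASES}

def bidirectional_prob(counts, seq, i, k):
    """Average the forward-strand and reverse-strand predictions for seq[i].

    Requires k <= i < len(seq) - k so that both contexts are complete.
    """
    left = seq[i - k:i]            # k bases preceding position i
    right = seq[i + 1:i + 1 + k]   # k bases following position i
    fwd = cond_prob(counts, left)
    # On the reverse strand, the complement of seq[i] follows revcomp(right).
    rev = cond_prob(counts, revcomp(right))
    return {b: 0.5 * (fwd[b] + rev[COMP[b]]) for b in BASES}
```

With k = 14, the returned distribution depends on up to 14 bases on each side of position i, which matches the context size stated in the abstract.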


Subject(s)
DNA , Nucleotides , DNA/genetics , Genome, Human , Genomics , Humans , Nucleotides/genetics , Probability
5.
Nucleic Acids Res ; 48(16): e93, 2020 09 18.
Article in English | MEDLINE | ID: mdl-32633756

ABSTRACT

Characterizing species diversity and composition of bacteria hosted by biota is revolutionizing our understanding of the role of symbiotic interactions in ecosystems. Determining microbiome diversity implies the assignment of individual reads to taxa by comparison to reference databases. Although computational methods aimed at identifying microbial taxa are available, it is well known that inferences using different methods can vary widely depending on various biases. In this study, we first apply and compare different bioinformatics methods based on 16S ribosomal RNA gene and shotgun sequencing to three mock communities of bacteria, of which the compositions are known. We show that none of these methods can infer both the true number of taxa and their abundances. We thus propose a novel approach, named Core-Kaiju, which combines the power of shotgun metagenomics data with a more focused marker gene classification method similar to 16S, but based on emergent statistics of core protein domain families. We then test the proposed method on various mock communities and show that Core-Kaiju reliably predicts both number of taxa and abundances. Finally, we apply our method to human gut samples, showing how Core-Kaiju may give more accurate ecological characterization and a fresh view on real microbiomes.


Subject(s)
Bacteria/classification , Gastrointestinal Microbiome/genetics , Metagenome , Metagenomics/methods , Phylogeny , RNA, Ribosomal, 16S/genetics , Bacteria/genetics , Computational Biology , DNA, Bacterial/genetics , Databases, Protein , Genetic Markers , Humans , Sequence Analysis, DNA
6.
PLoS Comput Biol ; 16(3): e1007665, 2020 03.
Article in English | MEDLINE | ID: mdl-32176694

ABSTRACT

With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variances has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline-CAMPP) intended to aid bioinformatic software-users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R-package updates, an renv.lock file is provided to ensure R-package stability. Data-management includes missing value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.


Subject(s)
Biomarkers, Tumor/analysis , Computational Biology/methods , Neoplasms , Software , Cluster Analysis , Databases, Factual , Humans , Neoplasms/chemistry , Neoplasms/genetics , Neoplasms/mortality , User-Computer Interface
7.
Entropy (Basel) ; 23(11)2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34828101

ABSTRACT

Autoencoders are commonly used in representation learning. They consist of an encoder and a decoder, which provide a straightforward method to map n-dimensional data in input space to a lower m-dimensional representation space and back. The decoder itself defines an m-dimensional manifold in input space. Inspired by manifold learning, we showed that the decoder can be trained on its own by learning the representations of the training samples along with the decoder weights using gradient descent. A sum-of-squares loss then corresponds to optimizing the manifold to have the smallest Euclidean distance to the training samples, and similarly for other loss functions. We derived expressions for the number of samples needed to specify the encoder and decoder and showed that the decoder generally requires much fewer training samples to be well-specified compared to the encoder. We discuss the training of autoencoders in this perspective and relate it to previous work in the field that uses noisy training examples and other types of regularization. On the natural image data sets MNIST and CIFAR10, we demonstrated that the decoder is much better suited to learn a low-dimensional representation, especially when trained on small data sets. Using simulated gene regulatory data, we further showed that the decoder alone leads to better generalization and meaningful representations. Our approach of training the decoder alone facilitates representation learning even on small data sets and can lead to improved training of autoencoders. We hope that the simple analyses presented will also contribute to an improved conceptual understanding of representation learning.
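
Training a decoder on its own, with the representations learned alongside the weights, can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the paper's code: the decoder is a single linear map (the paper is not limited to linear decoders), optimization is plain full-batch gradient descent, and the function name, step count, and learning rate are hypothetical choices.

```python
import numpy as np

def train_decoder_only(X, m, steps=2000, lr=0.005, seed=0):
    """Jointly fit representations Z (N x m) and a linear decoder W (m x n).

    Minimizes the sum-of-squares loss 0.5 * ||Z @ W - X||^2 by gradient
    descent on Z and W together; no encoder is involved.
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    Z = rng.normal(scale=0.1, size=(N, m))   # one free representation per sample
    W = rng.normal(scale=0.1, size=(m, n))   # decoder weights
    losses = []
    for _ in range(steps):
        R = Z @ W - X                        # reconstruction residuals
        losses.append(0.5 * float((R ** 2).sum()))
        grad_Z = R @ W.T                     # dLoss/dZ
        grad_W = Z.T @ R                     # dLoss/dW
        Z -= lr * grad_Z
        W -= lr * grad_W
    return Z, W, losses
```

Each row of Z is updated only through its own sample's residual, so the rows of Z @ W trace out the m-dimensional manifold (here a linear subspace) that the loss pulls towards the training samples.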

8.
Breast Cancer Res ; 22(1): 73, 2020 06 30.
Article in English | MEDLINE | ID: mdl-32605588

ABSTRACT

BACKGROUND: Studies on tumor-secreted microRNAs point to a functional role of these in cellular communication and reprogramming of the tumor microenvironment. Uptake of tumor-secreted microRNAs by neighboring cells may result in the silencing of mRNA targets and, in turn, modulation of the transcriptome. Studying miRNAs externalized from tumors could improve cancer patient diagnosis and disease monitoring and help to pinpoint which miRNA-gene interactions are central for tumor properties such as invasiveness and metastasis. METHODS: Using a bioinformatics approach, we analyzed the profiles of secreted tumor and normal interstitial fluid (IF) microRNAs, from women with breast cancer (BC). We carried out differential abundance analysis (DAA), to obtain miRNAs, which were enriched or depleted in IFs, from patients with different clinical traits. Subsequently, miRNA family enrichment analysis was performed to assess whether any families were over-represented in the specific sets. We identified dysregulated genes in tumor tissues from the same cohort of patients and constructed weighted gene co-expression networks, to extract sets of co-expressed genes and co-abundant miRNAs. Lastly, we integrated miRNAs and mRNAs to obtain interaction networks and supported our findings using prediction tools and cancer gene databases. RESULTS: Network analysis showed co-expressed genes and miRNA regulators, associated with tumor lymphocyte infiltration. All of the genes were involved in immune system processes, and many had previously been associated with cancer immunity. A subset of these, BTLA, CXCL13, IL7R, LAMP3, and LTB, was linked to the presence of tertiary lymphoid structures and high endothelial venules within tumors. Co-abundant tumor interstitial fluid miRNAs within this network, including miR-146a and miR-494, were annotated as negative regulators of immune-stimulatory responses. One co-expression network encompassed differences between BC subtypes. Genes differentially co-expressed between luminal B and triple-negative breast cancer (TNBC) were connected with sphingolipid metabolism and predicted to be co-regulated by miR-23a. Co-expressed genes and TIF miRNAs associated with tumor grade were BTRC, CHST1, miR-10a/b, miR-107, miR-301a, and miR-454. CONCLUSION: Integration of IF miRNAs and mRNAs unveiled networks associated with patient clinicopathological traits, and underlined molecular mechanisms, specific to BC sub-groups. Our results highlight the benefits of an integrative approach to biomarker discovery, placing secreted miRNAs within a biological context.


Subject(s)
Lymphocytes, Tumor-Infiltrating/immunology , MicroRNAs/genetics , Triple Negative Breast Neoplasms/genetics , Biomarkers, Tumor/genetics , Biomarkers, Tumor/immunology , Extracellular Fluid/metabolism , Female , Follow-Up Studies , Gene Expression Profiling , Gene Regulatory Networks , Humans , Lymphocytes, Tumor-Infiltrating/metabolism , MicroRNAs/metabolism , Neoplasm Grading , Receptor, ErbB-2/metabolism , Receptors, Estrogen/metabolism , Receptors, Progesterone/metabolism , Triple Negative Breast Neoplasms/immunology , Triple Negative Breast Neoplasms/pathology , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology
9.
BMC Bioinformatics ; 20(1): 663, 2019 Dec 12.
Article in English | MEDLINE | ID: mdl-31830908

ABSTRACT

BACKGROUND: Circular DNA has recently been identified across different species including human normal and cancerous tissue, but short-read mappers are unable to align many of the reads crossing circle junctions, hence limiting their detection from short-read sequencing data. RESULTS: Here, we propose a new method, Circle-Map, that guides the realignment of partially aligned reads using information from discordantly mapped reads to map the short unaligned portions using a probabilistic model. We compared Circle-Map to similar up-to-date methods for circular DNA and RNA detection and we demonstrate how the approach implemented in Circle-Map dramatically increases sensitivity for detection of circular DNA on both simulated and real data while retaining high precision. CONCLUSION: Circle-Map is an easy-to-use command line tool that implements the required pipeline to accurately detect circular DNA from circle-enriched next-generation sequencing experiments. Circle-Map is implemented in Python 3.6 and is freely available at https://github.com/iprada/Circle-Map.


Subject(s)
DNA, Circular/genetics , Nucleotides/genetics , Sequence Alignment/methods , Databases, Genetic , Humans , Software
10.
Nature ; 499(7456): 74-8, 2013 Jul 04.
Article in English | MEDLINE | ID: mdl-23803765

ABSTRACT

The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr BP). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude. For comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr BP), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski's horse (E. f. przewalskii) and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0-4.5 million years before present (Myr BP), twice the conventionally accepted time to the most recent common ancestor of the genus Equus. We also find that horse population size fluctuated multiple times over the past 2 Myr, particularly during periods of severe climatic changes. We estimate that the Przewalski's and domestic horse populations diverged 38-72 kyr BP, and find no evidence of recent admixture between the domestic horse breeds and the Przewalski's horse investigated. This supports the contention that Przewalski's horses represent the last surviving wild horse population. We find similar levels of genetic variation among Przewalski's and domestic populations, indicating that the former are genetically viable and worthy of conservation efforts. We also find evidence for continuous selection on the immune system and olfaction throughout horse evolution. Finally, we identify 29 genomic regions among horse breeds that deviate from neutrality and show low levels of genetic variation compared to the Przewalski's horse. Such regions could correspond to loci selected early during domestication.


Subject(s)
Evolution, Molecular , Genome/genetics , Horses/genetics , Phylogeny , Animals , Conservation of Natural Resources , DNA/analysis , DNA/genetics , Endangered Species , Equidae/classification , Equidae/genetics , Fossils , Genetic Variation/genetics , History, Ancient , Horses/classification , Proteins/analysis , Proteins/chemistry , Proteins/genetics , Yukon Territory
11.
PLoS Comput Biol ; 13(4): e1005460, 2017 04.
Article in English | MEDLINE | ID: mdl-28410363

ABSTRACT

Post-transcriptional regulation is regarded as one of the major processes involved in the regulation of gene expression. It is mainly performed by RNA binding proteins and microRNAs, which target RNAs and typically affect their stability. Recent efforts from the scientific community have aimed at understanding post-transcriptional regulation at a global scale by using high-throughput sequencing techniques such as cross-linking and immunoprecipitation (CLIP), which facilitates identification of binding sites of these regulatory factors. However, the diversity in the experimental procedures and bioinformatics analyses has hindered the integration of multiple datasets and thus limited the development of an integrated view of post-transcriptional regulation. In this work, we have performed a comprehensive analysis of 107 CLIP datasets from 49 different RBPs in HEK293 cells to shed light on the complex interactions that govern post-transcriptional regulation. By developing a more stringent CLIP analysis pipeline we have discovered the existence of conserved regulatory AU-rich regions in the 3'UTRs where miRNAs and RBPs that regulate several processes such as polyadenylation or mRNA stability bind. Analogous to promoters, many factors have binding sites overlapping or in close proximity in these hotspots and hence the regulation of the mRNA may depend on their relative concentrations. This hypothesis is supported by RBP knockdown experiments that alter the relative concentration of RBPs in the cell. Upon AGO2 knockdown (KD), transcripts containing "free" target sites show increased expression levels compared to those containing target sites in hotspots, which suggests that target sites within hotspots are less available for miRNAs to bind. Interestingly, these hotspots appear enriched in genes with regulatory functions such as DNA binding and RNA binding. Taken together, our results suggest that hotspots are functional regulatory elements that define an extra layer of regulation of post-transcriptional regulatory networks.


Subject(s)
3' Untranslated Regions/genetics , Binding Sites/genetics , MicroRNAs/genetics , RNA-Binding Proteins/genetics , Computational Biology , HEK293 Cells , Humans , Immunoprecipitation , MicroRNAs/metabolism , Polyadenylation/genetics , RNA-Binding Proteins/metabolism
13.
Genome Res ; 24(3): 454-66, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24299735

ABSTRACT

Epigenetic information is available from contemporary organisms, but is difficult to track back in evolutionary time. Here, we show that genome-wide epigenetic information can be gathered directly from next-generation sequence reads of DNA isolated from ancient remains. Using the genome sequence data generated from hair shafts of a 4000-yr-old Paleo-Eskimo belonging to the Saqqaq culture, we generate the first ancient nucleosome map coupled with a genome-wide survey of cytosine methylation levels. The validity of both the nucleosome map and the methylation levels was confirmed by the recovery of the expected signals at promoter regions, exon/intron boundaries, and CTCF sites. The top-scoring nucleosome calls revealed distinct DNA positioning biases, attesting to nucleotide-level accuracy. The ancient methylation levels exhibited high conservation over time, clustering closely with modern hair tissues. Using ancient methylation information, we estimated the age at death of the Saqqaq individual and illustrate how epigenetic information can be used to infer ancient gene expression. Similar epigenetic signatures were found in other fossil material, such as 110,000- to 130,000-yr-old bones, supporting the contention that ancient epigenomic information can be reconstructed from a deep past. Our findings lay the foundation for extracting epigenomic information from ancient samples, allowing shifts in epialleles to be tracked through evolutionary time, as well as providing an original window into modern epigenomics.


Subject(s)
Cytosine/metabolism , DNA Methylation , Genome, Human , Inuit/genetics , Nucleosomes/genetics , Animals , Chromosome Mapping , Epigenesis, Genetic , Epigenomics , Evolution, Molecular , Gene Expression , Gene Expression Regulation , Humans , Phylogeny , Promoter Regions, Genetic , Sequence Analysis, DNA
14.
RNA ; 21(5): 1042-52, 2015 May.
Article in English | MEDLINE | ID: mdl-25805860

ABSTRACT

Selective 2' Hydroxyl Acylation analyzed by Primer Extension (SHAPE) is an accurate method for probing of RNA secondary structure. In existing SHAPE methods, the SHAPE probing signal is normalized to a no-reagent control to correct for the background caused by premature termination of the reverse transcriptase. Here, we introduce a SHAPE Selection (SHAPES) reagent, N-propanone isatoic anhydride (NPIA), which retains the ability of SHAPE reagents to accurately probe RNA structure, but also allows covalent coupling between the SHAPES reagent and a biotin molecule. We demonstrate that SHAPES-based selection of cDNA-RNA hybrids on streptavidin beads effectively removes the large majority of background signal present in SHAPE probing data and that sequencing-based SHAPES data contain the same amount of RNA structure data as regular sequencing-based SHAPE data obtained through normalization to a no-reagent control. Moreover, the selection efficiently enriches for probed RNAs, suggesting that the SHAPES strategy will be useful for applications with high-background and low-probing signal such as in vivo RNA structure probing.


Subject(s)
Hydroxyl Radical/chemistry , Nucleic Acid Conformation , RNA Probes/chemistry , RNA/chemistry , Sequence Analysis, RNA/methods , Acylation , Bacillus subtilis/genetics , Biotin/chemistry , Escherichia coli/genetics , Hydroxyl Radical/metabolism , RNA/analysis , RNA Caps/chemistry , RNA, Bacterial/chemistry , RNA, Ribosomal, 16S/chemistry , Ribonuclease P/genetics , Transcription Initiation Site
15.
Nucleic Acids Res ; 43(13): 6207-21, 2015 07 27.
Article in English | MEDLINE | ID: mdl-26089393

ABSTRACT

We report a high-resolution time series study of transcriptome dynamics following antimiR-mediated inhibition of miR-9 in a Hodgkin lymphoma cell line (the first such dynamic study of the microRNA inhibition response), revealing both general and specific aspects of the physiological response. We show miR-9 inhibition inducing a multiphasic transcriptome response, with a direct target perturbation before 4 h, earlier than previously reported, amplified by a downstream peak at ∼32 h consistent with an indirect response due to secondary coherent regulation. Predictive modelling indicates a major role for miR-9 in post-transcriptional control of RNA processing and RNA binding protein regulation. Cluster analysis identifies multiple co-regulated gene regulatory modules. Functionally, we observe a shift over time from mRNA processing at early time points to translation at later time points. We validate the key observations with independent time series qPCR and we experimentally validate key predicted miR-9 targets. Methodologically, we developed sensitive functional data analytic predictive methods to analyse the weak response inherent in microRNA inhibition experiments. The methods of this study will be applicable to similar high-resolution time series transcriptome analyses and provide the context for more accurate experimental design and interpretation of future microRNA inhibition studies.


Subject(s)
Gene Expression Regulation , MicroRNAs/antagonists & inhibitors , Transcriptome , Cell Line, Tumor , Cluster Analysis , Genomics , Humans , Models, Genetic , RNA Processing, Post-Transcriptional , RNA-Binding Proteins/metabolism
16.
Environ Microbiol ; 18(3): 863-74, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26439881

ABSTRACT

Limited by culture-dependent methods, the number of viruses identified from thermophilic Archaea and Bacteria is still very small. In this study we retrieved viral sequences from six hot spring metagenomes isolated worldwide, revealing a wide distribution of four archaeal viral families, Ampullaviridae, Bicaudaviridae, Lipothrixviridae and Rudiviridae. Importantly, we identified 10 complete or near complete viral genomes allowing, for the first time, an assessment of genome conservation and evolution of the Ampullaviridae family as well as Sulfolobus Monocaudavirus 1 (SMV1)-related viruses. Among the novel genomes, one belongs to a putative thermophilic virus infecting the bacterium Hydrogenobaculum, for which no virus has been reported in the literature. Moreover, a high viral diversity was observed in the metagenomes, especially among the Lipothrixviridae, as indicated by the large number of unique contigs and the lack of a completely assembled genome for this family. This is further supported by the large number of novel genes in the complete and partial genomes showing no sequence similarities to public databases. CRISPR analysis revealed hundreds of novel CRISPR loci and thousands of novel CRISPR spacers from each metagenome, reinforcing the notion of high viral diversity in the thermal environment.


Subject(s)
Archaeal Viruses/genetics , Genome, Viral , Hot Springs/microbiology , Metagenome , Biodiversity
17.
Blood ; 123(6): 894-904, 2014 Feb 06.
Article in English | MEDLINE | ID: mdl-24363398

ABSTRACT

Gene expression profiling has been used extensively to characterize cancer, identify novel subtypes, and improve patient stratification. However, it has largely failed to identify transcriptional programs that differ between cancer and corresponding normal cells and has not been efficient in identifying expression changes fundamental to disease etiology. Here we present a method that facilitates the comparison of any cancer sample to its nearest normal cellular counterpart, using acute myeloid leukemia (AML) as a model. We first generated a gene expression-based landscape of the normal hematopoietic hierarchy, using expression profiles from normal stem/progenitor cells, and next mapped the AML patient samples to this landscape. This allowed us to identify the closest normal counterpart of individual AML samples and determine gene expression changes between cancer and normal. We find the cancer vs normal method (CvN method) to be superior to conventional methods in stratifying AML patients with aberrant karyotype and in identifying common aberrant transcriptional programs with potential importance for AML etiology. Moreover, the CvN method uncovered a novel poor-outcome subtype of normal-karyotype AML, which allowed for the generation of a highly prognostic survival signature. Collectively, our CvN method holds great potential as a tool for the analysis of gene expression profiles of cancer patients.


Subject(s)
Biomarkers, Tumor/genetics , Hematopoietic Stem Cells/metabolism , Leukemia, Myeloid, Acute/genetics , Blotting, Western , Case-Control Studies , Follow-Up Studies , Gene Expression Profiling , Humans , Leukemia, Myeloid, Acute/pathology , Oligonucleotide Array Sequence Analysis , Prognosis , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Survival Rate
18.
Nature ; 463(7282): 757-62, 2010 Feb 11.
Article in English | MEDLINE | ID: mdl-20148029

ABSTRACT

We report here the genome sequence of an ancient human. Obtained from approximately 4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20x, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide polymorphisms (SNPs), of which 6.8% have not been reported previously. We estimate raw read contamination to be no higher than 0.8%. We use functional SNP assessment to assign possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit.


Subject(s)
Cryopreservation , Extinction, Biological , Genome, Human/genetics , Inuit/genetics , Emigration and Immigration/history , Genetics, Population , Genomics , Genotype , Greenland , Hair , History, Ancient , Humans , Male , Phenotype , Phylogeny , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA , Siberia/ethnology
19.
PLoS Genet ; 9(10): e1003913, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24204315

ABSTRACT

miRNAs are small regulatory RNAs that, due to their considerable potential to target a wide range of mRNAs, are implicated in essentially all biological processes, including cancer. miR-10a is particularly interesting considering its conserved location in the Hox cluster of developmental regulators. A role for this microRNA has been described in developmental regulation as well as for various cancers. However, previous miR-10a studies are exclusively based on transient knockdowns of this miRNA; to study miR-10a loss extensively, we have generated a miR-10a knockout mouse. Here we show that, in the Apc(min) mouse model of intestinal neoplasia, female miR-10a deficient mice develop significantly more adenomas than miR-10a(+/+) and male controls. We further found that Lpo is extensively upregulated in the intestinal epithelium of mice deprived of miR-10a. Using in vitro assays, we demonstrate that the primary miR-10a target KLF4 can upregulate transcription of Lpo, whereas siRNA knockdown of KLF4 reduces LPO levels in HCT-116 cells. Furthermore, Klf4 is upregulated in the intestines of miR-10a knockout mice. Lpo has previously been shown to have the capacity to oxidize estrogens into potent depurinating mutagens, creating an unstable genomic environment that can cause initiation of cancer. Therefore, we postulate that Lpo upregulation in the intestinal epithelium of miR-10a deficient mice, together with the predominant abundance of estrogens in female animals, mainly accounts for the sex-related cancer phenotype we observed. This suggests that miR-10a could be used as a potent diagnostic marker for discovering groups of women that are at high risk of developing colorectal carcinoma, which today is one of the leading causes of cancer-related deaths.


Subject(s)
Intestinal Neoplasms/genetics , Kruppel-Like Transcription Factors/biosynthesis , Lactoperoxidase/genetics , MicroRNAs/genetics , Animals , Disease Models, Animal , Female , Gene Expression Regulation, Neoplastic , HCT116 Cells , Humans , Intestinal Neoplasms/pathology , Kruppel-Like Factor 4 , Lactoperoxidase/biosynthesis , Male , Mice , Mice, Knockout , MicroRNAs/metabolism , Wnt Signaling Pathway/genetics
20.
EMBO J ; 30(22): 4628-41, 2011 Sep 13.
Article in English | MEDLINE | ID: mdl-21915098

ABSTRACT

Autophagy is an evolutionarily conserved mechanism of cellular self-digestion in which proteins and organelles are degraded through delivery to lysosomes. Defects in this process are implicated in numerous human diseases including cancer. To further elucidate regulatory mechanisms of autophagy, we performed a functional screen in search of microRNAs (miRNAs) that regulate the autophagic flux in breast cancer cells. In this study, we identified the tumour suppressive miRNA, miR-101, as a potent inhibitor of basal, etoposide- and rapamycin-induced autophagy. Through transcriptome profiling, we identified three novel miR-101 targets, STMN1, RAB5A and ATG4D. siRNA-mediated depletion of these genes phenocopied the effect of miR-101 overexpression, demonstrating their importance in autophagy regulation. Importantly, overexpression of STMN1 could partially rescue cells from miR-101-mediated inhibition of autophagy, indicating a functional importance for this target. Finally, we show that miR-101-mediated inhibition of autophagy can sensitize breast cancer cells to 4-hydroxytamoxifen (4-OHT)-mediated cell death. Collectively, these data establish a novel link between two highly important and rapidly growing research fields and present a new role for miR-101 as a key regulator of autophagy.


Subject(s)
Autophagy , MicroRNAs/genetics , MicroRNAs/metabolism , Stathmin/metabolism , Autophagy-Related Proteins , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cell Line, Tumor , Cysteine Endopeptidases/genetics , Cysteine Endopeptidases/metabolism , Etoposide/pharmacology , Female , Gene Expression Profiling , Gene Expression Regulation , Humans , Oligonucleotide Array Sequence Analysis , RNA Interference , RNA, Small Interfering , Sirolimus/pharmacology , Stathmin/biosynthesis , Stathmin/genetics , Tamoxifen/analogs & derivatives , Tamoxifen/pharmacology , rab5 GTP-Binding Proteins/genetics , rab5 GTP-Binding Proteins/metabolism