Results 1 - 20 of 94
1.
Nat Commun ; 15(1): 5534, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38951512

ABSTRACT

Stratified medicine holds great promise to tailor treatment to the needs of individual patients. While genetics holds great potential to aid patient stratification, it remains a major challenge to operationalize complex genetic risk factor profiles to deconstruct clinical heterogeneity. Contemporary approaches to this problem rely on polygenic risk scores (PRS), which provide only limited clinical utility and lack a clear biological foundation. To overcome these limitations, we develop the CASTom-iGEx approach to stratify individuals based on the aggregated impact of their genetic risk factor profiles on tissue specific gene expression levels. The paradigmatic application of this approach to coronary artery disease or schizophrenia patient cohorts identified diverse strata or biotypes. These biotypes are characterized by distinct endophenotype profiles as well as clinical parameters and are fundamentally distinct from PRS based groupings. In stark contrast to the latter, the CASTom-iGEx strategy discovers biologically meaningful and clinically actionable patient subgroups, where complex genetic liabilities are not randomly distributed across individuals but rather converge onto distinct disease relevant biological processes. These results support the notion of different patient biotypes characterized by partially distinct pathomechanisms. Thus, the universally applicable approach presented here has the potential to constitute an important component of future personalized medicine paradigms.


Subject(s)
Coronary Artery Disease , Genetic Predisposition to Disease , Multifactorial Inheritance , Schizophrenia , Humans , Schizophrenia/genetics , Multifactorial Inheritance/genetics , Genetic Predisposition to Disease/genetics , Coronary Artery Disease/genetics , Risk Factors , Female , Precision Medicine , Male , Genome-Wide Association Study , Middle Aged , Polymorphism, Single Nucleotide
2.
HGG Adv ; 5(3): 100318, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38872308

ABSTRACT

The high heritability of amyotrophic lateral sclerosis (ALS) contrasts with its low molecular diagnosis rate post-genetic testing, pointing to potential undiscovered genetic factors. To aid the exploration of these factors, we introduced EpiOut, an algorithm to identify chromatin accessibility outliers that are regions exhibiting divergent accessibility from the population baseline in a single or few samples. Annotation of accessible regions with histone chromatin immunoprecipitation sequencing and Hi-C indicates that outliers are concentrated in functional loci, especially among promoters interacting with active enhancers. Across different omics levels, outliers are robustly replicated, and chromatin accessibility outliers are reliable predictors of gene expression outliers and aberrant protein levels. When promoter accessibility does not align with gene expression, our results indicate that molecular aberrations are more likely to be linked to post-transcriptional regulation rather than transcriptional regulation. Our findings demonstrate that the outlier detection paradigm can uncover dysregulated regions in rare diseases. EpiOut is available at github.com/uci-cbcl/EpiOut.

3.
Mol Genet Metab ; 142(3): 108511, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38878498

ABSTRACT

The diagnosis of Mendelian disorders has notably advanced with integration of whole exome and genome sequencing (WES and WGS) in clinical practice. However, challenges in variant interpretation and uncovered variants by WES still leave a substantial percentage of patients undiagnosed. In this context, integrating RNA sequencing (RNA-seq) improves diagnostic workflows, particularly for WES inconclusive cases. Additionally, functional studies are often necessary to elucidate the impact of prioritized variants on gene expression and protein function. Our study focused on three unrelated male patients (P1-P3) with ATP6AP1-CDG (congenital disorder of glycosylation), presenting with intellectual disability and varying degrees of hepatopathy, glycosylation defects, and an initially inconclusive diagnosis through WES. Subsequent RNA-seq was pivotal in identifying the underlying genetic causes in P1 and P2, detecting ATP6AP1 underexpression and aberrant splicing. Molecular studies in fibroblasts confirmed these findings and identified the rare intronic variants c.289-233C > T and c.289-289G > A in P1 and P2, respectively. Trio-WGS also revealed the variant c.289-289G > A in P3, which was a de novo change in both patients. Functional assays expressing the mutant alleles in HAP1 cells demonstrated the pathogenic impact of these variants by reproducing the splicing alterations observed in patients. Our study underscores the role of RNA-seq and WGS in enhancing diagnostic rates for genetic diseases such as CDG, providing new insights into ATP6AP1-CDG molecular bases by identifying the first two deep intronic variants in this X-linked gene. Additionally, our study highlights the need to integrate RNA-seq and WGS, followed by functional validation, in routine diagnostics for a comprehensive evaluation of patients with an unidentified molecular etiology.


Subject(s)
Introns , RNA, Messenger , Humans , Male , Introns/genetics , RNA, Messenger/genetics , Vacuolar Proton-Translocating ATPases/genetics , Congenital Disorders of Glycosylation/genetics , Congenital Disorders of Glycosylation/diagnosis , Congenital Disorders of Glycosylation/pathology , Mutation , Whole Genome Sequencing , Exome Sequencing , Sequence Analysis, RNA , Intellectual Disability/genetics , Intellectual Disability/diagnosis , Intellectual Disability/pathology , Child , RNA Splicing/genetics , Child, Preschool
4.
Genome Med ; 16(1): 70, 2024 05 20.
Article in English | MEDLINE | ID: mdl-38769532

ABSTRACT

BACKGROUND: Rare oncogenic driver events, particularly affecting the expression or splicing of driver genes, are suspected to substantially contribute to the large heterogeneity of hematologic malignancies. However, their identification remains challenging. METHODS: To address this issue, we generated the largest dataset to date of matched whole genome sequencing and total RNA sequencing of hematologic malignancies from 3760 patients spanning 24 disease entities. Taking advantage of our dataset size, we focused on discovering rare regulatory aberrations. Therefore, we called expression and splicing outliers using an extension of the workflow DROP (Detection of RNA Outliers Pipeline) and AbSplice, a variant effect predictor that identifies genetic variants causing aberrant splicing. We next trained a machine learning model integrating these results to prioritize new candidate disease-specific driver genes. RESULTS: We found a median of seven expression outlier genes, two splicing outlier genes, and two rare splice-affecting variants per sample. Each category showed significant enrichment for already well-characterized driver genes, with odds ratios exceeding three among genes called in more than five samples. On held-out data, our integrative modeling significantly outperformed modeling based solely on genomic data and revealed promising novel candidate driver genes. Remarkably, we found a truncated form of the low density lipoprotein receptor LRP1B transcript to be aberrantly overexpressed in about half of hairy cell leukemia variant (HCL-V) samples and, to a lesser extent, in closely related B-cell neoplasms. This observation, which was confirmed in an independent cohort, suggests LRP1B as a novel marker for a HCL-V subclass and a yet unreported functional role of LRP1B within these rare entities. CONCLUSIONS: Altogether, our census of expression and splicing outliers for 24 hematologic malignancy entities and the companion computational workflow constitute unique resources to deepen our understanding of rare oncogenic events in hematologic cancers.


Subject(s)
Hematologic Neoplasms , Transcriptome , Humans , Hematologic Neoplasms/genetics , RNA Splicing , Gene Expression Regulation, Neoplastic , Oncogenes , Gene Expression Profiling , Receptors, LDL/genetics
5.
medRxiv ; 2024 May 04.
Article in English | MEDLINE | ID: mdl-38746462

ABSTRACT

Solve-RD is a pan-European rare disease (RD) research program that aims to identify disease-causing genetic variants in previously undiagnosed RD families. We utilised 10-fold coverage HiFi long-read sequencing (LRS) for detecting causative structural variants (SVs), single nucleotide variants (SNVs), insertion-deletions (InDels), and short tandem repeat (STR) expansions in extensively studied RD families without clear molecular diagnoses. Our cohort includes 293 individuals from 114 genetically undiagnosed RD families selected by European Rare Disease Network (ERN) experts. Of these, 21 families were affected by so-called 'unsolvable' syndromes for which genetic causes remain unknown, and 93 families with at least one individual affected by a rare neurological, neuromuscular, or epilepsy disorder without genetic diagnosis despite extensive prior testing. Clinical interpretation and orthogonal validation of variants in known disease genes yielded thirteen novel genetic diagnoses due to de novo and rare inherited SNVs, InDels, SVs, and STR expansions. In an additional four families, we identified a candidate disease-causing SV affecting several genes including an MCF2 / FGF13 fusion and PSMA3 deletion. However, no common genetic cause was identified in any of the 'unsolvable' syndromes. Taken together, we found (likely) disease-causing genetic variants in 13.0% of previously unsolved families and additional candidate disease-causing SVs in another 4.3% of these families. In conclusion, our results demonstrate the added value of HiFi long-read genome sequencing in undiagnosed rare diseases.

6.
Genome Biol ; 25(1): 83, 2024 04 02.
Article in English | MEDLINE | ID: mdl-38566111

ABSTRACT

BACKGROUND: The rise of large-scale multi-species genome sequencing projects promises to shed new light on how genomes encode gene regulatory instructions. To this end, new algorithms are needed that can leverage conservation to capture regulatory elements while accounting for their evolution. RESULTS: Here, we introduce species-aware DNA language models, which we trained on more than 800 species spanning over 500 million years of evolution. Investigating their ability to predict masked nucleotides from context, we show that DNA language models distinguish transcription factor and RNA-binding protein motifs from background non-coding sequence. Owing to their flexibility, DNA language models capture conserved regulatory elements over much further evolutionary distances than sequence alignment would allow. Remarkably, DNA language models reconstruct motif instances bound in vivo better than unbound ones and account for the evolution of motif sequences and their positional constraints, showing that these models capture functional high-order sequence and evolutionary context. We further show that species-aware training yields improved sequence representations for endogenous and MPRA-based gene expression prediction, as well as motif discovery. CONCLUSIONS: Collectively, these results demonstrate that species-aware DNA language models are a powerful, flexible, and scalable tool to integrate information from large compendia of highly diverged genomes.


Subject(s)
DNA , Regulatory Sequences, Nucleic Acid , Binding Sites , Sequence Alignment , Algorithms , Conserved Sequence/genetics , Evolution, Molecular
7.
Sci Rep ; 14(1): 5768, 2024 03 08.
Article in English | MEDLINE | ID: mdl-38459123

ABSTRACT

The SARS-CoV-2 pandemic has highlighted the need to better define in-hospital transmissions, a need that extends to all other common infectious diseases encountered in clinical settings. To evaluate how whole viral genome sequencing can contribute to deciphering nosocomial SARS-CoV-2 transmission, 926 SARS-CoV-2 viral genomes from 622 staff members and patients were collected between February 2020 and January 2021 at a university hospital in Munich, Germany, and analysed along with the place of work, duration of hospital stay, and ward transfers. Bioinformatically defined transmission clusters inferred from viral genome sequencing were compared to those inferred from interview-based contact tracing. An additional dataset collected at the same time at another university hospital in the same city was used to account for multiple independent introductions. Clustering analysis of 619 viral genomes generated 19 clusters ranging from 3 to 31 individuals. Sequencing-based transmission clusters showed little overlap with those based on contact tracing data. The viral genomes were significantly more closely related to each other than comparable genomes collected simultaneously at other hospitals in the same city (n = 829), suggesting nosocomial transmission. Longitudinal sampling from individual patients suggested possible cross-infection events during the hospital stay in 19.2% of individuals (14 of 73 individuals). Clustering analysis of SARS-CoV-2 whole genome sequences can reveal cryptic transmission events missed by classical, interview-based contact tracing, helping to decipher in-hospital transmissions. These results, in line with other studies, advocate for viral genome sequencing as a pathogen transmission surveillance tool in hospitals.


Subject(s)
COVID-19 , Cross Infection , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/genetics , Genome, Viral/genetics , Cross Infection/epidemiology , Cross Infection/genetics , Hospitals, University
8.
Mol Syst Biol ; 20(5): 506-520, 2024 May.
Article in English | MEDLINE | ID: mdl-38491213

ABSTRACT

Codon optimality is a major determinant of mRNA translation and degradation rates. However, whether and through which mechanisms its effects are regulated remains poorly understood. Here we show that codon optimality associates with up to 2-fold change in mRNA stability variations between human tissues, and that its effect is attenuated in tissues with high energy metabolism and amplifies with age. Mathematical modeling and perturbation data through oxygen deprivation and ATP synthesis inhibition reveal that cellular energy variations non-uniformly alter the effect of codon usage. This new mode of codon effect regulation, independent of tRNA regulation, provides a fundamental mechanistic link between cellular energy metabolism and eukaryotic gene expression.


Subject(s)
Codon , Energy Metabolism , RNA Stability , RNA, Messenger , Humans , Energy Metabolism/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Codon/genetics , Codon Usage , Protein Biosynthesis , RNA, Transfer/genetics , RNA, Transfer/metabolism , Adenosine Triphosphate/metabolism , Gene Expression Regulation
9.
bioRxiv ; 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38260253

ABSTRACT

Aging and neurodegeneration entail diverse cellular and molecular hallmarks. Here, we studied the effects of aging on the transcriptome, translatome, and multiple layers of the proteome in the brain of a short-lived killifish. We reveal that aging causes widespread reduction of proteins enriched in basic amino acids that is independent of mRNA regulation, and it is not due to impaired proteasome activity. Instead, we identify a cascade of events where aberrant translation pausing leads to reduced ribosome availability resulting in proteome remodeling independently of transcriptional regulation. Our research uncovers a vulnerable point in the aging brain's biology - the biogenesis of basic DNA/RNA binding proteins. This vulnerability may represent a unifying principle that connects various aging hallmarks, encompassing genome integrity and the biosynthesis of macromolecules.

10.
Nat Commun ; 15(1): 151, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38167372

ABSTRACT

Unlike for DNA and RNA, accurate and high-throughput sequencing methods for proteins are lacking, hindering the utility of proteomics in applications where the sequences are unknown including variant calling, neoepitope identification, and metaproteomics. We introduce Spectralis, a de novo peptide sequencing method for tandem mass spectrometry. Spectralis leverages several innovations including a convolutional neural network layer connecting peaks in spectra spaced by amino acid masses, proposing fragment ion series classification as a pivotal task for de novo peptide sequencing, and a peptide-spectrum confidence score. On spectra for which database search provided a ground truth, Spectralis surpassed 40% sensitivity at 90% precision, nearly doubling state-of-the-art sensitivity. Application to unidentified spectra confirmed its superiority and showcased its applicability to variant calling. Altogether, these algorithmic innovations and the substantial sensitivity increase in the high-precision range constitute an important step toward broadly applicable peptide sequencing.


Subject(s)
Deep Learning , Algorithms , Sequence Analysis, Protein/methods , Peptides/chemistry , Amino Acid Sequence
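
The abstract above reports performance as sensitivity at a fixed precision (40% sensitivity at 90% precision). The following minimal sketch shows how such a figure could be computed from confidence-scored predictions. It is not the Spectralis evaluation code: the function name, its arguments, and the definition of sensitivity as the fraction of all ground-truth spectra with a correctly reported peptide are assumptions made for illustration.

```python
from typing import Sequence


def sensitivity_at_precision(
    scores: Sequence[float],
    is_correct: Sequence[bool],
    min_precision: float = 0.9,
) -> float:
    """Largest fraction of all spectra that can be reported as correct
    while keeping precision at or above `min_precision`.

    Hypothetical helper: one confidence score and one correctness flag
    per spectrum with a known ground truth.
    """
    if len(scores) != len(is_correct):
        raise ValueError("scores and is_correct must have the same length")
    if not scores:
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, is_correct), key=lambda pair: pair[0], reverse=True)
    n_total = len(ranked)
    n_true = 0
    best = 0.0
    # Sweep the score cutoff: report the top n_reported predictions.
    for n_reported, (_, correct) in enumerate(ranked, start=1):
        n_true += int(correct)
        precision = n_true / n_reported
        if precision >= min_precision:
            best = max(best, n_true / n_total)
    return best


# Toy data (invented): five spectra, three correct predictions.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_correct = [True, True, False, True, False]
print(sensitivity_at_precision(toy_scores, toy_correct, min_precision=0.9))
```

Tied scores are not treated specially in this sketch; a stricter evaluation would move the cutoff only between distinct score values.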
11.
Nat Methods ; 21(1): 28-31, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38049697

ABSTRACT

Single-cell ATAC sequencing coverage in regulatory regions is typically binarized as an indicator of open chromatin. Here we show that binarization is an unnecessary step that neither improves goodness of fit, clustering, cell type identification nor batch integration. Fragment counts, but not read counts, should instead be modeled, which preserves quantitative regulatory information. These results have immediate implications for single-cell ATAC sequencing analysis.


Subject(s)
Chromatin Immunoprecipitation Sequencing , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Chromatin/genetics , Single-Cell Analysis
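
To make concrete what binarization discards, as argued in the abstract above, here is a small sketch on an invented cell-by-peak matrix. The matrix values and variable names are hypothetical, and the snippet only contrasts the two representations; it does not reproduce the models evaluated in the paper.

```python
import numpy as np

# Hypothetical fragment counts: rows are cells, columns are peaks.
fragment_counts = np.array([
    [0, 1, 4],
    [0, 2, 1],
    [3, 0, 0],
])

# Binarization keeps only "accessible or not" for each cell and peak.
binarized = (fragment_counts > 0).astype(int)

# Per-peak totals under each representation.
count_totals = fragment_counts.sum(axis=0)
binary_totals = binarized.sum(axis=0)

# Share of the fragment signal that is collapsed by binarization.
lost_fraction = 1 - binary_totals.sum() / count_totals.sum()

print(count_totals, binary_totals, round(float(lost_fraction), 3))
```

In this toy matrix the first two peaks have the same total fragment count but different binarized totals, which is the kind of quantitative information a count-based model can use.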
12.
NAR Genom Bioinform ; 5(4): lqad095, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37942285

ABSTRACT

Functional gene embeddings, numerical vectors capturing gene function, provide a promising way to integrate functional gene information into machine learning models. These embeddings are learnt by applying self-supervised machine-learning algorithms on various data types including quantitative omics measurements, protein-protein interaction networks and literature. However, downstream evaluations comparing alternative data modalities used to construct functional gene embeddings have been lacking. Here we benchmarked functional gene embeddings obtained from various data modalities for predicting disease-gene lists, cancer drivers, phenotype-gene associations and scores from genome-wide association studies. Off-the-shelf predictors trained on precomputed embeddings matched or outperformed dedicated state-of-the-art predictors, demonstrating their high utility. Embeddings based on literature and protein-protein interactions inferred from low-throughput experiments outperformed embeddings derived from genome-wide experimental data (transcriptomics, deletion screens and protein sequence) when predicting curated gene lists. In contrast, they did not perform better when predicting genome-wide association signals and were biased towards highly-studied genes. These results indicate that embeddings derived from literature and low-throughput experiments appear favourable in many existing benchmarks because they are biased towards well-studied genes and should therefore be considered with caution. Altogether, our study and precomputed embeddings will facilitate the development of machine-learning models in genetics and related fields.

13.
Am J Hum Genet ; 110(12): 2056-2067, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38006880

ABSTRACT

Detection of aberrantly spliced genes is an important step in RNA-seq-based rare-disease diagnostics. We recently developed FRASER, a denoising autoencoder-based method that outperformed alternative methods of detecting aberrant splicing. However, because FRASER's three splice metrics are partially redundant and tend to be sensitive to sequencing depth, we introduce here a more robust intron-excision metric, the intron Jaccard index, that combines the alternative donor, alternative acceptor, and intron-retention signal into a single value. Moreover, we optimized model parameters and filter cutoffs by using candidate rare-splice-disrupting variants as independent evidence. On 16,213 GTEx samples, our improved algorithm, FRASER 2.0, called typically 10 times fewer splicing outliers while increasing the proportion of candidate rare-splice-disrupting variants by 10-fold and substantially decreasing the effect of sequencing depth on the number of reported outliers. To lower the multiple-testing correction burden, we introduce an option to select the genes to be tested for each sample instead of a transcriptome-wide approach. This option can be particularly useful when prior information, such as candidate variants or genes, is available. Application on 303 rare-disease samples confirmed the relative reduction in the number of outlier calls for a slight loss of sensitivity; FRASER 2.0 recovered 22 out of 26 previously identified pathogenic splicing cases with default cutoffs and 24 when multiple-testing correction was limited to OMIM genes containing rare variants. Altogether, these methodological improvements contribute to more effective RNA-seq-based rare diagnostics by drastically reducing the amount of splicing outlier calls per sample at minimal loss of sensitivity.


Subject(s)
Alternative Splicing , RNA Splicing , Humans , Alternative Splicing/genetics , Introns/genetics , RNA Splicing/genetics , RNA-Seq , Algorithms
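
The abstract above describes the intron Jaccard index only in words: a single value combining alternative donor, alternative acceptor, and intron-retention signal. The sketch below shows one plausible formulation as an intersection-over-union of read counts. The exact formula, the function name, and the input layout are assumptions for illustration and should be checked against the FRASER 2.0 publication and software.

```python
from typing import Dict, Tuple


def intron_jaccard(
    split_counts: Dict[Tuple[int, int], int],
    nonsplit_donor: int,
    nonsplit_acceptor: int,
    donor: int,
    acceptor: int,
) -> float:
    """Assumed form: split reads supporting the intron (donor, acceptor),
    divided by all reads that use the donor or the acceptor, whether split
    to another site or unsplit across the exon-intron boundary."""
    k = split_counts.get((donor, acceptor), 0)
    # Split reads sharing the donor (includes the intron itself).
    donor_total = sum(c for (d, _), c in split_counts.items() if d == donor)
    # Split reads sharing the acceptor (includes the intron itself).
    acceptor_total = sum(c for (_, a), c in split_counts.items() if a == acceptor)
    # The intron's own reads were counted twice, so subtract them once.
    union = donor_total + acceptor_total + nonsplit_donor + nonsplit_acceptor - k
    return k / union if union > 0 else float("nan")


# Invented counts: the intron of interest, one alternative acceptor,
# one alternative donor, and a few unsplit (retention-like) reads.
toy_split = {(100, 200): 80, (100, 250): 10, (150, 200): 5}
print(intron_jaccard(toy_split, nonsplit_donor=3, nonsplit_acceptor=2,
                     donor=100, acceptor=200))
```

Under this assumed form, any increase in alternative donor usage, alternative acceptor usage, or unsplit reads lowers the same value, which matches the abstract's description of a single combined metric.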
14.
Genome Biol ; 24(1): 180, 2023 08 04.
Article in English | MEDLINE | ID: mdl-37542318

ABSTRACT

We present RBPNet, a novel deep learning method, which predicts CLIP-seq crosslink count distribution from RNA sequence at single-nucleotide resolution. By training on up to a million regions, RBPNet achieves high generalization on eCLIP, iCLIP and miCLIP assays, outperforming state-of-the-art classifiers. RBPNet performs bias correction by modeling the raw signal as a mixture of the protein-specific and background signal. Through model interrogation via Integrated Gradients, RBPNet identifies predictive sub-sequences that correspond to known and novel binding motifs and enables variant-impact scoring via in silico mutagenesis. Together, RBPNet improves imputation of protein-RNA interactions, as well as mechanistic interpretation of predictions.


Subject(s)
Base Sequence , Computer Simulation , Deep Learning , RNA-Binding Proteins , RNA , Humans , Alleles , Bias , Binding Sites , Consensus Sequence , Datasets as Topic , Internet , Mutation , Nucleotide Motifs , Nucleotides/metabolism , RNA/chemistry , RNA/genetics , RNA/metabolism , RNA Splice Sites , RNA, Messenger/chemistry , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Viral/chemistry , RNA, Viral/genetics , RNA, Viral/metabolism , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/metabolism
15.
Nat Genet ; 55(5): 861-870, 2023 05.
Article in English | MEDLINE | ID: mdl-37142848

ABSTRACT

Aberrant splicing is a major cause of genetic disorders but its direct detection in transcriptomes is limited to clinically accessible tissues such as skin or body fluids. While DNA-based machine learning models can prioritize rare variants for affecting splicing, their performance in predicting tissue-specific aberrant splicing remains unassessed. Here we generated an aberrant splicing benchmark dataset, spanning over 8.8 million rare variants in 49 human tissues from the Genotype-Tissue Expression (GTEx) dataset. At 20% recall, state-of-the-art DNA-based models achieve maximum 12% precision. By mapping and quantifying tissue-specific splice site usage transcriptome-wide and modeling isoform competition, we increased precision by threefold at the same recall. Integrating RNA-sequencing data of clinically accessible tissues into our model, AbSplice, brought precision to 60%. These results, replicated in two independent cohorts, substantially contribute to noncoding loss-of-function variant identification and to genetic diagnostics design and analytics.


Subject(s)
Alternative Splicing , RNA Splicing , Humans , RNA Splicing/genetics , Alternative Splicing/genetics , Sequence Analysis, RNA/methods , Transcriptome , Protein Isoforms
16.
medRxiv ; 2023 May 11.
Article in English | MEDLINE | ID: mdl-37214898

ABSTRACT

Genome-wide association studies have unearthed a wealth of genetic associations across many complex diseases. However, translating these associations into biological mechanisms contributing to disease etiology and heterogeneity has been challenging. Here, we hypothesize that the effects of disease-associated genetic variants converge onto distinct cell type specific molecular pathways within distinct subgroups of patients. In order to test this hypothesis, we develop the CASTom-iGEx pipeline to operationalize individual level genotype data to interpret personal polygenic risk and identify the genetic basis of clinical heterogeneity. The paradigmatic application of this approach to coronary artery disease and schizophrenia reveals a convergence of disease associated variant effects onto known and novel genes, pathways, and biological processes. The biological process specific genetic liabilities are not equally distributed across patients. Instead, they defined genetically distinct groups of patients, characterized by different profiles across pathways, endophenotypes, and disease severity. These results provide further evidence for a genetic contribution to clinical heterogeneity and point to the existence of partially distinct pathomechanisms across patient subgroups. Thus, the universally applicable approach presented here has the potential to constitute an important component of future personalized medicine concepts.

17.
medRxiv ; 2023 Apr 03.
Article in English | MEDLINE | ID: mdl-37066374

ABSTRACT

Detection of aberrantly spliced genes is an important step in RNA-seq-based rare disease diagnostics. We recently developed FRASER, a denoising autoencoder-based method for aberrant splicing detection that outperformed alternative approaches. However, as FRASER's three splice metrics are partially redundant and tend to be sensitive to sequencing depth, we introduce here a more robust intron excision metric, the Intron Jaccard Index, that combines alternative donor, alternative acceptor, and intron retention signal into a single value. Moreover, we optimized model parameters and filter cutoffs using candidate rare splice-disrupting variants as independent evidence. On 16,213 GTEx samples, our improved algorithm called typically 10 times fewer splicing outliers while increasing the proportion of candidate rare splice-disrupting variants by 10 fold and substantially decreasing the effect of sequencing depth on the number of reported outliers. Application on 303 rare disease samples confirmed the reduction fold-change of the number of outlier calls for a slight loss of sensitivity (only 2 out of 22 previously identified pathogenic splicing cases not recovered). Altogether, these methodological improvements contribute to more effective RNA-seq-based rare diagnostics by a drastic reduction of the amount of splicing outlier calls per sample at minimal loss of sensitivity.

18.
Nat Biotechnol ; 41(12): 1787-1800, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37012447

ABSTRACT

The epicardium, the mesothelial envelope of the vertebrate heart, is the source of multiple cardiac cell lineages during embryonic development and provides signals that are essential to myocardial growth and repair. Here we generate self-organizing human pluripotent stem cell-derived epicardioids that display retinoic acid-dependent morphological, molecular and functional patterning of the epicardium and myocardium typical of the left ventricular wall. By combining lineage tracing, single-cell transcriptomics and chromatin accessibility profiling, we describe the specification and differentiation process of different cell lineages in epicardioids and draw comparisons to human fetal development at the transcriptional and morphological levels. We then use epicardioids to investigate the functional cross-talk between cardiac cell types, gaining new insights into the role of IGF2/IGF1R and NRP2 signaling in human cardiogenesis. Finally, we show that epicardioids mimic the multicellular pathogenesis of congenital or stress-induced hypertrophy and fibrotic remodeling. As such, epicardioids offer a unique testing ground of epicardial activity in heart development, disease and regeneration.


Subject(s)
Heart , Pericardium , Humans , Pericardium/metabolism , Myocardium , Cell Differentiation/genetics , Cell Lineage/genetics , Biology
19.
Genome Biol ; 24(1): 56, 2023 03 27.
Article in English | MEDLINE | ID: mdl-36973806

ABSTRACT

BACKGROUND: The largest sequence-based models of transcription control to date are obtained by predicting genome-wide gene regulatory assays across the human genome. This setting is fundamentally correlative, as those models are exposed during training solely to the sequence variation between human genes that arose through evolution, questioning the extent to which those models capture genuine causal signals. RESULTS: Here we confront predictions of state-of-the-art models of transcription regulation against data from two large-scale observational studies and five deep perturbation assays. The most advanced of these sequence-based models, Enformer, by and large, captures causal determinants of human promoters. However, models fail to capture the causal effects of enhancers on expression, notably in medium to long distances and particularly for highly expressed promoters. More generally, the predicted impact of distal elements on gene expression predictions is small and the ability to correctly integrate long-range information is significantly more limited than the receptive fields of the models suggest. This is likely caused by the escalating class imbalance between actual and candidate regulatory elements as distance increases. CONCLUSIONS: Our results suggest that sequence-based models have advanced to the point that in silico study of promoter regions and promoter variants can provide meaningful insights and we provide practical guidance on how to use them. Moreover, we foresee that it will require significantly more and particularly new kinds of data to train models accurately accounting for distal elements.


Subject(s)
Enhancer Elements, Genetic , Genomics , Humans , Genomics/methods , Promoter Regions, Genetic , Gene Expression Regulation , Gene Expression
20.
Nucleic Acids Res ; 51(4): e21, 2023 02 28.
Article in English | MEDLINE | ID: mdl-36617985

ABSTRACT

Transposon screens are powerful in vivo assays used to identify loci driving carcinogenesis. These loci are identified as Common Insertion Sites (CISs), i.e. regions with more transposon insertions than expected by chance. However, the identification of CISs is affected by biases in the insertion behaviour of transposon systems. Here, we introduce Transmicron, a novel method that differs from previous methods by (i) modelling neutral insertion rates based on chromatin accessibility, transcriptional activity and sequence context and (ii) estimating oncogenic selection for each genomic region using Poisson regression to model insertion counts while controlling for neutral insertion rates. To assess the benefits of our approach, we generated a dataset applying two different transposon systems under comparable conditions. Benchmarking for enrichment of known cancer genes showed improved performance of Transmicron against state-of-the-art methods. Modelling neutral insertion rates allowed for better control of false positives and stronger agreement of the results between transposon systems. Moreover, using Poisson regression to consider intra-sample and inter-sample information proved beneficial in small and moderately-sized datasets. Transmicron is open-source and freely available. Overall, this study contributes to the understanding of transposon biology and introduces a novel approach to use this knowledge for discovering cancer driver genes.


Subject(s)
DNA Transposable Elements , Neoplasms , Software , Humans , Base Sequence , Carcinogenesis , Mutagenesis, Insertional , Oncogenes , Neoplasms/genetics
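
The abstract above describes Poisson regression of insertion counts while controlling for neutral insertion rates. A minimal way to see the idea is an intercept-only Poisson model for one region, with the log of the expected neutral insertions as an offset; its maximum-likelihood estimate has a closed form. This is a deliberately simplified sketch, not the Transmicron implementation, and the function name and toy numbers are hypothetical.

```python
import math
from typing import Sequence


def selection_log_ratio(
    counts: Sequence[int],
    neutral_expected: Sequence[float],
) -> float:
    """MLE of beta in log E[count_i] = log(neutral_expected_i) + beta.

    For an intercept-only Poisson model with an offset, the estimate is
    log(sum of observed counts / sum of neutral expectations).
    """
    if len(counts) != len(neutral_expected):
        raise ValueError("counts and neutral_expected must have the same length")
    total_obs = sum(counts)
    total_exp = sum(neutral_expected)
    if total_exp <= 0:
        raise ValueError("neutral expectations must sum to a positive value")
    if total_obs == 0:
        return float("-inf")
    return math.log(total_obs / total_exp)


# Invented example: one region observed in three samples.
observed = [4, 6, 10]
expected_under_neutrality = [2.0, 3.0, 5.0]
beta = selection_log_ratio(observed, expected_under_neutrality)
print(beta, math.exp(beta))
```

A positive estimate means more insertions than the neutral model predicts for that region. A full analysis would also need covariates and a significance test, which this sketch omits.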