ABSTRACT
Autopsy rates are declining globally, affecting cause-of-death (CoD) diagnoses and quality control. Postmortem metabolomics was evaluated for CoD screening using 4,282 human cases spanning five CoD groups: acidosis, drug intoxication, hanging, ischemic heart disease (IHD), and pneumonia. Cases were split 3:1 into training and test sets. High-resolution mass spectrometry data from femoral blood were analyzed with orthogonal partial least squares discriminant analysis (OPLS-DA) to discriminate the CoD groups. OPLS-DA achieved R² = 0.52 and Q² = 0.30, with true-positive prediction rates across all groups of 68% and 65% for the training and test sets, respectively. With specificity-optimized thresholds, 56% of test cases were assigned a unique CoD, with an average sensitivity of 45% and an average specificity of 96%. Prediction accuracies varied: 98.7% for acidosis, 80.5% for drug intoxication, 81.6% for hanging, 73.1% for IHD, and 93.6% for pneumonia. This study demonstrates the potential of large-scale postmortem metabolomics for CoD screening, offering high specificity and improving throughput and decision-making in human death investigations.
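The split and threshold steps can be illustrated with a minimal Python sketch. It assumes each CoD group has a one-vs-rest score per case (for example, a predicted class membership from the discriminant model) and picks, per group, the lowest cutoff that reaches a target specificity on the negative cases. The function names, the 0.95 target, and the random 3:1 split are illustrative assumptions, not the study's procedure.

```python
import numpy as np

def split_3_to_1(n_cases, seed=0):
    """Randomly assign cases to training (3/4) and test (1/4) index sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_cases)
    n_train = (3 * n_cases) // 4
    return idx[:n_train], idx[n_train:]

def specificity_threshold(scores, is_positive, target_specificity=0.95):
    """Lowest cutoff (among negative-case scores) giving specificity >= target.

    A case is called positive when its score is strictly above the cutoff.
    """
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    negatives = np.sort(scores[~is_positive])
    k = int(np.ceil(target_specificity * len(negatives))) - 1
    return negatives[max(k, 0)]

def sensitivity_specificity(scores, is_positive, cutoff):
    """Sensitivity and specificity of the rule 'score > cutoff'."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    called = scores > cutoff
    sensitivity = np.mean(called[is_positive])
    specificity = np.mean(~called[~is_positive])
    return sensitivity, specificity

# With the study's case count this yields 3,211 training and 1,071 test cases.
train_idx, test_idx = split_3_to_1(4282)
```

Choosing the cutoff from the negative-score distribution is what trades sensitivity for specificity: cases that clear a high cutoff get a confident single CoD, while the rest stay unassigned.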
ABSTRACT
We introduce an all-optical technique that enables volumetric imaging of brain-wide calcium activity and targeted optogenetic stimulation of specific brain regions in unrestrained larval zebrafish. The system consists of three main components: a 3D tracking module, a dual-color fluorescence imaging module, and a real-time activity manipulation module. Our approach uses a sensitive genetically encoded calcium indicator in combination with a long-Stokes-shift red fluorescent protein as a reference channel, allowing Ca²⁺ activity to be extracted from signals contaminated by motion artifacts. The method also incorporates rapid 3D image reconstruction and registration, enabling real-time selective optogenetic stimulation of different brain regions. By demonstrating that selective light activation of midbrain regions in larval zebrafish reliably triggers biased turning behavior and changes in brain-wide neural activity, we present a valuable tool for investigating the causal relationship between distributed neural circuit dynamics and naturalistic behavior.
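Because motion changes the calcium-indicator and reference signals by roughly the same factor, the ratio of the two channels cancels much of the artifact. The sketch below assumes a purely multiplicative artifact and background-subtracted traces, and computes ΔR/R0 with a percentile baseline; the percentile and the function name are hypothetical choices, not the authors' pipeline.

```python
import numpy as np

def ratiometric_dff(green, red, baseline_percentile=20.0, eps=1e-9):
    """Return dR/R0 per ROI, with R = green / red.

    green, red: background-subtracted traces, shape (n_rois, n_frames).
    Assumes motion scales both channels by the same factor, so it cancels in R.
    """
    green = np.asarray(green, dtype=float)
    red = np.asarray(red, dtype=float)
    ratio = green / (red + eps)                      # eps guards against division by zero
    r0 = np.percentile(ratio, baseline_percentile, axis=1, keepdims=True)
    return (ratio - r0) / r0
```

Real motion artifacts are not perfectly multiplicative (for example, when the two channels are imaged with different point-spread functions), so the ratio reduces rather than eliminates them.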
ABSTRACT
Microinjection of yeast cells has remained challenging for decades, with no significant breakthrough, because of the ultra-tough cell wall and the low stiffness of traditional injector tips at the microscale. Penetrating this protective wall is the key step in introducing foreign substances into yeast. In this paper, a yeast cell model was built using the finite element analysis (FEA) method to analyze the penetration process. The key parameters of the yeast cell wall in the model (the Young's modulus, the shear modulus, and the Lamé constant) were calibrated against a general nanoindentation experiment. Using the calibrated model, the injection parameters were then optimized to minimize cell damage (the maximum cell deformation at the critical stress of the cell wall). Key guidelines are proposed for penetrating the cell wall during microinjection.
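For an isotropic, linear-elastic wall (an assumption; the paper's model may differ), only two of the three calibrated constants are independent: G = E / (2(1 + ν)) and λ = Eν / ((1 + ν)(1 - 2ν)). The sketch below fits E to nanoindentation force-depth data with a Hertz spherical-contact model, assuming a rigid tip and a known Poisson's ratio ν, and then derives the shear modulus and the Lamé constant. Any numbers plugged in (for example a 50 nm tip radius or ν = 0.3) would be placeholders, not measured yeast cell wall properties.

```python
import numpy as np

def fit_hertz_modulus(depth, force, tip_radius, poisson):
    """Young's modulus from a spherical Hertz fit, F = (4/3) * E_r * sqrt(R) * d**1.5.

    Assumes a rigid indenter, so the reduced modulus is E_r = E / (1 - nu**2).
    SI units: depth and tip_radius in m, force in N, result in Pa.
    """
    x = np.asarray(depth, dtype=float) ** 1.5
    f = np.asarray(force, dtype=float)
    slope = np.dot(x, f) / np.dot(x, x)          # least-squares fit through the origin
    e_reduced = slope / ((4.0 / 3.0) * np.sqrt(tip_radius))
    return e_reduced * (1.0 - poisson ** 2)

def shear_and_lame(young, poisson):
    """Shear modulus G and Lame constant lambda of an isotropic linear-elastic solid."""
    shear = young / (2.0 * (1.0 + poisson))
    lame = young * poisson / ((1.0 + poisson) * (1.0 - 2.0 * poisson))
    return shear, lame
```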
ABSTRACT
Interactions within the tumor microenvironment (TME) significantly influence tumor progression and treatment responses. While single-cell RNA sequencing (scRNA-seq) and spatial genomics facilitate TME exploration, many clinical cohorts are assessed only at the bulk tissue level. Integrating scRNA-seq and bulk tissue RNA-seq data through computational deconvolution is therefore essential for obtaining clinically relevant insights. Our deconvolution method, ProM, enables examination of both major and minor cell types. In evaluations against existing methods using paired single-cell and bulk RNA sequencing of human urothelial cancer (UC) samples, ProM outperforms them. Application to UC cohorts treated with immune checkpoint inhibitors reveals pre-treatment cellular features associated with poor outcomes, such as elevated SPP1 expression in macrophages/monocytes (MM). Our deconvolution method and paired single-cell and bulk tissue RNA-seq dataset provide novel insights into TME heterogeneity and resistance to immune checkpoint blockade.
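ProM's algorithm is not described here, so as a point of reference the sketch below shows the classic baseline that reference-based deconvolution builds on: non-negative least squares (NNLS) of a bulk profile against a cell-type signature matrix derived from scRNA-seq, with the weights normalized to proportions. It uses SciPy's `scipy.optimize.nnls` and assumes the signature and bulk data are on the same linear (non-log) scale.

```python
import numpy as np
from scipy.optimize import nnls

def deconvolve(bulk, signature):
    """Estimate cell-type proportions for one bulk sample.

    bulk: (n_genes,) expression; signature: (n_genes, n_cell_types) mean
    expression of each cell type, e.g. averaged from scRNA-seq clusters.
    """
    weights, _residual_norm = nnls(np.asarray(signature, float), np.asarray(bulk, float))
    total = weights.sum()
    return weights / total if total > 0 else weights   # normalize to proportions
```

Minor cell types are hard for this baseline because their signature columns contribute little to the bulk signal and are often collinear with related types, which is the gap more elaborate methods target.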
ABSTRACT
Accurate detection of pathogens, particularly distinguishing between Gram-positive and Gram-negative bacteria, could improve disease treatment. Host gene expression can capture the immune system's response to infections caused by various pathogens. Here, we present bvnGPS2, an attention-based deep neural network model trained on a large-scale integrated host transcriptome dataset to precisely identify Gram-positive and Gram-negative bacterial infections as well as viral infections. We analyzed 4,949 blood samples across 40 cohorts from 10 countries using our previously designed omics data integration method, iPAGE, to select discriminant gene pairs and train bvnGPS2. The performance of the model was evaluated on six independent cohorts comprising 374 samples. Overall, our deep neural network model shows a robust capability to accurately identify specific infections, paving the way for precision medicine strategies in infection treatment and potentially for identifying subtypes of other diseases.
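iPAGE's selection criteria are not spelled out here, but gene-pair features are commonly encoded as within-sample comparisons, which makes them insensitive to per-sample scaling and other monotone batch effects across cohorts. The sketch below assumes that encoding (1 if gene i is expressed above gene j in a sample, else 0); the pair list would come from a selection step such as iPAGE and is hypothetical here.

```python
import numpy as np

def gene_pair_features(expr, pairs):
    """Binary within-sample gene-pair features.

    expr: (n_samples, n_genes) expression matrix.
    pairs: list of (i, j) gene column indices, e.g. from a selection step.
    Returns an (n_samples, n_pairs) int8 array with 1 where gene i > gene j.
    """
    expr = np.asarray(expr, dtype=float)
    i_idx = np.array([i for i, _ in pairs])
    j_idx = np.array([j for _, j in pairs])
    return (expr[:, i_idx] > expr[:, j_idx]).astype(np.int8)
```

Because only the order of two genes within one sample matters, a cohort normalized differently (or log-transformed) yields the same features, which is what lets a single model be trained across dozens of cohorts.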
ABSTRACT
Evaluating the binding affinities of drugs to proteins is crucial for identifying the pharmacological actions of drugs, but it requires the three-dimensional structures of proteins. Herein, we propose novel computational methods to predict the therapeutic indications and side effects of drug candidate compounds from their binding affinities to human protein structures on a proteome-wide scale. Large-scale docking simulations were performed for 7,582 drugs against 19,135 protein structures predicted by AlphaFold (including experimentally unresolved proteins), and machine learning models were constructed on the resulting proteome-wide binding affinity score (PBAS) profiles. We demonstrated the usefulness of the method for predicting therapeutic indications for 559 diseases and side effects for 285 toxicities. The method enabled the prediction of drug indications for which the related protein structures had not been experimentally determined and successfully extracted proteins eliciting the side effects. The proposed method will be useful in various drug discovery applications.
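To make the PBAS idea concrete, the sketch below represents each drug as a vector of docking scores across proteins and trains a plain L2-regularized logistic regression for one indication. The classifier choice, learning rate, and epoch count are assumptions for illustration; the paper's machine learning models may differ.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit an L2-regularized logistic regression by batch gradient descent.

    X: (n_drugs, n_proteins) binding-score profiles; y: 0/1 labels for one indication.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8
    Z = (X - mu) / sd                            # standardize each protein's scores
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return mu, sd, w, b

def predict_proba(model, X):
    """Predicted probability that each drug has the indication."""
    mu, sd, w, b = model
    Z = (np.asarray(X, dtype=float) - mu) / sd
    return 1.0 / (1.0 + np.exp(-(Z @ w + b)))
```

A linear model also makes the "extract proteins eliciting side effects" step easy to picture: proteins with large-magnitude weights are the ones whose predicted binding drives the prediction.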
ABSTRACT
To understand the decision process of genomic sequence-to-function models, explainable AI algorithms determine the importance of each nucleotide in a given input sequence to the model's predictions and enable the discovery of cis-regulatory motifs for gene regulation. The most commonly applied method is in silico saturation mutagenesis (ISM), because its per-nucleotide importance scores can be intuitively understood as the computational counterpart to in vivo saturation mutagenesis experiments. While ISM is highly interpretable, it is computationally expensive to perform for many sequences and becomes prohibitive as the length of the input sequences and the size of the model grow. Here, we use the first-order Taylor approximation to approximate ISM values from the model's gradient, which reduces the computational cost to a single forward and backward pass per input sequence. We show that the Taylor ISM (TISM) approximation is robust across different model ablations, random initializations, training parameters, and dataset sizes.
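The approximation can be written as TISM(i, b) ≈ g(i, b) - g(i, ref_i), where g is the gradient of the model output with respect to the one-hot input and ref_i is the reference base at position i. The sketch below compares exact ISM (one forward pass per variant) with this Taylor estimate on a toy model whose gradient is written by hand; in practice the gradient would come from automatic differentiation in a deep learning framework. The toy model and helper names are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, base in enumerate(seq):
        x[i, ALPHABET.index(base)] = 1.0
    return x

def ism(model, x):
    """Exact ISM: model(mutant) - model(reference) for every position and base."""
    ref_score = model(x)
    scores = np.zeros_like(x)
    for i in range(x.shape[0]):
        for b in range(x.shape[1]):
            mutant = x.copy()
            mutant[i] = 0.0
            mutant[i, b] = 1.0
            scores[i, b] = model(mutant) - ref_score
    return scores

def tism(gradient, x):
    """First-order Taylor ISM: g[i, b] - g[i, ref_i] from one gradient evaluation."""
    g = gradient(x)
    g_ref = np.sum(g * x, axis=1, keepdims=True)   # gradient entry at the reference base
    return g - g_ref

# Hypothetical toy model f(x) = tanh(sum(W * x)) and its analytic gradient.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.05, size=(20, 4))

def toy_model(x):
    return np.tanh(np.sum(W * x))

def toy_gradient(x):
    return (1.0 - np.tanh(np.sum(W * x)) ** 2) * W

x = one_hot("ACGTACGTACGTACGTACGT")
```

For a model that is linear in its input, the two scores coincide exactly; the approximation error grows with the model's curvature over a single-base change.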
ABSTRACT
Cytotoxic T lymphocyte (CTL) and terminally exhausted T lymphocyte (ETL) activities crucially influence immune checkpoint inhibitor (ICI) response. Despite this, the efficacy of ETL and CTL transcriptomic signatures for response prediction remains limited. Investigating this across TCGA and publicly available single-cell cohorts, we find a strong positive correlation between ETL and CTL expression signatures in most cancers. We hence posited that their limited predictive power arises from their mutually canceling effects on ICI response. We therefore developed DETACH, a computational method to identify a gene set whose expression pinpoints a subset of melanoma patients in whom the CTL-ETL correlation is low. DETACH enhances the prediction accuracy of CTL signatures, outperforming existing signatures. The activity of DETACH signature genes also correlates positively with lymphocyte infiltration and the prevalence of reactive T cells in the tumor microenvironment (TME), advancing our understanding of the CTL cell state within the TME.
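The stratification idea can be sketched as follows: score CTL and ETL signatures per sample, split samples by a candidate gene-set score, and compare the CTL-ETL correlation in each stratum. The mean-z-score signature, the median split, and the gene indices are illustrative assumptions; DETACH's search for the gene set itself is not reproduced.

```python
import numpy as np

def signature_score(expr, gene_idx):
    """Mean z-score of the signature genes; expr is (n_genes, n_samples)."""
    sub = np.asarray(expr, dtype=float)[gene_idx]
    z = (sub - sub.mean(axis=1, keepdims=True)) / (sub.std(axis=1, keepdims=True) + 1e-8)
    return z.mean(axis=0)

def ctl_etl_correlation_by_stratum(expr, ctl_idx, etl_idx, strat_idx):
    """Pearson r between CTL and ETL scores in high vs low strata of a candidate gene set."""
    ctl = signature_score(expr, ctl_idx)
    etl = signature_score(expr, etl_idx)
    strat = signature_score(expr, strat_idx)
    high = strat >= np.median(strat)
    r_high = np.corrcoef(ctl[high], etl[high])[0, 1]
    r_low = np.corrcoef(ctl[~high], etl[~high])[0, 1]
    return r_high, r_low
```

In the stratum where the two signatures decouple, CTL activity can be read without the opposing ETL signal masking it, which is the intuition behind the improved prediction.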
ABSTRACT
Pangenome alignment offers a way to reduce bias in biomedical research. Traditionally, short-read aligners such as Bowtie and BWA index a single reference genome to find approximate alignments. These methods, limited by linear memory requirements, can index only a few genomes. Emerging pangenome aligners, such as VG, Giraffe, and Moni, address this by indexing more genomes. VG and Giraffe use a variation graph, while Moni indexes sequences, accounting for repetition, by using prefix-free parsing to build a dictionary and a parse. The main challenge is the size of the parse, which becomes significantly larger than the dictionary. To scale Moni, we propose removing the parse from the construction of the run-length encoded BWT (RLBWT), suffix array, and longest common prefix (LCP) array by applying prefix-free parsing recursively. This approach improves construction time and memory requirements, enabling efficient construction of the RLBWT, suffix array, and LCP array for large pangenomes, such as those from the Human Pangenome Reference Consortium.
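Prefix-free parsing can be sketched in a few lines of Python. Trigger windows of length w are chosen by a hash condition (here, hash mod p == 0); each phrase runs from one trigger to the next, consecutive phrases overlap by w characters, and the text becomes a sorted dictionary of distinct phrases plus a parse of dictionary ranks. This is a simplified illustration under those assumptions: it recomputes each window hash instead of using a rolling Karp-Rabin hash, and it performs one level of parsing only, whereas the approach described above applies the parsing recursively to the parse.

```python
def window_hash(window, base=257, mod=1_000_000_007):
    """Deterministic polynomial hash of a short window (recomputed, not rolling)."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % mod
    return h

def prefix_free_parse(text, w=4, p=11, sentinel="$"):
    """Split text into overlapping phrases delimited by trigger windows.

    Returns (dictionary, parse): the sorted distinct phrases and, for each
    phrase in text order, its rank in the dictionary.
    """
    assert sentinel not in text
    t = sentinel * w + text + sentinel * w
    phrases, start = [], 0
    for end in range(w, len(t) + 1):
        window = t[end - w:end]
        trigger = window == sentinel * w or window_hash(window) % p == 0
        if trigger and end - w > start:
            phrases.append(t[start:end])   # phrase ends with the trigger window
            start = end - w                # next phrase starts with the same trigger
    dictionary = sorted(set(phrases))
    rank = {ph: r for r, ph in enumerate(dictionary)}
    parse = [rank[ph] for ph in phrases]
    return dictionary, parse
```

On repetitive input such as many similar genomes, the dictionary stays small while the parse grows with the number of phrases, which is why the parse becomes the bottleneck that recursion targets.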
ABSTRACT
Genome assembly databases are growing rapidly. The redundancy of sequence content between a new assembly and previous ones is neither conceptually nor algorithmically easy to measure. We introduce methods and a tool, DandD, that address how much new sequence is gained when a sequence collection grows. DandD can describe how much structural variation is discovered in each new human genome assembly and predict when discoveries will level off. DandD uses a measure called δ ("delta"), initially developed for data compression and chiefly dependent on k-mer counts. DandD rapidly estimates δ using genomic sketches. We propose δ as an alternative to k-mer-specific cardinalities when computing the Jaccard coefficient, thereby avoiding the pitfalls of a poor choice of k. We demonstrate the utility of DandD's functions for estimating δ, characterizing the rate of pangenome growth, and computing all-pairs similarities using a k-independent Jaccard coefficient.
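For small inputs, δ can be computed exactly as the maximum over k of d_k / k, where d_k is the number of distinct k-mers in the collection. DandD estimates these counts with sketches; the exact-counting sketch below only shows the definition and how δ grows as sequences are added. Strand handling (canonical k-mers) is omitted, which is an assumption.

```python
def kmer_set(seqs, k):
    """Distinct k-mers across all sequences in the collection."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}

def delta(seqs, k_max=None):
    """delta = max over k of d_k / k, with d_k the number of distinct k-mers."""
    if k_max is None:
        k_max = max(len(s) for s in seqs)
    return max(len(kmer_set(seqs, k)) / k for k in range(1, k_max + 1))

# Growth curve: delta of the first i assemblies, i = 1..n (toy sequences).
assemblies = ["ACGTTGCA", "ACGTTGCC", "TTTTACGA"]
growth = [delta(assemblies[:i]) for i in range(1, len(assemblies) + 1)]
```

Adding a sequence can only add k-mers, so the curve never decreases; a flattening curve means new assemblies contribute little new sequence.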
ABSTRACT
Compound-protein interaction (CPI) affinity prediction plays an important role in reducing the cost and time of drug discovery. However, current methods ignore the affinity relationships between compound fragments and protein fragments in CPI modeling, which limits the interpretability of how fragments function in CPI. This article introduces FOTF-CPI (a Fusion of Optimal Transport Fragments compound-protein interaction prediction model), an improved Transformer. We use an optimal transport-based fragmentation approach to improve the model's understanding of compound and protein sequences. Additionally, a fused attention mechanism combines the features of fragments to capture full affinity information; this fused attention assigns higher attention scores to fragments with higher affinity. Experimental results show that FOTF-CPI achieves on average 2% higher performance than other models on all three datasets. Furthermore, visualization confirms the potential of FOTF-CPI for drug discovery applications.
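Optimal transport between compound and protein fragments can be illustrated with the entropy-regularized Sinkhorn algorithm. The sketch below computes a transport plan between two sets of fragment embeddings using cosine distance as the cost; large plan entries mark fragment pairs that are cheap to match and could, for example, be used to re-weight attention. Uniform marginals, the cost, and the regularization value are assumptions; this is not FOTF-CPI's implementation.

```python
import numpy as np

def cosine_cost(A, B):
    """1 - cosine similarity between rows of A (compound fragments) and B (protein fragments)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A @ B.T

def sinkhorn_plan(cost, reg=0.1, n_iter=500):
    """Entropy-regularized optimal transport plan with uniform marginals."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)                 # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                 # alternate row and column scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]
```

Smaller `reg` values give sharper, more one-to-one plans but need more iterations and risk numerical underflow in the kernel.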
ABSTRACT
microRNAs (miRNAs) are small regulatory RNAs that repress target mRNA transcripts through base pairing. Although the mechanisms of miRNA production and function are well established, new insights into miRNA regulation and miRNA-mediated gene silencing are still emerging. To facilitate the discovery of miRNA regulators or effectors, we developed sRNA-Effector, a machine learning algorithm trained on enhanced crosslinking and immunoprecipitation sequencing data and on RNA sequencing data following knockdown of specific genes. sRNA-Effector accurately identifies known miRNA biogenesis and effector proteins and identifies nine putative regulators of miRNA function, including the serine/threonine kinase STK33, the splicing factor SFPQ, and the proto-oncogene BMI1. We validated the roles of STK33, SFPQ, and BMI1 in miRNA regulation, showing that sRNA-Effector is useful for identifying new players in small RNA biology. sRNA-Effector will be available as a web tool for all researchers to identify potential miRNA regulators in any cell line of interest.
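One feature such a model could use is whether a gene's knockdown de-represses predicted miRNA targets. The sketch below, a hypothetical feature rather than sRNA-Effector's actual design, compares the mean log2 fold change of targets with that of non-targets and attaches a one-sided permutation p-value.

```python
import numpy as np

def target_derepression(log2fc, is_target):
    """Mean log2 fold change of predicted miRNA targets minus that of non-targets.

    Positive values mean targets rose relative to non-targets after the knockdown,
    which would be consistent with the knocked-down gene supporting miRNA repression.
    """
    log2fc = np.asarray(log2fc, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    return log2fc[is_target].mean() - log2fc[~is_target].mean()

def permutation_pvalue(log2fc, is_target, n_perm=1000, seed=0):
    """One-sided p-value obtained by shuffling the target labels."""
    rng = np.random.default_rng(seed)
    is_target = np.asarray(is_target, dtype=bool)
    observed = target_derepression(log2fc, is_target)
    null = np.array([target_derepression(log2fc, rng.permutation(is_target))
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)
```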
ABSTRACT
Identifying cancer genes is vital for cancer diagnosis and treatment. However, because of the complexity of cancer occurrence and limited knowledge of cancer genes, it is hard to identify cancer genes accurately using only a few types of omics data, and the overall performance of existing methods needs further improvement. Here, we introduce GLIMS, a two-stage gradual-learning strategy that predicts cancer genes using integrative features from multi-omics data. First, it uses a semi-supervised hierarchical graph neural network to predict initial candidate cancer genes by integrating multi-omics data and a protein-protein interaction (PPI) network. Then, it uses an unsupervised approach to refine the initial prediction by integrating the co-splicing network of post-transcriptional regulation, which plays an important role in cancer development. Systematic experiments on multi-omics cancer data demonstrated that GLIMS outperforms state-of-the-art methods for identifying cancer genes and could be a useful tool to help advance cancer analysis.
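The second-stage refinement could, for instance, be done with network propagation. As a swapped-in, commonly used technique (not necessarily the unsupervised method GLIMS uses), the sketch below applies random walk with restart to initial candidate scores over a co-splicing adjacency matrix; the restart probability is an assumption.

```python
import numpy as np

def random_walk_with_restart(adj, seed_scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate initial gene scores over a network until the scores converge.

    adj: (n_genes, n_genes) symmetric adjacency (e.g. co-splicing) matrix.
    seed_scores: non-negative initial scores, e.g. from a first-stage model.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    W = adj / np.where(col_sums == 0, 1.0, col_sums)   # column-normalized transitions
    p0 = np.asarray(seed_scores, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

Genes with modest first-stage scores but many co-splicing links to high-scoring genes rise in the ranking, while isolated high scorers are damped.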
ABSTRACT
Notch-Delta-Jagged (NDJ) signaling among neighboring cells contributes crucially to spatiotemporal pattern formation and developmental decision-making. Despite numerous detailed mathematical models, their high-dimensional parameter space limits analytical treatment, especially regarding local microenvironmental fluctuations. Using the low-dimensional dynamics of the recently postulated least microenvironmental uncertainty principle (LEUP) framework, we show how the LEUP formalism recapitulates noisy NDJ spatial patterning. Our LEUP simulations show that local phenotypic entropy increases for lateral inhibition but decreases for lateral induction. This distinction allows us to identify a critical parameter that captures the transition from Notch-Delta-driven lateral inhibition to Notch-Jagged-driven lateral induction, and it suggests random phenotypic patterning when neither Notch-Delta nor Notch-Jagged signaling dominates. Our results enable an analytical treatment that maps the high-dimensional dynamics of NDJ signaling onto tissue-level patterning and can possibly be generalized to decode the operating principles of collective cellular decision-making.
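Local phenotypic entropy can be illustrated on a lattice of cell phenotypes. The sketch below assumes a von Neumann neighborhood with periodic boundaries and Shannon entropy in bits, which may differ from the LEUP formulation; it shows that an alternating, lateral-inhibition-like checkerboard yields high local entropy, whereas a uniform, lateral-induction-like patch yields zero.

```python
import numpy as np

def local_entropy(grid):
    """Mean Shannon entropy (bits) of phenotype frequencies in each cell's
    von Neumann neighborhood (the cell plus 4 neighbors, periodic boundaries)."""
    grid = np.asarray(grid)
    states = np.unique(grid)
    neighborhoods = [grid, np.roll(grid, 1, 0), np.roll(grid, -1, 0),
                     np.roll(grid, 1, 1), np.roll(grid, -1, 1)]
    counts = np.stack([sum((nb == s).astype(float) for nb in neighborhoods) for s in states])
    freqs = counts / len(neighborhoods)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(freqs > 0, -freqs * np.log2(freqs), 0.0)   # 0 * log 0 taken as 0
    return terms.sum(axis=0).mean()
```

In a checkerboard every neighborhood holds one cell of its own type and four of the other, giving about 0.722 bits per cell.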
ABSTRACT
Amyotrophic lateral sclerosis (ALS) is a universally fatal neurodegenerative disease with no cure. Human endogenous retroviruses (HERVs) have been implicated in its pathogenesis, but their relevance to ALS is not fully understood. We examined bulk RNA-seq data from almost 2,000 ALS and unaffected control samples derived from the cortex and spinal cord. Using different methods of feature selection, including differential expression analysis and machine learning, we discovered that transcription of HERV-K loci 1q22 and 8p23.1 was significantly upregulated in the spinal cord of individuals with ALS. Additionally, we identified a subset of ALS patients with upregulated HERV-K expression in the cortex and spinal cord. We also found that the expression of HERV-K loci 19q11 and 8p23.1 was correlated with protein-coding genes previously implicated in ALS and dysregulated in the ALS patients in this study. These results clarify the association between HERV-K and ALS and highlight specific genes in the pathobiology of late-stage ALS.
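When many HERV loci are tested for differential expression, p-values are usually corrected for multiple testing. The sketch below implements the standard Benjamini-Hochberg false discovery rate adjustment as a generic illustration; it is not the authors' exact feature-selection pipeline.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity by taking running minima from the largest p-value down.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out
```

Loci whose adjusted value falls below the chosen FDR level (for example 0.05) would be reported as differentially expressed.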
ABSTRACT
Ab initio computational reconstruction of protein-protein interaction (PPI) networks can provide invaluable insights into cellular systems, enabling the discovery of novel molecular interactions and elucidating biological mechanisms within and between organisms. Leveraging the latest generation of protein language models and recurrent neural networks, we present SENSE-PPI, a sequence-based deep learning model that efficiently reconstructs PPIs ab initio, distinguishing partners among tens of thousands of proteins and identifying specific interactions among functionally similar proteins. SENSE-PPI demonstrates high accuracy, limited training requirements, and versatility in cross-species predictions, even for non-model organisms and human-virus interactions. Its performance decreases for phylogenetically more distant model and non-model organisms, but the signal degrades very slowly. This highlights the important role of the parameters of protein language models. SENSE-PPI is very fast and can test 10,000 proteins against each other in a matter of hours, enabling PPI network reconstruction at the scale of whole proteomes.
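One generic way to make all-vs-all screening tractable is to compute per-protein representations once and reuse them for every pair: 10,000 proteins need 10,000 embeddings but about 50 million pair scores, which can be computed block by block. The sketch below assumes a hypothetical symmetric bilinear scoring head on cached embeddings; SENSE-PPI's actual architecture, which combines protein language models with recurrent neural networks, is not reproduced.

```python
import numpy as np

def n_unordered_pairs(n_proteins):
    """Pairs to test in an all-vs-all screen, counting self-interactions once."""
    return n_proteins * (n_proteins + 1) // 2

def pair_scores_block(emb_a, emb_b, W):
    """Hypothetical symmetric bilinear interaction scores between two blocks
    of cached per-protein embeddings (rows are proteins)."""
    W_sym = 0.5 * (W + W.T)                     # ensures score(i, j) == score(j, i)
    logits = emb_a @ W_sym @ emb_b.T
    return 1.0 / (1.0 + np.exp(-logits))
```

Scoring in blocks keeps memory bounded: a full 10,000 x 10,000 float64 score matrix alone would take about 800 MB.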
ABSTRACT
Coronary artery disease (CAD) remains a leading cause of disease burden globally, and there is a persistent need for new therapeutic targets. Instrumental variable (IV) and genetic colocalization analyses can help identify novel therapeutic targets for human disease by nominating causal genes in genome-wide association study (GWAS) loci. We conducted cis-IV analyses of 20,125 genes and 1,746 plasma proteins with CAD using molecular quantitative trait locus (QTL) data from three different studies. Nineteen proteins and 119 genes were significantly associated with CAD risk in IV analyses and showed evidence of genetic colocalization. Notably, our analyses validated well-established targets such as PCSK9 and ANGPTL4 while also identifying HTRA1 and endotrophin (a cleavage product of COL6A3) as proteins whose levels are causally associated with CAD risk. Further experimental studies are needed to confirm the causal roles in human disease of the genes and proteins identified through our multiomic cis-IV analyses.
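With several cis variants, the causal effect of a gene or protein on CAD is often estimated by combining per-variant Wald ratios with the fixed-effect inverse-variance-weighted (IVW) estimator. The sketch below implements that estimator from summary statistics; it assumes uncorrelated variants (correlated cis variants in linkage disequilibrium call for LD-aware extensions) and is not necessarily the exact IV method used in the study.

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    beta_exposure: variant effects on the molecular trait (e.g. protein level).
    beta_outcome, se_outcome: the same variants' effects on CAD and their SEs.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    w = bx ** 2 / np.asarray(se_outcome, dtype=float) ** 2   # inverse variance of each ratio
    ratio = by / bx                                           # per-variant Wald ratios
    estimate = np.sum(w * ratio) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))
    return estimate, se
```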
ABSTRACT
Traditional loss functions such as cross-entropy loss often quantify the penalty for each misclassified training sample without adequately considering its distance from the ground truth class distribution in the feature space. Intuitively, the larger this distance is, the higher the penalty should be. Based on this observation, we propose a penalty called the distance-weighted Sinkhorn (DWS) loss. For each misclassified training sample (with predicted label A and true label B), its contribution to the DWS loss correlates positively with the distance the training sample needs to travel to reach the ground truth distribution of all the A samples. We apply the DWS framework with a neural network to classify different stages of Alzheimer's disease. Our empirical results demonstrate that the DWS framework outperforms traditional neural network loss functions and is comparable to or better than traditional machine learning methods, highlighting its potential in biomedical informatics and data science.
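The distance-weighting idea can be sketched with a simplification: instead of a Sinkhorn distance to a class distribution, the sketch below up-weights the cross-entropy of each misclassified sample by its Euclidean distance to the centroid of its true class in feature space, following the "distance from the ground truth class distribution" intuition. The centroid proxy and the weight 1 + distance are assumptions made for brevity; this is not the DWS loss itself.

```python
import numpy as np

def distance_weighted_ce(features, logits, labels):
    """Mean cross-entropy where misclassified samples are weighted by
    1 + Euclidean distance to their true-class centroid."""
    features = np.asarray(features, dtype=float)
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)        # numerically stable softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels]
    centroids = {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}
    dist = np.array([np.linalg.norm(f - centroids[y]) for f, y in zip(features, labels)])
    wrong = logits.argmax(axis=1) != labels
    weights = np.where(wrong, 1.0 + dist, 1.0)
    return float(np.mean(weights * ce))
```

When every sample is classified correctly, the weights are all 1 and the value reduces to ordinary cross-entropy.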
ABSTRACT
Dysregulation of normal transcription factor activity is a common driver of disease. Detecting aberrant transcription factor activity is therefore important for understanding disease pathogenesis. We have developed Priori, a method to predict transcription factor activity from RNA sequencing data. Priori has two key advantages over existing methods. First, it uses literature-supported regulatory information to identify transcription factor-target gene relationships and then applies linear models to determine the impact of transcription factor regulation on the expression of its target genes. Second, a third-party benchmarking pipeline shows that Priori detects aberrant activity in 124 single-gene perturbation experiments with higher sensitivity and specificity than 11 other methods. We applied Priori and other top-performing methods to predict transcription factor activity in two large primary patient datasets. Our work demonstrates that Priori uniquely discovers significant determinants of survival in breast cancer and identifies mediators of drug response in leukemia.
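The linear-model step can be sketched per sample as a regression of target-gene z-scores on signed regulation weights (positive for activation, negative for repression), with the slope's t-statistic used as the activity score. The weighting scheme and the function name are assumptions for illustration; Priori's exact model and its literature-derived weights are not reproduced.

```python
import numpy as np

def tf_activity(expr_z, target_idx, weights):
    """Per-sample TF activity: t-statistic of the slope from regressing
    target-gene z-scores on signed regulation weights.

    expr_z: (n_genes, n_samples) z-scored expression.
    target_idx: row indices of the TF's target genes.
    weights: signed weight per target (+ activation, - repression).
    """
    x = np.asarray(weights, dtype=float)
    Y = np.asarray(expr_z, dtype=float)[target_idx]     # (n_targets, n_samples)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    Yc = Y - Y.mean(axis=0)
    slope = xc @ Yc / sxx                               # one slope per sample
    resid = Yc - np.outer(xc, slope)
    dof = len(x) - 2
    se = np.sqrt(np.sum(resid ** 2, axis=0) / dof / sxx)
    return slope / se
```

A large positive score means targets move in the direction the TF is expected to push them; a large negative score suggests reduced activity.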
ABSTRACT
GWAS focuses on statistical significance to limit false positives, whereas machine learning probes sub-significant features by relying on predictivity. Yet these approaches are far from orthogonal. We sought to explore how they inform each other in situations below genome-wide significance in order to define the relevance of predictive features. We introduce RubricOE, an SVM-based method that selects heavily cross-validated feature sets, and use LDpred2 polygenic risk scores (PRS) as a strong contrast to the SVM, to explore significance and predictivity. Our Alzheimer's disease test case notoriously lacks strong genetic signals except for a few very strong phenotype-SNP associations, which suits the problem we are exploring. We found that the most significant SNPs among the ML- and PRS-selected SNPs captured most of the predictivity, while weaker associations also tended to contribute weakly to predictivity. SNPs with weak associations tended not to contribute to predictivity, and deleting these features did not impair it. Significance provides a ranking that helps identify weakly predictive features.
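A simple way to relate significance to predictivity is to rank SNPs by p-value, build an additive polygenic score from the top k, and track how the score's correlation with the phenotype changes with k. The sketch below does this with a plain weighted sum of dosages; it ignores linkage disequilibrium and does not reproduce LDpred2's shrinkage or RubricOE's cross-validated selection.

```python
import numpy as np

def prs(genotypes, betas):
    """Additive polygenic score: dosage (0/1/2) times effect size, summed over SNPs."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(betas, dtype=float)

def predictivity_by_top_k(genotypes, betas, pvalues, phenotype, ks):
    """Correlation between the phenotype and a PRS built from the k most significant SNPs."""
    genotypes = np.asarray(genotypes, dtype=float)
    betas = np.asarray(betas, dtype=float)
    order = np.argsort(pvalues)
    out = {}
    for k in ks:
        keep = order[:k]
        score = prs(genotypes[:, keep], betas[keep])
        out[k] = float(np.corrcoef(score, phenotype)[0, 1])
    return out
```

If the correlation barely changes as k grows past the few strongest SNPs, the weaker associations add little predictivity, which is the pattern the abstract reports.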