Results 1 - 20 of 320,458
1.
Life Sci Alliance ; 7(10)2024 Oct.
Article in English | MEDLINE | ID: mdl-39107066

ABSTRACT

Probabilistic topic modelling has become essential in many types of single-cell data analysis. Based on probabilistic topic assignments in each cell, we identify the latent representation of cellular states. A dictionary matrix, consisting of topic-specific gene frequency vectors, provides interpretable bases that can be compared with known cell type-specific marker genes and other pathway annotations. However, fitting a topic model on a large number of cells requires heavy computational resources: specialized computing units, computing time, and memory. Here, we present a scalable approximation method customized for single-cell RNA-seq data analysis, termed ASAP, short for Annotating a Single-cell data matrix by Approximate Pseudobulk estimation. Our approach is more accurate than existing methods yet requires orders of magnitude less computing time and far less memory. We also show that our approach is widely applicable for atlas-scale data analysis; our method seamlessly integrates single-cell and bulk data in joint analysis, without requiring additional preprocessing or feature selection steps.
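The pseudobulk idea at the heart of ASAP can be illustrated with a toy sketch: collapse many cells into a few aggregate profiles, then fit the topic model on the much smaller matrix. The random projection, clustering, and NMF below are generic stand-ins chosen for the illustration, not ASAP's actual estimator.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.poisson(0.5, size=(10_000, 1_000))           # cells x genes count matrix

# Collapse cells into pseudobulk samples so the topic model sees a small matrix.
k_bulk = 100
labels = KMeans(n_clusters=k_bulk, n_init=3, random_state=0).fit_predict(
    np.log1p(X) @ rng.standard_normal((X.shape[1], 50))  # random projection for speed
)
pseudobulk = np.vstack([X[labels == k].sum(axis=0) for k in range(k_bulk)])

# Fit a topic-model-like factorization on the pseudobulk matrix only.
nmf = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
topic_loadings = nmf.fit_transform(np.log1p(pseudobulk))  # pseudobulk x topics
dictionary = nmf.components_                              # topics x genes
```

Because the factorization runs on 100 pseudobulk rows instead of 10,000 cells, the cost of the topic fit becomes essentially independent of the number of cells, which is the scaling property the abstract describes.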


Subject(s)
Computational Biology , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Algorithms , Models, Statistical , Sequence Analysis, RNA/methods , RNA-Seq/methods , Gene Expression Profiling/methods
2.
Nat Commun ; 15(1): 6699, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39107330

ABSTRACT

Post-translational modifications (PTMs) are pivotal in modulating protein functions and influence cellular processes such as signaling, localization, and degradation. The complexity of these biological interactions necessitates efficient predictive methodologies. In this work, we introduce PTMGPT2, an interpretable protein language model that uses prompt-based fine-tuning to accurately predict PTMs. Drawing inspiration from recent advances in GPT-based architectures, PTMGPT2 adopts unsupervised learning to identify PTMs. It uses a custom prompt to guide the model through the subtle linguistic patterns encoded in amino acid sequences, generating tokens indicative of PTM sites. To provide interpretability, we visualize attention profiles from the model's final decoder layer to elucidate sequence motifs essential for molecular recognition, and we analyze the effects of mutations at or near PTM sites to offer deeper insights into protein functionality. Comparative assessments reveal that PTMGPT2 outperforms existing methods across 19 PTM types, underscoring its potential for identifying disease associations and drug targets.
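As a rough illustration of prompt-based generation with a causal language model (the general mechanism the abstract describes), the sketch below uses the Hugging Face transformers API with the stock gpt2 checkpoint and an invented prompt template; the real PTMGPT2 weights, tokenizer, and prompt format may differ.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint and prompt template, purely for illustration.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
prompt = f"Sequence: {sequence} Modified positions:"   # assumed template

inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10, do_sample=False,
                     pad_token_id=tok.eos_token_id)
# Decode only the newly generated tokens, i.e., the model's "PTM site" answer.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```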


Subject(s)
Protein Processing, Post-Translational , Proteins/metabolism , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Humans , Computational Biology/methods , Algorithms , Databases, Protein
3.
BMC Med Imaging ; 24(1): 204, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39107679

ABSTRACT

BACKGROUND: Computed tomography (CT) is widely used in clinical practice and is affected by metal implants. Metal segmentation is crucial for metal artifact correction, and the common threshold method often fails to segment metals accurately. PURPOSE: This study aims to segment metal implants in CT images using a diffusion model and to further validate the approach on clinical artifact images and phantom images of known size. METHODS: A retrospective study was conducted on 100 patients who received radiation therapy without metal artifacts, and simulated artifact data were generated using publicly available mask data. The study used 11,280 slices for training and validation, and 2,820 slices for testing. Metal mask segmentation was performed using DiffSeg, a diffusion model incorporating conditional dynamic coding and a global frequency parser (GFParser). Conditional dynamic coding fuses the current segmentation mask and prior images at multiple scales, while GFParser helps eliminate high-frequency noise in the mask. Clinical artifact images and phantom images were also used for model validation. RESULTS: Compared with the ground truth, DiffSeg achieved an accuracy of 97.89% and a Dice similarity coefficient (DSC) of 95.45% for metal segmentation on simulated data. The masks obtained by threshold segmentation covered the ground truth, with DSCs of 82.92% and 84.19% for thresholds of 2500 HU and 3000 HU, respectively. Evaluation metrics and visualization results show that DiffSeg performs better than other classical deep learning networks, especially on clinical CT artifact data and phantom data. CONCLUSION: DiffSeg efficiently and robustly segments metal masks in artifact data using conditional dynamic coding and GFParser. Future work will embed the metal segmentation model in a metal artifact reduction pipeline to improve the reduction effect.


Subject(s)
Artifacts , Metals , Phantoms, Imaging , Prostheses and Implants , Tomography, X-Ray Computed , Humans , Tomography, X-Ray Computed/methods , Retrospective Studies , Algorithms
4.
BMC Med Res Methodol ; 24(1): 172, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39107693

ABSTRACT

We have introduced the R package jmBIG to facilitate the analysis of large healthcare datasets and the development of predictive models. The package provides a comprehensive set of tools and functions specifically designed for the joint modelling of longitudinal and survival data in the context of big data analytics. jmBIG offers efficient and scalable implementations of joint modelling algorithms, allowing for the integration of large-scale healthcare datasets. By utilizing the capabilities of jmBIG, researchers and analysts can effectively handle the challenges associated with big healthcare data, such as high dimensionality and complex relationships between multiple outcomes. With the support of jmBIG, analysts can seamlessly fit Bayesian joint models, generate predictions, and evaluate model performance. The package incorporates cutting-edge methodologies and harnesses parallel computing to significantly accelerate the analysis of large-scale healthcare datasets. In summary, jmBIG empowers researchers to gain deeper insights into disease progression and treatment response, fostering evidence-based decision-making and paving the way for personalized healthcare interventions that can positively impact patient outcomes on a larger scale.


Subject(s)
Algorithms , Bayes Theorem , Big Data , Precision Medicine , Humans , Precision Medicine/methods , Precision Medicine/statistics & numerical data , Longitudinal Studies , Survival Analysis , Risk Assessment/methods , Risk Assessment/statistics & numerical data , Models, Statistical , Software
5.
BMC Med Res Methodol ; 24(1): 171, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39107695

ABSTRACT

BACKGROUND: Dimension reduction methods do not always reduce their underlying indicators to a single composite score. Furthermore, such methods are usually based on optimality criteria that require discarding some information. We suggest, under some conditions, using the joint probability density function (joint pdf, or JPD) of a p-dimensional random variable (the p indicators) as an index or composite score. It is proved that this index is more informative than any alternative composite score. In two examples, we compare the JPD index with alternatives constructed from traditional methods. METHODS: We develop a probabilistic unsupervised dimension reduction method based on the probability density of multivariate data. We show that the conditional distribution of the variables given the JPD is uniform, implying that the JPD is the most informative scalar summary under the most common notions of information. We also show that, under some widely plausible conditions, the JPD can be used as an index. To use the JPD as an index, in addition to having a plausible interpretation, all the random variables should have approximately the same direction (unidirectionality) and align in direction with the density values (co-directionality). We applied these ideas to two datasets: first, the 7 Brief Pain Inventory Interference scale (BPI-I) items obtained from 8,889 US Veterans with chronic pain; and second, a novel measure based on administrative data for 912 US Veterans. To estimate the JPD in both examples, among the available JPD estimation methods, we used its conditional specifications, identified a well-fitted parametric model for each factored conditional (regression) specification, and estimated their parameters by maximizing the corresponding likelihoods. Due to the non-uniqueness of conditional specification, the average of all estimated conditional specifications was used as the final estimate. Since a prevalent use of indices is ranking, we used measures of monotone dependence [e.g., Spearman's rank correlation (rho)] to assess the strength of unidirectionality and co-directionality. Finally, we cross-validated the JPD score against variance-covariance-based scores (factor scores in unidimensional models) and the "person parameter" estimates of (Generalized) Partial Credit and Graded Response IRT models. We used Pearson divergence as a measure of information and Shannon entropy to compare uncertainty (informativeness) across these alternative scores. RESULTS: An unsupervised dimension reduction method was developed based on the joint probability density (JPD) of the multi-dimensional data. The JPD, under regularity conditions, may be used as an index. For the well-established Brief Pain Inventory Interference scale (BPI-I; the 7-item short form) and for a new mental health severity index (MoPSI) with 6 indicators, we estimated the JPD scoring. Assuming unidimensionality, we compared factor scores and person scores of the Partial Credit, Generalized Partial Credit, and Graded Response models with JPD scoring. As expected, all scores' rankings in both examples were monotonically dependent with various strengths. Shannon entropy was smallest for JPD scores, and the Pearson divergence of the estimated densities of the different indices against the uniform distribution was largest for JPD scoring. CONCLUSIONS: An unsupervised probabilistic dimension reduction is possible. When appropriate, the joint probability density function can be used as the most informative index. Model specification, estimation, and the steps to implement the scoring were demonstrated. As expected, when the assumptions required by factor analysis and IRT models are satisfied, JPD scoring agrees with these established scores; when these assumptions are violated, JPD scores preserve all the information in the indicators with minimal assumptions.
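As a much simpler stand-in for the paper's conditional-specification estimator, a kernel density estimate can illustrate JPD scoring and the rank-agreement check; the simulated items, skewed latent factor, and noise level below are assumptions made for the sketch.

```python
import numpy as np
from scipy.stats import gaussian_kde, spearmanr

rng = np.random.default_rng(1)
# Simulated right-skewed 7-item scale driven by one latent factor
# (a stand-in for the BPI-I items).
latent = rng.exponential(size=1000)
items = latent[:, None] + 0.6 * rng.normal(size=(1000, 7))

# JPD index: the estimated joint pdf evaluated at each respondent's profile.
kde = gaussian_kde(items.T)           # gaussian_kde expects (n_dims, n_points)
jpd_score = kde(items.T)

# Ranking check: for this skewed scale the density falls monotonically as the
# scores grow, so the JPD ranking agrees with a sum score up to sign.
rho, _ = spearmanr(jpd_score, items.sum(axis=1))
print(f"Spearman rho vs. sum score: {rho:.2f}")   # strongly negative here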


Subject(s)
Probability , Humans , Pain/diagnosis , Severity of Illness Index , Pain Measurement/methods , Pain Measurement/statistics & numerical data , Mental Disorders/diagnosis , Models, Statistical , Algorithms
6.
Nat Commun ; 15(1): 6611, 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39098889

ABSTRACT

Identifying cellular identities is a key use case in single-cell transcriptomics. While machine learning has been leveraged to automate cell annotation predictions for some time, there has been little progress in scaling neural networks to large datasets and in constructing models that generalize well across diverse tissues. Here, we propose scTab, an automated cell type prediction model specific to tabular data, and train it using a novel data augmentation scheme across a large corpus of single-cell RNA-seq observations (22.2 million cells). In this context, we show that cross-tissue annotation requires nonlinear models and that the performance of scTab scales with both training dataset size and model size. Additionally, we show that the proposed data augmentation scheme improves model generalization. In summary, we introduce a de novo cell type prediction model for single-cell RNA-seq data that can be trained across a large-scale collection of curated datasets, and we demonstrate the benefits of deep learning methods in this paradigm.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , RNA-Seq/methods , Machine Learning , Neural Networks, Computer , Transcriptome , Sequence Analysis, RNA/methods , Animals , Computational Biology/methods , Deep Learning , Gene Expression Profiling/methods , Algorithms
7.
BMC Bioinformatics ; 25(1): 256, 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39098908

ABSTRACT

BACKGROUND: Antioxidant proteins are involved in several biological processes and can protect DNA and cells from free-radical damage. These proteins regulate the body's oxidative stress and play a significant role in many antioxidant-based drugs. Current in vitro approaches are costly, time-consuming, and unable to efficiently screen and identify the targeted motifs of antioxidant proteins. METHODS: We propose an accurate prediction method for discriminating antioxidant proteins, named StackedEnC-AOP. The training sequences are encoded by incorporating a discrete wavelet transform (DWT) into the evolutionary matrix: the PSSM-based images are decomposed via two levels of DWT to form a pseudo position-specific scoring matrix (PsePSSM-DWT) embedded vector. Additionally, the evolutionary difference formula and composite physicochemical properties are employed to collect structural and sequential descriptors. The combined vector of sequential features, evolutionary descriptors, and physicochemical properties is then produced to cover the flaws of individual encoding schemes. To reduce the computational cost of the combined feature vector, the optimal features are chosen using minimum redundancy maximum relevance (mRMR). The optimal feature vector is trained using a stacking-based ensemble meta-model. RESULTS: Our StackedEnC-AOP method achieved a prediction accuracy of 98.40% and an AUC of 0.99 on the training sequences. On an independent set, the trained StackedEnC-AOP model achieved an accuracy of 96.92% and an AUC of 0.98. CONCLUSION: Our proposed StackedEnC-AOP strategy performed significantly better than current computational models, with ~5% and ~3% higher accuracy on the training and independent sets, respectively. The efficacy and consistency of StackedEnC-AOP make it a valuable tool for data scientists, and it can play a key role in academic research and drug design.
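A minimal sketch of the final stage, feature selection followed by a stacking ensemble, using scikit-learn; the synthetic feature matrix, the choice of base learners and meta-model, and the use of mutual-information ranking as a surrogate for mRMR are all assumptions, not the paper's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in for the combined PsePSSM-DWT / physicochemical feature matrix.
X, y = make_classification(n_samples=600, n_features=300, n_informative=40,
                           random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),   # the meta-model
)
# Mutual-information ranking as a simple surrogate for mRMR selection.
pipe = make_pipeline(SelectKBest(mutual_info_classif, k=60), stack)
print(cross_val_score(pipe, X, y, cv=5).mean())
```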


Subject(s)
Antioxidants , Proteins , Antioxidants/chemistry , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Machine Learning , Algorithms , Wavelet Analysis , Support Vector Machine , Databases, Protein , Position-Specific Scoring Matrices
8.
PLoS One ; 19(8): e0307559, 2024.
Article in English | MEDLINE | ID: mdl-39137201

ABSTRACT

This study develops a nonparametric mixed exponentially weighted moving average-moving average (NPEWMA-MA) sign control chart for monitoring shifts in process location, particularly when the distribution of a critical quality characteristic is either unknown or non-normal. In the literature, the variance expression of the mixed exponentially weighted moving average-moving average (EWMA-MA) statistic is calculated by treating sequential moving averages as independent; the resulting exclusion of covariance terms yields an inaccurate variance expression. Furthermore, the effectiveness of the EWMA-MA control chart deteriorates when the distribution of a critical quality characteristic deviates from normality. The proposed NPEWMA-MA sign control chart addresses these issues by utilizing the corrected variance of the EWMA-MA statistic and incorporating the nonparametric sign test into the EWMA-MA charting structure. The chart integrates the moving average (MA) statistic into the exponentially weighted moving average (EWMA) statistic: the EWMA-MA charting statistic assigns more weight to the most recent w samples, with weights for earlier observations declining exponentially. Monte Carlo simulations assess the chart's performance using various run length (RL) characteristics, such as the average run length (ARL), standard deviation of run length (SDRL), and median run length (MRL). Additional measures of overall performance include the average extra quadratic loss (AEQL) and relative mean index (RMI). The proposed NPEWMA-MA sign control chart demonstrates superior performance compared to existing nonparametric control charts across different symmetric and asymmetric distributions. It efficiently detects process shifts, as validated through both a simulation study and a real-life example from a combined cycle power plant.
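A minimal sketch of the charting statistic, an EWMA applied to a moving average of subgroup sign statistics; the smoothing constant, window width, and data below are placeholders, and the paper's corrected variance and control limits are not reproduced.

```python
import numpy as np

def ewma_ma_sign(samples, target, w=5, lam=0.2):
    """EWMA of a moving average of sign statistics (illustrative only).

    `samples` has shape (n_subgroups, subgroup_size); control limits and the
    paper's corrected variance are intentionally omitted.
    """
    signs = np.sign(samples - target).sum(axis=1)          # sign statistic per subgroup
    ma = np.convolve(signs, np.ones(w) / w, mode="valid")  # moving average over w subgroups
    z = np.zeros_like(ma)
    z[0] = ma[0]
    for t in range(1, len(ma)):                            # EWMA recursion
        z[t] = lam * ma[t] + (1 - lam) * z[t - 1]
    return z

rng = np.random.default_rng(0)
data = rng.normal(loc=0.3, size=(100, 10))                 # process with a location shift
print(ewma_ma_sign(data, target=0.0)[:5])                  # statistic drifts upward
```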


Subject(s)
Monte Carlo Method , Gases , Models, Statistical , Statistics, Nonparametric , Computer Simulation , Algorithms
9.
PLoS One ; 19(8): e0308529, 2024.
Article in English | MEDLINE | ID: mdl-39137223

ABSTRACT

This study examines a special class of generalized p-trigonometric functions and their connection to established counterparts such as the p-cosine and p-sine functions. We explore a new class of functions, the p-versine, p-coversine, p-haversine, and p-hacovercosine, providing comprehensive definitions and properties. Grounded in the characteristics of the p-cosine and p-sine functions, the newly proposed functions offer unique mathematical insights. Our work contributes to a thorough understanding of these new special functions, showcasing their potential applications in scientific domains ranging from mathematical analysis to physics and engineering. This paper serves as a resource for applied mathematics researchers engaging with these new functions, enhancing their ability to model complex patterns arising in diverse real-world applications.
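For reference, the classical (p = 2) definitions of this function family are given below; the paper's p-generalizations presumably replace cos and sin with their p-analogues.

```latex
\begin{align*}
  \operatorname{versin}\theta       &= 1 - \cos\theta, &
  \operatorname{coversin}\theta     &= 1 - \sin\theta, \\
  \operatorname{haversin}\theta     &= \tfrac{1}{2}\,(1 - \cos\theta), &
  \operatorname{hacovercosin}\theta &= \tfrac{1}{2}\,(1 + \sin\theta).
\end{align*}
```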


Subject(s)
Models, Theoretical , Mathematics , Algorithms
10.
PLoS One ; 19(8): e0304655, 2024.
Article in English | MEDLINE | ID: mdl-39137226

ABSTRACT

Recognising human activities using smart devices has led to countless inventions in domains such as healthcare, security, and sports. Sensor-based human activity recognition (HAR), especially smartphone-based HAR, has become popular among the research community due to its lightweight computation and user privacy protection. Deep learning models are the most preferred solutions for developing smartphone-based HAR, as they can automatically capture salient and distinctive features from input signals and classify them into the corresponding activity classes. However, in most cases, the architecture of these models needs to be deep and complex to achieve good classification performance, and training them requires extensive computational resources. Hence, this research proposes a hybrid lightweight model that integrates an enhanced Temporal Convolutional Network (TCN) with Gated Recurrent Unit (GRU) layers for salient spatiotemporal feature extraction without tedious manual feature engineering. Essentially, dilations are incorporated into each convolutional kernel in the TCN-GRU model to extend the kernel's field of view without adding model parameters, and fewer, shorter filters are applied in each convolutional layer to reduce excess parameters. Despite the reduced computational cost, the proposed model uses dilations, residual connections, and GRU layers to model longer-term time dependencies, retaining implicit features of the input inertial sequences throughout training to provide sufficient information for prediction. The performance of the TCN-GRU model is verified on two benchmark smartphone-based HAR databases, UCI HAR and UniMiB SHAR, where it attains promising accuracies of 97.25% and 93.51%, respectively. Since the current study works exclusively with inertial signals captured by smartphones, future studies will explore the generalisation of the proposed TCN-GRU across diverse datasets, including various sensor types, to ensure its adaptability across different applications.
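A minimal PyTorch sketch of the dilated-TCN-plus-GRU idea: doubling dilations widen the receptive field at no extra parameter cost, and a GRU models longer-term dependencies. Layer widths, the dilation schedule, and the omission of residual connections are simplifications, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TCNGRU(nn.Module):
    """Illustrative dilated-TCN + GRU hybrid for inertial windows."""

    def __init__(self, in_ch=9, hidden=32, n_classes=6):
        super().__init__()
        layers = []
        for d in (1, 2, 4):                      # doubling dilations widen the
            layers += [                          # field of view, parameter-free
                nn.Conv1d(in_ch, hidden, kernel_size=3, dilation=d,
                          padding=d),            # keeps sequence length for k=3
                nn.ReLU(),
            ]
            in_ch = hidden
        self.tcn = nn.Sequential(*layers)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                        # x: (batch, channels, time)
        h = self.tcn(x).transpose(1, 2)          # -> (batch, time, hidden)
        out, _ = self.gru(h)
        return self.head(out[:, -1])             # classify from the last step

logits = TCNGRU()(torch.randn(8, 9, 128))        # 8 windows, 9 inertial channels
print(logits.shape)                              # torch.Size([8, 6])
```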


Subject(s)
Human Activities , Smartphone , Humans , Neural Networks, Computer , Deep Learning , Algorithms
11.
Biomed Phys Eng Express ; 10(5)2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39094605

ABSTRACT

AIM: This study aimed to investigate the correlation between seismocardiographic and echocardiographic systolic variables and whether a decrease in preload could be detected by seismocardiography (SCG). METHODS: The study included a total of 34 subjects. SCG and electrocardiography were recorded simultaneously, followed by echocardiography (echo), in both the supine and 30° head-up tilted positions. The SCG signals were segmented into individual heartbeats, and systolic fiducial points were defined using a detection algorithm. Statistical analysis included correlation coefficient calculations and paired-sample tests. RESULTS: SCG was able to detect a decrease in preload in almost all of the examined systolic SCG variables. Certain echo variables could be correlated with SCG time intervals, amplitudes, and peak-to-peak intervals. Changes between the supine and tilted positions in some SCG variables also correlated with changes in echo variables: LVET, IVCT, S', strain, SR, SV, and LVEF were significantly correlated with the relevant SCG variables. CONCLUSION: This study showed a moderate correlation between systolic echo and systolic SCG variables. Additionally, systolic SCG variables were able to detect a decrease in preload.


Subject(s)
Algorithms , Echocardiography , Electrocardiography , Systole , Humans , Echocardiography/methods , Systole/physiology , Male , Female , Adult , Electrocardiography/methods , Heart Rate/physiology , Middle Aged , Young Adult , Heart/diagnostic imaging , Heart/physiology
12.
Biom J ; 66(6): e202300185, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39101657

ABSTRACT

There has been growing research interest in developing methodology to evaluate health care providers' performance with respect to a patient outcome. Random- and fixed-effects models are traditionally used for this purpose. We propose a new method that uses a fusion penalty to cluster health care providers based on quasi-likelihood. Without any a priori knowledge of grouping information, our method provides a desirable data-driven approach for automatically clustering health care providers into groups based on their performance. Further, the quasi-likelihood is more flexible and robust than the regular likelihood in that no distributional assumption is needed. An efficient alternating direction method of multipliers (ADMM) algorithm is developed to implement the proposed method. We show that the proposed method enjoys the oracle property; namely, it performs as well as if the true group structure were known in advance. The consistency and asymptotic normality of the estimators are established. Simulation studies and an analysis of national kidney transplant registry data demonstrate the utility and validity of our method.
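A generic form of a fusion-penalized quasi-likelihood criterion is sketched below; the exact penalty function and any weights used in the paper may differ. Here Q_i is the quasi-likelihood contribution of provider i, beta_i the provider-specific effect, and lambda controls how aggressively providers with similar performance are fused into one cluster (pairs whose estimated effects coincide form a group).

```latex
\min_{\beta_1,\dots,\beta_m} \; -\sum_{i=1}^{m} Q_i(\beta_i)
  \;+\; \lambda \sum_{1 \le i < j \le m} p\!\left(\lvert \beta_i - \beta_j \rvert\right)
```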


Subject(s)
Biometry , Health Personnel , Cluster Analysis , Likelihood Functions , Humans , Health Personnel/statistics & numerical data , Biometry/methods , Kidney Transplantation , Algorithms
13.
Article in English | MEDLINE | ID: mdl-39102321

ABSTRACT

Visual feedback gain is a crucial factor influencing the performance of precision grasping tasks, involving multiple brain regions of the visuomotor system during task execution. However, the dynamic changes in brain networks during this process remain unclear. The aim of this study is to investigate the impact of changes in visual feedback gain during precision grasping on brain network dynamics. Sixteen participants performed precision grip tasks at 15% of maximum voluntary contraction (MVC) under low (0.1°), medium (1°), and high (3°) visual feedback gain conditions, with simultaneous recording of EEG and right-hand precision grip data. Using electroencephalogram (EEG) microstate analysis, multiple parameters (duration, occurrence, coverage, and transition probability (TP)) were extracted to assess changes in brain network dynamics. Precision grip accuracy and stability were evaluated using the root mean square error (RMSE) and coefficient of variation (CV) of grip force. Compared to low visual feedback gain, under medium and high gain the duration, occurrence, and coverage of microstates B and D increased, while those of microstates A and C decreased, and the transition probabilities from microstates A, C, and D to B all increased. Additionally, the RMSE and CV of grip force decreased, and the occurrence and coverage of microstates B and C were negatively correlated with RMSE and CV. These findings suggest that visual feedback gain affects brain network dynamics during precision grasping: a moderate increase in visual feedback gain can enhance the accuracy and stability of grip force, with the increased occurrence and coverage of microstates B and C contributing to improved performance. Our results help to better understand the impact of visual feedback gain on the motor control of precision grasping.


Subject(s)
Electroencephalography , Feedback, Sensory , Hand Strength , Psychomotor Performance , Humans , Feedback, Sensory/physiology , Hand Strength/physiology , Male , Young Adult , Adult , Female , Psychomotor Performance/physiology , Nerve Net/physiology , Healthy Volunteers , Algorithms , Brain/physiology
14.
Article in English | MEDLINE | ID: mdl-39102325

ABSTRACT

Hand function assessments in a clinical setting are critical for upper limb rehabilitation after spinal cord injury (SCI) but may not accurately reflect performance in an individual's home environment. When paired with computer vision models, egocentric videos from wearable cameras provide an opportunity for remote hand function assessment during real activities of daily living (ADLs). This study demonstrates the use of computer vision models to predict clinical hand function assessment scores from egocentric video. SlowFast, MViT, and MaskFeat models were trained and validated on a custom SCI dataset, which contained a variety of ADLs carried out in a simulated home environment. The dataset was annotated with clinical hand function assessment scores using an adapted scale applicable to a wide range of object interactions. An accuracy of 0.551±0.139, mean absolute error (MAE) of 0.517±0.184, and F1 score of 0.547±0.151 were achieved on the 5-class classification task, and an accuracy of 0.724±0.135, MAE of 0.290±0.140, and F1 score of 0.733±0.144 were achieved on a consolidated 3-class classification task. This novel approach demonstrates, for the first time, the prediction of hand function assessment scores from egocentric video after SCI.


Subject(s)
Activities of Daily Living , Hand , Spinal Cord Injuries , Video Recording , Spinal Cord Injuries/rehabilitation , Spinal Cord Injuries/physiopathology , Humans , Hand/physiopathology , Male , Female , Adult , Reproducibility of Results , Middle Aged , Algorithms , Young Adult , Hand Strength/physiology , Wearable Electronic Devices
15.
Article in English | MEDLINE | ID: mdl-39102322

ABSTRACT

A cochlear implant (CI) is a neural prosthesis that can restore hearing for patients with severe to profound hearing loss. The variability observed in auditory rehabilitation outcomes following cochlear implantation may be due to cerebral reorganization. Electroencephalography (EEG), favored for its CI compatibility and non-invasiveness, has become a staple in clinical objective assessments of cerebral plasticity post-implantation. However, the electrical activity of the CI distorts neural responses, and the susceptibility of EEG to these artifacts presents significant challenges for obtaining reliable neural responses. Despite the use of various artifact removal techniques in previous studies, automatically identifying and reducing CI artifacts while minimizing information loss remains a pressing issue in objectively assessing advanced auditory functions in CI recipients. To address this problem, we propose an approach that combines a machine learning algorithm, specifically a Support Vector Machine (SVM), with Independent Component Analysis (ICA) and Ensemble Empirical Mode Decomposition (EEMD) to automatically detect and minimize electrical artifacts in EEG data. The innovation of this research is the automatic detection of CI artifacts using the temporal properties of EEG signals. By applying EEMD and ICA, we can process and remove the identified CI artifacts from the affected EEG channels, yielding a refined signal. Comparative analysis in the temporal, frequency, and spatial domains suggests that the corrected EEG recordings of CI recipients closely align with those of peers with normal hearing, signifying the restoration of reliable neural responses across the entire scalp while eliminating CI artifacts.
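A rough sketch of the component-removal step on synthetic data: decompose mixed channels with ICA, flag the artifact-dominated component, zero it, and reconstruct. The EEMD stage (e.g., via a package such as PyEMD) is omitted, and a simple high-frequency power threshold stands in for the paper's SVM trained on temporal properties; the signal parameters are invented for the demo.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
fs, n_s = 250, 5000
t = np.arange(n_s) / fs

# Synthetic "EEG": low-frequency neural-like sources plus one CI-like source
# with strong high-frequency pulsatile content, mixed onto 8 channels.
neural = np.stack([np.sin(2 * np.pi * f * t + rng.uniform(0, np.pi))
                   for f in (4, 7, 10, 13, 19)], axis=1)
ci = np.sign(np.sin(2 * np.pi * 40 * t))[:, None]           # CI-like pulse train
sources_true = np.hstack([neural, ci])
eeg = sources_true @ rng.normal(size=(6, 8)) + 0.05 * rng.normal(size=(n_s, 8))

ica = FastICA(n_components=6, random_state=0)
comps = ica.fit_transform(eeg)                               # (samples, components)

# Stand-in detector: flag components dominated by >30 Hz power.
power = np.abs(np.fft.rfft(comps, axis=0)) ** 2
freqs = np.fft.rfftfreq(n_s, d=1 / fs)
bad = power[freqs > 30].sum(axis=0) / power.sum(axis=0) > 0.5

comps[:, bad] = 0.0                                          # drop artifact components
cleaned = ica.inverse_transform(comps)                       # reconstruct channels
print("flagged components:", np.where(bad)[0])
```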


Subject(s)
Algorithms , Artifacts , Cochlear Implants , Electroencephalography , Support Vector Machine , Humans , Electroencephalography/methods , Male , Female , Adult , Middle Aged , Reproducibility of Results , Aged , Young Adult
16.
Sci Rep ; 14(1): 18243, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39107347

ABSTRACT

Individual Specific Networks (ISNs) are a tool used in computational biology to infer individual-specific relationships between biological entities from omics data, providing insights into how the interactions among these entities affect their respective functions. To address the scarcity of solutions for efficiently computing ISNs on large biological datasets, we present ISN-tractor, a data-agnostic, highly optimized Python library to build and analyse ISNs. ISN-tractor demonstrates superior scalability and efficiency in generating ISNs compared to existing methods such as LionessR, in terms of both time and memory usage, allowing ISNs to be computed on large datasets. We show how ISN-tractor can be applied to real-life datasets, including The Cancer Genome Atlas (TCGA) and HapMap, showcasing its versatility. ISN-tractor can build ISNs from various omics data types, including transcriptomics, proteomics, and genotype arrays, and can detect distinct patterns of gene interactions within and across cancer types. We also show how filtration curves provide valuable insights into ISN characteristics, revealing topological distinctions among individuals with different clinical outcomes. Additionally, ISN-tractor can effectively cluster populations based on genetic relationships, as demonstrated with Principal Component Analysis on HapMap data.
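For context, the LIONESS-style estimate that LionessR implements reconstructs sample q's contribution to an edge from the all-sample network and the corresponding leave-one-out network; ISN-tractor's exact estimator may differ. With N samples, e^(all) the edge weight in the network built from all samples, and e^(all\q) the weight with sample q left out:

```latex
e_q \;=\; N\, e^{(\mathrm{all})} \;-\; (N-1)\, e^{(\mathrm{all}\setminus q)}
```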


Subject(s)
Computational Biology , Humans , Computational Biology/methods , Gene Regulatory Networks , Neoplasms/genetics , Software , Proteomics/methods , Algorithms
17.
Bioinformatics ; 40(8)2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39107889

ABSTRACT

MOTIVATION: Transcription factors are pivotal in the regulation of gene expression, and accurate identification of transcription factor binding sites (TFBSs) at high resolution is crucial for understanding the mechanisms underlying gene regulation. Identifying TFBSs from DNA sequences remains a significant challenge in computational biology. A variety of computational approaches have been developed to address this challenge, but they face limitations in achieving high-resolution identification and often lack interpretability. RESULTS: We propose BertSNR, an interpretable deep learning framework for identifying TFBSs at single-nucleotide resolution. BertSNR integrates sequence-level and token-level information through multi-task learning based on pre-trained DNA language models. Benchmarking comparisons show that BertSNR outperforms existing state-of-the-art methods in TFBS prediction. Importantly, we enhance the interpretability of the model through attention weight visualization and motif analysis, uncovering a subtle relationship between attention weights and motifs. Moreover, BertSNR effectively identifies TFBSs in promoter regions, facilitating the study of intricate gene regulation. AVAILABILITY AND IMPLEMENTATION: The BertSNR source code is available at https://github.com/lhy0322/BertSNR.


Subject(s)
Deep Learning , Transcription Factors , Transcription Factors/metabolism , Binding Sites , Computational Biology/methods , DNA/metabolism , DNA/chemistry , Sequence Analysis, DNA/methods , Software , Algorithms
18.
Skin Res Technol ; 30(8): e13783, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39113617

ABSTRACT

BACKGROUND: In recent years, the increasing prevalence of skin cancers, particularly malignant melanoma, has become a major public health concern. Accurate automated segmentation of skin lesions holds immense potential for alleviating the burden on medical professionals and is of substantial clinical importance for the early identification of and intervention in skin cancer. Nevertheless, the irregular shapes, uneven color, and noise interference of skin lesions present significant challenges to precise segmentation. It is therefore crucial to develop a high-precision, intelligent skin lesion segmentation framework for clinical use. METHODS: A precision-driven segmentation model for skin cancer images is proposed based on the Transformer U-Net, called BiADATU-Net, which integrates a deformable attention Transformer and bidirectional attention blocks into the U-Net. The encoder uses a deformable attention Transformer with a dual attention block, allowing adaptive learning of global and local features. The decoder incorporates specifically tailored scSE attention modules within the skip-connection layers to capture image-specific context information for strong feature fusion. Additionally, deformable convolution is aggregated into the two attention blocks to learn irregular lesion features for high-precision prediction. RESULTS: Experiments were conducted on four skin cancer image datasets (ISIC2016, ISIC2017, ISIC2018, and PH2). The findings show that our model exhibits satisfactory segmentation performance, achieving an accuracy rate of over 96% on all four datasets. CONCLUSION: Our experimental results show that the proposed BiADATU-Net outperforms several state-of-the-art methods and is a promising, valuable approach for skin lesion segmentation.
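The scSE module referenced in the decoder is commonly defined as concurrent spatial and channel squeeze-and-excitation; a generic PyTorch sketch of that standard formulation (not the paper's tailored variant) is shown below, with the reduction ratio r as an assumed hyperparameter.

```python
import torch
import torch.nn as nn

class SCSE(nn.Module):
    """Concurrent spatial & channel squeeze-and-excitation (scSE)."""

    def __init__(self, ch, r=8):
        super().__init__()
        self.cse = nn.Sequential(                 # channel attention branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid(),
        )
        self.sse = nn.Sequential(                 # spatial attention branch
            nn.Conv2d(ch, 1, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.cse(x) + x * self.sse(x)  # recalibrate and combine

feat = torch.randn(2, 64, 56, 56)                 # a skip-connection feature map
print(SCSE(64)(feat).shape)                       # torch.Size([2, 64, 56, 56])
```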


Subject(s)
Melanoma , Skin Neoplasms , Humans , Skin Neoplasms/diagnostic imaging , Skin Neoplasms/pathology , Melanoma/diagnostic imaging , Melanoma/pathology , Algorithms , Neural Networks, Computer , Image Processing, Computer-Assisted/methods , Image Interpretation, Computer-Assisted/methods , Dermoscopy/methods , Deep Learning
19.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-39115958

ABSTRACT

BACKGROUND: Phylogenies play a crucial role in biological research. Unfortunately, the search for the optimal phylogenetic tree incurs significant computational costs, and most existing state-of-the-art tools cannot handle extremely large datasets in reasonable times. RESULTS: In this work, we introduce the new VeryFastTree code (version 4.0), which is able to construct a tree on a single server, using single-precision arithmetic, from a massive 1-million-sequence alignment in only 36 hours; this is 3 times and 3.2 times faster than its previous version and FastTree-2, respectively. The new version further boosts performance by parallelizing all tree-traversal operations during tree construction, including subtree pruning and regrafting moves. Additionally, it introduces significant new features such as support for new and compressed file formats, enhanced compatibility across a broader range of operating systems, and integrated disk-computing functionality. The latter feature is particularly advantageous for users without access to high-end servers, as it allows them to manage very large datasets, albeit with an increase in computing time. CONCLUSIONS: Experimental results establish VeryFastTree as the fastest state-of-the-art tool for maximum likelihood phylogeny estimation. It is publicly available at https://github.com/citiususc/veryfasttree and is included as a package in Bioconda, MacPorts, and all Debian-based Linux distributions.


Subject(s)
Phylogeny , Software , Algorithms , Computational Biology/methods , Classification/methods , Databases, Genetic
20.
PLoS One ; 19(8): e0307319, 2024.
Article in English | MEDLINE | ID: mdl-39116090

ABSTRACT

Digital statistical processing of images and video can effectively tackle numerous challenges encountered with optical sensors. This research aims to overcome the limitations of traditional focus models, particularly their inadequate accuracy, and to improve the precision of real-time perception and dynamic control through enhanced data fusion, ultimately enabling information services with seamless interaction and deep integration between computational and physical processes in an open environment. To this end, an enhanced sum-modulus difference (SMD) evaluation function is proposed, founded on threshold-based evaluation, to rectify the accuracy shortcomings of traditional focusing models. By computing the gray values remaining after threshold segmentation, the method identifies the most suitable threshold for image segmentation; this threshold is then applied within a focus search strategy based on the radial basis function (RBF) algorithm. Furthermore, an intelligent focusing system was developed on the Zynq platform, encompassing both hardware design and software development. The test results confirm that the focusing model based on the improved SMD evaluation function rapidly identifies the peak of the gray-variance curve, determines the optimal focal-plane position, and notably enhances the sensitivity of the focusing model.
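A minimal sketch of a threshold-gated SMD sharpness measure: the classic SMD definition (sum of absolute differences between neighbouring pixels) is used, and applying it only to the thresholded foreground is a simplified reading of the paper's improved evaluation function, not its exact formulation.

```python
import numpy as np

def smd(img):
    """Classic sum-modulus-difference sharpness (larger = sharper)."""
    dx = np.abs(np.diff(img.astype(float), axis=1)).sum()
    dy = np.abs(np.diff(img.astype(float), axis=0)).sum()
    return dx + dy

def thresholded_smd(img, thr):
    """Evaluate SMD only on the foreground kept by threshold segmentation."""
    fg = np.where(img >= thr, img, 0)
    return smd(fg)

rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, (64, 64))
# Crude 3-pixel average as a stand-in for a defocused frame (np.roll wraps
# at the edges, which is acceptable for a toy example).
blurred = (sharp + np.roll(sharp, 1, axis=0) + np.roll(sharp, 1, axis=1)) / 3
for name, im in [("sharp", sharp), ("blurred", blurred)]:
    print(name, thresholded_smd(im, thr=128))      # sharp scores higher
```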


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Image Processing, Computer-Assisted/methods , Software