Results 1 - 20 of 319,008
1.
BMC Pediatr ; 24(1): 453, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39009988

ABSTRACT

BACKGROUND: Oral feeding is a complex sensorimotor process influenced by many variables, making it challenging for healthcare providers to introduce and manage it. Feeding practice guided by tradition or a trial-and-error approach may be inconsistent and potentially delay the progression of oral feeding skills. AIM: To apply a new feeding approach that assesses early oral feeding independence skills of preterm infants in the neonatal intensive care unit (NICU). To prove its effectiveness, compare two approaches of oral feeding progression based on clinical outcomes in preterm infants, the traditional approach used in the NICU of Mansoura University Children Hospital (MUCH) versus the newly applied approach. METHODS: A quasi-experimental, exploratory, and analytical design was employed using two groups, control and intervention groups, with 40 infants for the first group and 41 infants for the second one. The first group (the control) was done first and included observation of the standard practice in the NICU of MUCH for preterm oral feeding, in which oral feeding was dependent on post-menstrual age (PMA) and weight for four months. The second group (the intervention) included early progression to oral feeding depending on early assessment of Oral Feeding Skills (OFS) and early supportive intervention and/or feeding therapy if needed using the newly developed scoring system, the Mansoura Early Feeding Skills Assessment "MEFSA" for the other four months. Infants in both groups were studied from the day of admission till discharge. RESULTS: In addition to age and weight criteria, other indicators for oral feeding readiness and oral motor skills were respected, such as oral feeding readiness cues, feeding practice, feeding maintenance, and feeding techniques. By following this approach, preterm infants achieved earlier start oral feeding (SOF) and full oral feeding (FOF) and were discharged with shorter periods of tube feeding. 
Infants gained weight without increasing the workload of the NICU team. CONCLUSION: The newly applied approach proved to be a successful bedside scoring system for assessing preterm infants' early oral feeding independence skills in the NICU. It offers an early individualized experience of oral feeding without clinical complications.


Subject(s)
Algorithms , Enteral Nutrition , Infant, Premature , Intensive Care Units, Neonatal , Humans , Infant, Newborn , Enteral Nutrition/methods , Case-Control Studies , Female , Male , Bottle Feeding , Feeding Behavior
2.
BMC Public Health ; 24(1): 1880, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39009998

ABSTRACT

This article presents an analysis of the impact of Environmental, Social and Governance (ESG) determinants on Hospital Emigration to Another Region (HEAR) in the Italian regions in the period 2004-2021. The data are analysed using Panel Data with Random Effects, Panel Data with Fixed Effects, Pooled Ordinary Least Squares (OLS), Weighted Least Squares (WLS), and Dynamic Panel at 1 Stage. Furthermore, to control for endogeneity, we also created instrumental variable models for each component of the ESG model. Results show that HEAR is negatively associated with the E, S and G components of the ESG model. The data were subjected to clustering with a k-Means algorithm optimized with the Silhouette coefficient. The optimal clustering with k=2 is compared to the sub-optimal clustering with k=3. The results suggest a negative relationship between the resident population and hospital emigration at the regional level. Finally, a prediction is proposed with machine learning algorithms ranked by statistical performance. The results show that the Artificial Neural Network (ANN) algorithm is the best predictor. The ANN predictions are critically analyzed in light of health economic policy directions.
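
The abstract above selects the number of clusters with the Silhouette coefficient. As a rough illustration of what that criterion measures, the sketch below computes the mean silhouette for one-dimensional points under a given labelling. The function name, the toy data and the use of absolute difference as the distance are assumptions made for this example; they are not taken from the article.

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient for 1-D points under a given cluster labelling."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(p - q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups.
toy_points = [0.0, 1.0, 10.0, 11.0]
score_k2 = silhouette_score(toy_points, [0, 0, 1, 1])
score_k3 = silhouette_score(toy_points, [0, 1, 2, 2])
```

On this toy data the two-cluster labelling scores higher than the three-cluster one, which is the kind of comparison the abstract describes between k=2 and k=3.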


Subject(s)
Hospitals , Italy , Humans , Hospitals/statistics & numerical data , Neural Networks, Computer , Emigration and Immigration/statistics & numerical data , Algorithms , Environment , Cluster Analysis
3.
PLoS One ; 19(7): e0304575, 2024.
Article in English | MEDLINE | ID: mdl-39012860

ABSTRACT

In this research paper, we investigate the existence and uniqueness of solutions for neutral functional differential equations with sequential fractional orders, specifically involving the [Formula: see text]-Caputo operator. To obtain the desired results, we employ the Banach fixed point theorem (BFPT), a nonlinear variation of the Leray-Schauder fixed point theorem (SFPT), and the Krasnoselski fixed point theorem (KFPT). Additionally, we provide illustrative examples that demonstrate the key findings. Furthermore, we address a scenario where an initial value integral condition is considered.


Subject(s)
Algorithms , Models, Theoretical
4.
PLoS One ; 19(7): e0305470, 2024.
Article in English | MEDLINE | ID: mdl-39012872

ABSTRACT

Partial differential equation methods for image inpainting achieve better repair results and are economically feasible, with fast repair times. However, Curvature-Driven Diffusion (CDD) models cannot repair complex textures or edges when the input image is affected by severe noise or distortion, which results in discontinuous repair features, blurred detail textures, and an inability to keep the global image content consistent. To address this, we introduce a P-Laplace operator term into the CDD model for image inpainting. In this method, the P-Laplace operator is first introduced into the diffusion term of the CDD model to regulate the diffusion speed; then the improved CDD model is discretized, and the known information around the broken region is divided into two weighted-average iterations to obtain the inpainted images; finally, the final inpainted image is obtained as a weighted average of the two inpainted images according to distance. Experiments show that the restoration results of the proposed model are more plausible in terms of texture structure and outperform other models both visually and on objective metrics. Comparing the inpainted images obtained with 150, 1000 and 100 iterations respectively, the Total Variation (TV) model and the CDD model always leave inpainting traces in the details, and the TV model cannot achieve visual connectivity, whereas the proposed algorithm removes the inpainting traces well. Of the images used for testing, the highest PSNR reached 38.7982, SSIM reached 0.9407, and FSIM reached 0.9781; the algorithm not only achieves a good inpainting effect but also requires fewer iterations.
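
PSNR is one of the objective metrics quoted above. For readers unfamiliar with it, a minimal sketch of the standard definition follows, assuming 8-bit grayscale images stored as nested lists; the function name and the sample pixel values are illustrative and do not come from the article.

```python
import math


def psnr(original, restored, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized images."""
    flat_original = [pixel for row in original for pixel in row]
    flat_restored = [pixel for row in restored for pixel in row]
    if not flat_original or len(flat_original) != len(flat_restored):
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_original, flat_restored)) / len(flat_original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 1x2 "images" whose pixels each differ by one grey level (MSE = 1).
example_psnr = psnr([[10, 20]], [[11, 19]])
```

Higher values indicate a restored image closer to the reference, which is why the abstract reports the highest PSNR reached.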


Subject(s)
Algorithms , Image Processing, Computer-Assisted/methods , Models, Theoretical , Diffusion
5.
PLoS One ; 19(7): e0301692, 2024.
Article in English | MEDLINE | ID: mdl-39012881

ABSTRACT

Speech enhancement is crucial both for human and machine listening applications. Over the last decade, the use of deep learning for speech enhancement has resulted in tremendous improvement over the classical signal processing and machine learning methods. However, training a deep neural network is not only time-consuming; it also requires extensive computational resources and a large training dataset. Transfer learning, i.e. using a pretrained network for a new task, comes to the rescue by reducing the amount of training time, computational resources, and the required dataset, but the network still needs to be fine-tuned for the new task. This paper presents a novel method of speech denoising and dereverberation (SD&D) on an end-to-end frozen binaural anechoic speech separation network. The frozen network requires neither any architectural change nor any fine-tuning for the new task, as is usually required for transfer learning. The interaural cues of a source placed inside noisy and echoic surroundings are given as input to this pretrained network to extract the target speech from noise and reverberation. Although the pretrained model used in this paper has never seen noisy reverberant conditions during its training, it performs satisfactorily for zero-shot testing (ZST) under these conditions. It is because the pretrained model used here has been trained on the direct-path interaural cues of an active source and so it can recognize them even in the presence of echoes and noise. ZST on the same dataset on which the pretrained network was trained (homo-corpus) for the unseen class of interference, has shown considerable improvement over the weighted prediction error (WPE) algorithm in terms of four objective speech quality and intelligibility metrics. Also, the proposed model offers similar performance provided by a deep learning SD&D algorithm for this dataset under varying conditions of noise and reverberations. 
Similarly, ZST on a different dataset has provided an improvement in intelligibility and almost equivalent quality as provided by the WPE algorithm.


Subject(s)
Noise , Humans , Speech , Deep Learning , Signal-To-Noise Ratio , Neural Networks, Computer , Speech Perception/physiology , Algorithms , Signal Processing, Computer-Assisted
6.
PLoS One ; 19(7): e0297855, 2024.
Article in English | MEDLINE | ID: mdl-39012885

ABSTRACT

When electric vehicles are connected to the grid on a large scale for unordered charging, the stability and security of the power system are seriously affected. To solve this problem, this paper proposes a regional power network optimization scheduling method that considers vehicle-network interaction. First, based on the user behavior characteristics and the charging and discharging characteristics of electric vehicles, a charging and discharging behavior model of electric vehicles is established. Based on the Monte Carlo sampling algorithm, the upper and lower scheduling limits of electric vehicles in each scheduling cycle are described, and the scheduling potential of electric vehicles in each scheduling cycle is obtained. Then, the electricity price is used as an incentive parameter to guide EV users to charge during periods of low electricity prices and to discharge during periods of peak electricity prices. Aiming at the highest economic efficiency, the best consumption of new energy and the smoothest demand-side power curve of the regional power grid, a three-objective optimal dispatching model is established. Next, uncertainty factors are taken into consideration by introducing the concept of interval numbers, and an interval multi-objective optimization dispatching model is established. The two dispatching models are solved by the NSGA-II algorithm and an improved NSGA-II algorithm, and the Pareto solution set is obtained. Finally, based on the Analytic Hierarchy Process (AHP), the optimal scheduling scheme is determined. The Monte Carlo sampling method is used to simulate the user-side charging demand, and the effectiveness of this method is verified. 
In addition, the results of the interval multi-objective optimization model and the deterministic multi-objective optimization model are compared, and it is proved that the solution results of the interval multi-objective model are more adaptive, practical and robust to the uncertain factors.
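
The Pareto solution set mentioned above is the set of candidate schedules that no other candidate beats on every objective. The sketch below shows only that dominance filter, not NSGA-II itself, and assumes all three objectives are expressed as quantities to minimize. The objective tuples are made-up values for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates that are not dominated by any other candidate."""
    return [p for p in candidates if not any(dominates(q, p) for q in candidates)]


# Hypothetical (cost, unused renewable energy, load-curve roughness) triples.
schedules = [(1, 5, 3), (2, 2, 2), (3, 3, 3), (1, 5, 4)]
front = pareto_front(schedules)
```

A decision method such as the AHP named in the abstract would then be needed to pick a single schedule from the front, since the remaining candidates trade one objective against another.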


Subject(s)
Algorithms , Monte Carlo Method , Electricity , Models, Theoretical , Electric Power Supplies
7.
PLoS One ; 19(7): e0307288, 2024.
Article in English | MEDLINE | ID: mdl-39012921

ABSTRACT

Feature selection is an important solution for dealing with high-dimensional data in the fields of machine learning and data mining. In this paper, we present an improved mountain gazelle optimizer (IMGO) based on the newly proposed mountain gazelle optimizer (MGO) and design a binary version of IMGO (BIMGO) to solve the feature selection problem for medical data. First, the gazelle population is initialized using iterative chaotic map with infinite collapses (ICMIC) mapping, which increases the diversity of the population. Second, a nonlinear control factor is introduced to balance the exploration and exploitation components of the algorithm. Individuals in the population are perturbed using a spiral perturbation mechanism to enhance the local search capability of the algorithm. Finally, a neighborhood search strategy is used for the optimal individuals to enhance the exploitation and convergence capabilities of the algorithm. The superior ability of the IMGO algorithm to solve continuous problems is demonstrated on 23 benchmark datasets. Then, BIMGO is evaluated on 16 medical datasets of different dimensions and compared with 8 well-known metaheuristic algorithms. The experimental results indicate that BIMGO outperforms the competing algorithms in terms of the fitness value, number of selected features and sensitivity. In addition, the statistical results of the experiments demonstrate the significantly superior ability of BIMGO to select the most effective features in medical datasets.
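
As an illustration of chaotic initialization, the sketch below uses the commonly cited form of the ICMIC map, x_{n+1} = sin(a / x_n), and rescales its values from [-1, 1] to the search bounds. The control parameter a = 2.0, the seed x0 = 0.7, the rescaling and the function names are assumptions made for this example; the abstract does not give the settings used for IMGO.

```python
import math


def icmic_sequence(length, x0=0.7, a=2.0):
    """Generate values in [-1, 1] with the map x_{n+1} = sin(a / x_n)."""
    values = []
    x = x0
    for _ in range(length):
        # Guard against division by zero by restarting from the seed.
        x = math.sin(a / x) if x != 0 else x0
        values.append(x)
    return values


def init_population(pop_size, dim, lower, upper, x0=0.7, a=2.0):
    """Map one chaotic sequence onto pop_size individuals of dimension dim."""
    chaos = icmic_sequence(pop_size * dim, x0, a)
    return [
        [lower + (upper - lower) * (chaos[i * dim + j] + 1.0) / 2.0 for j in range(dim)]
        for i in range(pop_size)
    ]


population = init_population(pop_size=5, dim=3, lower=-10.0, upper=10.0)
```

The sequence is deterministic for a given seed, so repeated runs with the same x0 and a produce the same initial population.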


Subject(s)
Algorithms , Animals , Antelopes , Machine Learning , Humans , Data Mining/methods
8.
Cell Rep Methods ; 4(7): 100813, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-38971150

ABSTRACT

Gene co-expression analysis of single-cell transcriptomes, aiming to define functional relationships between genes, is challenging due to excessive dropout values. Here, we developed a single-cell graphical Gaussian model (SingleCellGGM) algorithm to conduct single-cell gene co-expression network analysis. When applied to mouse single-cell datasets, SingleCellGGM constructed networks from which gene co-expression modules with highly significant functional enrichment were identified. We considered the modules as gene expression programs (GEPs). These GEPs enable direct cell-type annotation of individual cells without cell clustering, and they are enriched with genes required for the functions of the corresponding cells, sometimes at levels greater than 10-fold. The GEPs are conserved across datasets and enable universal cell-type label transfer across different studies. We also proposed a dimension-reduction method through averaging by GEPs for single-cell analysis, enhancing the interpretability of results. Thus, SingleCellGGM offers a unique GEP-based perspective to analyze single-cell transcriptomes and reveals biological insights shared by different single-cell datasets.
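
Graphical Gaussian models connect two variables by their partial correlation, that is, the correlation left after conditioning on the others. The sketch below shows the textbook three-variable case. It is not the SingleCellGGM procedure, whose details the abstract does not give, and the variable names and data are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical expression values: gene_x and gene_y both follow gene_z plus
# independent noise, so they correlate only through gene_z.
gene_z = [1, 2, 3, 4]
gene_x = [2, 1, 2, 5]
gene_y = [0, 5, 0, 5]
plain = pearson(gene_x, gene_y)
conditioned = partial_corr(gene_x, gene_y, gene_z)
```

Here the plain correlation is positive while the partial correlation vanishes, which is the kind of indirect association a graphical Gaussian model is meant to filter out.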


Subject(s)
Algorithms , Gene Expression Profiling , Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Animals , Mice , Transcriptome/genetics , Gene Expression Profiling/methods , Gene Regulatory Networks/genetics
9.
J Comput Biol ; 31(7): 691-702, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38979621

ABSTRACT

Proteins are essential to life, and understanding their intrinsic roles requires determining their structure. The field of proteomics has opened up new opportunities by applying deep learning algorithms to large databases of solved protein structures. With the availability of large data sets and advanced machine learning methods, the prediction of protein residue interactions has greatly improved. Protein contact maps provide empirical evidence of the interacting residue pairs within a protein sequence. Template-free protein structure prediction systems rely heavily on this information. This article proposes UNet-CON, an attention-integrated UNet architecture, trained to predict residue-residue contacts in protein sequences. With the predicted contacts being more accurate than state-of-the-art methods on the PDB25 test set, the model paves the way for the development of more powerful deep learning algorithms for predicting protein residue interactions.


Subject(s)
Algorithms , Computational Biology , Databases, Protein , Proteins , Proteins/chemistry , Proteins/genetics , Computational Biology/methods , Deep Learning , Protein Conformation , Models, Molecular , Machine Learning
10.
J Comput Biol ; 31(7): 597-615, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38980804

ABSTRACT

Most sequence sketching methods work by selecting specific k-mers from sequences so that the similarity between two sequences can be estimated using only the sketches. Because estimating sequence similarity is much faster using sketches than using sequence alignment, sketching methods are used to reduce the computational requirements of computational biology software. Applications using sketches often rely on properties of the k-mer selection procedure to ensure that using a sketch does not degrade the quality of the results compared with using sequence alignment. Two important examples of such properties are locality and window guarantees, the latter of which ensures that no long region of the sequence goes unrepresented in the sketch. A sketching method with a window guarantee, implicitly or explicitly, corresponds to a decycling set of the de Bruijn graph, which is a set of unavoidable k-mers. Any long enough sequence, by definition, must contain a k-mer from any decycling set (hence, the unavoidable property). Conversely, a decycling set also defines a sketching method by choosing the k-mers from the set as representatives. Although current methods use one of a small number of sketching method families, the space of decycling sets is much larger and largely unexplored. Finding decycling sets with desirable characteristics (e.g., small remaining path length) is a promising approach to discovering new sketching methods with improved performance (e.g., with small window guarantee). The Minimum Decycling Sets (MDSs) are of particular interest because of their minimum size. Only two algorithms, by Mykkeltveit and Champarnaud, are previously known to generate two particular MDSs, although there are typically a vast number of alternative MDSs. We provide a simple method to enumerate MDSs. This method allows one to explore the space of MDSs and to find MDSs optimized for desirable properties. 
We give evidence that the Mykkeltveit sets are close to optimal regarding one particular property, the remaining path length. A number of conjectures and computational and theoretical evidence to support them are presented. Code available at https://github.com/Kingsford-Group/mdsscope.
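
The defining property of a decycling set can be checked directly: remove the chosen k-mers from the de Bruijn graph and test whether what remains is acyclic. The sketch below does this with a topological sort. It is an independent illustration, not code from the mdsscope repository, and the function name and the binary toy alphabet are assumptions.

```python
from itertools import product


def is_decycling_set(kmers, k, alphabet="ACGT"):
    """True if removing `kmers` leaves the order-k de Bruijn graph without cycles."""
    removed = set(kmers)
    nodes = [
        "".join(letters)
        for letters in product(alphabet, repeat=k)
        if "".join(letters) not in removed
    ]
    remaining = set(nodes)
    # Edge v -> w exists when w is v shifted by one letter (self-loops included).
    successors = {
        v: [v[1:] + c for c in alphabet if v[1:] + c in remaining] for v in nodes
    }
    indegree = {v: 0 for v in nodes}
    for v in nodes:
        for w in successors[v]:
            indegree[w] += 1
    # Kahn's algorithm: the graph is acyclic iff every node can be peeled off.
    ready = [v for v in nodes if indegree[v] == 0]
    peeled = 0
    while ready:
        v = ready.pop()
        peeled += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return peeled == len(nodes)


# Toy case over a binary alphabet with k = 2.
full_set = is_decycling_set({"00", "11", "01"}, k=2, alphabet="01")
too_small = is_decycling_set({"00", "11"}, k=2, alphabet="01")
```

In the toy case, removing only the two self-loop k-mers leaves the cycle between 01 and 10, so a third k-mer is required.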


Subject(s)
Algorithms , Computational Biology , Software , Computational Biology/methods , Sequence Alignment/methods , Humans , Sequence Analysis, DNA/methods
11.
J Comput Biol ; 31(7): 638-650, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38985743

ABSTRACT

Discrete optimization problems arise in many biological contexts and, in many cases, we seek to make inferences from the optimal solutions. However, the number of optimal solutions is frequently very large and making inferences from any single solution may result in conclusions that are not supported by other optimal solutions. We describe a general approach for efficiently (polynomial time) and exactly (without sampling) computing statistics on the space of optimal solutions. These statistics provide insights into the space of optimal solutions that can be used to support the use of a single optimum (e.g., when the optimal solutions are similar) or justify the need for selecting multiple optima (e.g., when the solution space is large and diverse) from which to make inferences. We demonstrate this approach on two well-known problems and identify the properties of these problems that make them amenable to this method.
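
One of the simplest statistics on the space of optimal solutions is their number, which a dynamic program can often carry alongside the optimal value. The sketch below does this for a toy problem, minimum-cost monotone paths through a grid. The problem is chosen only for illustration; the abstract does not name the two problems the authors study.

```python
def count_optimal_paths(grid):
    """Return (minimum cost, number of minimum-cost paths) from top-left to bottom-right.

    Paths may move only right or down; integer costs keep tie detection exact.
    """
    rows, cols = len(grid), len(grid[0])
    best = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                best[i][j] = (grid[0][0], 1)
                continue
            predecessors = []
            if i > 0:
                predecessors.append(best[i - 1][j])
            if j > 0:
                predecessors.append(best[i][j - 1])
            cheapest = min(cost for cost, _ in predecessors)
            # Sum the counts of every predecessor that achieves the minimum.
            ways = sum(count for cost, count in predecessors if cost == cheapest)
            best[i][j] = (cheapest + grid[i][j], ways)
    return best[-1][-1]


tied = count_optimal_paths([[1, 1], [1, 1]])
unique = count_optimal_paths([[1, 5], [1, 1]])
```

The count is obtained in the same polynomial time as the optimum itself and without enumerating or sampling solutions, which is the flavour of computation the abstract describes.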


Subject(s)
Algorithms , Computational Biology/methods , Computer Simulation
12.
J Comput Biol ; 31(7): 616-637, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38990757

ABSTRACT

Modern genomic datasets, like those generated under the 1000 Genome Project, contain millions of variants belonging to known haplotypes. Although these datasets are more representative than a single reference sequence and can alleviate issues like reference bias, they are significantly more computationally burdensome to work with, often involving large-indexed genome graph data structures for tasks such as read mapping. The construction, preprocessing, and mapping algorithms can require substantial computational resources depending on the size of these variant sets. Moreover, the accuracy of mapping algorithms has been shown to decrease when working with complete variant sets. Therefore, a drastically reduced set of variants that preserves important properties of the original set is desirable. This work provides a technique for finding a minimal subset of variants S such that for given parameters α and δ, all substrings up to length α in the haplotypes are guaranteed to be still alignable to the appropriate locations with either Hamming or edit distance at most δ, using only S. Our contributions include showing the NP-hardness and inapproximability of these optimization problems and providing Integer Linear Programming (ILP) formulations. Our edit distance ILP formulation carefully decomposes the problem according to variant locations, which allows it to scale to support all of chromosome 22's variants from the 1000 Genome Project. Our experiments also demonstrate a significant reduction in the number of variants. For example, for moderately long reads, e.g., α = 1000, over 75% of the variants can be removed while preserving read mappability with edit distance at most one.
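
The guarantee above is stated in terms of Hamming and edit distance at most δ. For reference, a minimal sketch of both distances follows. These are the standard textbook definitions, not the authors' ILP formulation, and the example strings are arbitrary.

```python
def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + (char_a != char_b),  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def alignable(read, window, delta):
    """True if the read matches the reference window within edit distance delta."""
    return edit_distance(read, window) <= delta
```

With δ = 1, for example, a read that differs from its reference window by a single deleted base would still count as alignable.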


Subject(s)
Algorithms , Haplotypes , Humans , Computational Biology/methods , Genomics/methods , Genome, Human , Software , Genetic Variation , Sequence Analysis, DNA/methods
13.
Sci Rep ; 14(1): 16231, 2024 Jul 14.
Article in English | MEDLINE | ID: mdl-39004625

ABSTRACT

Generative AI tools exemplified by ChatGPT are becoming a new reality. This study is motivated by the premise that "AI generated content may exhibit a distinctive behavior that can be separated from scientific articles". In this study, we show how articles can be generated by means of prompt engineering for various diseases and conditions. We then show how we tested this premise in two phases and proved its validity. Subsequently, we introduce xFakeSci, a novel learning algorithm that is capable of distinguishing ChatGPT-generated articles from publications produced by scientists. The algorithm is trained using network models derived from both sources. To mitigate overfitting issues, we incorporated a calibration step that is built upon data-driven heuristics, including proximity and ratios. Specifically, from a total of 3952 fake articles for three different medical conditions, the algorithm was trained using only 100 articles, but calibrated using folds of 100 articles. As for the classification step, it was performed using 300 articles per condition. The actual labeling step took place against an equal mix of 50 generated articles and 50 authentic PubMed abstracts. The testing also spanned publication periods from 2010 to 2024 and encompassed research on three distinct diseases: cancer, depression, and Alzheimer's. Further, we evaluated the accuracy of the xFakeSci algorithm against some of the classical data mining algorithms (e.g., Support Vector Machines, Regression, and Naive Bayes). The xFakeSci algorithm achieved F1 scores ranging from 80 to 94%, outperforming common data mining algorithms, which scored F1 values between 38 and 52%. We attribute the noticeable difference to the introduction of calibration and a proximity distance heuristic, which underscores this promising performance. Indeed, the prediction of fake science generated by ChatGPT presents a considerable challenge. 
Nonetheless, the introduction of the xFakeSci algorithm is a significant step on the way to combating fake science.
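
The results above are reported as F1 scores. A minimal sketch of how F1 is computed from a confusion matrix follows; the counts in the example are invented and are not the study's data.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0  # no positive predictions and no positive labels
    return 2 * true_positives / denominator


# Hypothetical outcome on 50 generated and 50 authentic abstracts:
# 40 generated abstracts flagged correctly, 10 missed, 10 authentic ones flagged wrongly.
example_f1 = f1_score(true_positives=40, false_positives=10, false_negatives=10)
```

Because F1 ignores true negatives, it reflects only how well the positive class (here, generated articles) is recovered.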


Subject(s)
Algorithms , Humans , Artificial Intelligence , Machine Learning , Publications
14.
Sci Rep ; 14(1): 16239, 2024 Jul 14.
Article in English | MEDLINE | ID: mdl-39004643

ABSTRACT

Aiming to apply automatic arousal detection to support sleep laboratories, we evaluated an optimized, state-of-the-art approach using data from daily work in our university hospital sleep laboratory. Therefore, a machine learning algorithm was trained and evaluated on 3423 polysomnograms of people with various sleep disorders. The model architecture is a U-net that accepts 50 Hz signals as input. We compared this algorithm with models trained on publicly available datasets, and evaluated these models using our clinical dataset, particularly with regard to the effects of different sleep disorders. In an effort to evaluate clinical relevance, we designed a metric based on the error of the predicted arousal index. Our models achieve an area under the precision recall curve (AUPRC) of up to 0.83 and F1 scores of up to 0.81. The model trained on our data showed no age or gender bias and no significant negative effect regarding sleep disorders on model performance compared to healthy sleep. In contrast, models trained on public datasets showed a small to moderate negative effect (calculated using Cohen's d) of sleep disorders on model performance. Therefore, we conclude that state-of-the-art arousal detection on our clinical data is possible with our model architecture. Thus, our results support the general recommendation to use a clinical dataset for training if the model is to be applied to clinical data.
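
The effect sizes above are expressed as Cohen's d. The sketch below uses the common pooled-standard-deviation form; the abstract does not say which variant the authors used, and the per-recording scores are invented for illustration.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical per-recording scores for two groups of sleepers.
effect = cohens_d([2, 4, 6], [1, 3, 5])
```

By the usual rule of thumb, values near 0.2 are read as small, near 0.5 as moderate and near 0.8 as large, which matches the "small to moderate" wording in the abstract.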


Subject(s)
Arousal , Machine Learning , Polysomnography , Sleep Wake Disorders , Sleep , Humans , Arousal/physiology , Polysomnography/methods , Female , Male , Middle Aged , Sleep Wake Disorders/diagnosis , Sleep Wake Disorders/physiopathology , Adult , Sleep/physiology , Algorithms , Aged
15.
Biometrics ; 80(3)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39005072

ABSTRACT

The increasing availability and scale of biobanks and "omic" datasets bring new horizons for understanding biological mechanisms. PathGPS is an exploratory data analysis tool to discover genetic architectures using Genome Wide Association Studies (GWAS) summary data. PathGPS is based on a linear structural equation model where traits are regulated by both genetic and environmental pathways. PathGPS decouples the genetic and environmental components by contrasting the GWAS associations of "signal" genes with those of "noise" genes. From the estimated genetic component, PathGPS then extracts genetic pathways via principal component and factor analysis, leveraging the low-rank and sparse properties. In addition, we provide a bootstrap aggregating ("bagging") algorithm to improve stability under data perturbation and hyperparameter tuning. When applied to a metabolomics dataset and the UK Biobank, PathGPS confirms several known gene-trait clusters and suggests multiple new hypotheses for future investigations.


Subject(s)
Algorithms , Genome-Wide Association Study , Genome-Wide Association Study/statistics & numerical data , Humans , Metabolomics/methods , Principal Component Analysis , Models, Genetic , Polymorphism, Single Nucleotide , Biological Specimen Banks , Computer Simulation , Models, Statistical
16.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39007597

ABSTRACT

Thyroid cancer incidence continues to increase even though a large number of inspection tools have been developed recently. Since there is no standard, definitive procedure for thyroid cancer diagnosis, clinicians need to conduct various tests. This examination process yields multi-dimensional big data, and the lack of a common approach leads to randomly distributed missing (sparse) data; both are formidable challenges for machine learning algorithms. This paper aims to develop an accurate and computationally efficient deep learning algorithm to diagnose thyroid cancer. To this end, the singularity in learning problems that stems from randomly distributed missing data is treated, and dimensionality reduction with inner and target similarity approaches is developed to select the most informative input datasets. In addition, size reduction with a hierarchical clustering algorithm is performed to eliminate highly similar data samples. Four machine learning algorithms are trained and also tested with unseen data to validate their generalization and robustness. The results yield 100% training and 83% testing preciseness for the unseen data. The computational time efficiency of the algorithms is also examined under equal conditions.


Subject(s)
Algorithms , Deep Learning , Thyroid Neoplasms , Thyroid Neoplasms/diagnosis , Humans , Machine Learning , Cluster Analysis
17.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39007599

ABSTRACT

The interaction between T-cell receptors (TCRs) and peptides (epitopes) presented by major histocompatibility complex molecules (MHC) is fundamental to the immune response. Accurate prediction of TCR-epitope interactions is crucial for advancing the understanding of various diseases and their prevention and treatment. Existing methods primarily rely on sequence-based approaches, overlooking the inherent topological structure of TCR-epitope interaction networks. In this study, we present GTE, a novel heterogeneous Graph neural network model based on inductive learning to capture the topological structure between TCRs and Epitopes. Furthermore, we address the challenge of constructing negative samples within the graph by proposing a dynamic edge update strategy, enhancing model learning with the nonbinding TCR-epitope pairs. Additionally, to overcome data imbalance, we adapt the Deep AUC Maximization strategy to the graph domain. Extensive experiments are conducted on four public datasets to demonstrate the superiority of exploring underlying topological structures in predicting TCR-epitope interactions, illustrating the benefits of delving into complex molecular networks. The implementation code and data are available at https://github.com/uta-smile/GTE.


Subject(s)
Receptors, Antigen, T-Cell , Receptors, Antigen, T-Cell/chemistry , Receptors, Antigen, T-Cell/immunology , Receptors, Antigen, T-Cell/metabolism , Humans , Epitopes, T-Lymphocyte/immunology , Epitopes, T-Lymphocyte/chemistry , Neural Networks, Computer , Computational Biology/methods , Protein Binding , Epitopes/chemistry , Epitopes/immunology , Algorithms , Software
18.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39007596

ABSTRACT

Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved its effectiveness in bioinformatics due to its capacity to produce local instead of global models, evolving from a key technique used in gene expression data analysis into one of the most used approaches for pattern discovery and identification of biological modules, used in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering with other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, it provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.


Subject(s)
Algorithms , Computational Biology , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans
19.
PLoS One ; 19(7): e0298564, 2024.
Article in English | MEDLINE | ID: mdl-39008464

ABSTRACT

High-quality, chromosome-scale genomes are essential for genomic analyses. Analyses including 3D genomics, epigenetics, and comparative genomics rely on a high-quality genome assembly, which is often accomplished with the assistance of Hi-C data. Curation of genomes reveals that current Hi-C-assisted scaffolding algorithms either generate ordering and orientation errors or fail to assemble high-quality chromosome-level scaffolds. Here, we offer the software Puzzle Hi-C, which uses Hi-C reads to accurately assign contigs or scaffolds to chromosomes. Puzzle Hi-C uses the triangle region instead of the square region to count interactions in a Hi-C heatmap. This strategy dramatically diminishes scaffolding interference caused by long-range interactions. The software also introduces a dynamic triangle window strategy during assembly. Initially small, the window expands with interactions to produce more effective clustering. Puzzle Hi-C outperforms available scaffolding tools.


Subject(s)
Algorithms , Genomics , Software , Genomics/methods , Chromosomes/genetics , Humans , Genome
20.
PLoS One ; 19(7): e0307027, 2024.
Article in English | MEDLINE | ID: mdl-39008472

ABSTRACT

The rise of social media has changed how people view connections. Machine Learning (ML)-based sentiment analysis and news categorization help understand emotions and access news. However, most studies focus on complex models requiring heavy resources and slowing inference times, making deployment difficult in resource-limited environments. In this paper, we process both structured and unstructured data, determining the polarity of text using the TextBlob scheme to determine the sentiment of news headlines. We propose a Stochastic Gradient Descent (SGD)-based Ridge classifier (RC) for blending SGDR with an advanced string processing technique to effectively classify news articles. Additionally, we explore existing supervised and unsupervised ML algorithms to gauge the effectiveness of our SGDR classifier. The scalability and generalization capability of SGD and L2 regularization techniques in RCs to handle overfitting and balance bias and variance provide the proposed SGDR with better classification capability. Experimental results highlight that our string processing pipeline significantly boosts the performance of all ML models. Notably, our ensemble SGDR classifier surpasses all state-of-the-art ML algorithms, achieving an impressive 98.12% accuracy. McNemar's significance tests reveal that our SGDR classifier achieves a 1% significance level improvement over K-Nearest Neighbor, Decision Tree, and AdaBoost and a 5% significance level improvement over other algorithms. These findings underscore the superior proficiency of linear models in news categorization compared to tree-based and nonlinear counterparts. This study contributes valuable insights into the efficacy of the proposed methodology, elucidating its potential for news categorization and sentiment analysis.
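
McNemar's test, used above to compare classifiers, looks only at the test items on which two classifiers disagree. The sketch below shows the usual continuity-corrected statistic and its chi-square p-value with one degree of freedom. The disagreement counts are invented, and the abstract does not state whether the authors applied the correction.

```python
import math


def mcnemar_test(only_a_correct, only_b_correct):
    """Continuity-corrected McNemar statistic and p-value (chi-square, 1 d.f.)."""
    discordant = only_a_correct + only_b_correct
    if discordant == 0:
        return 0.0, 1.0  # the classifiers never disagree
    statistic = (abs(only_a_correct - only_b_correct) - 1) ** 2 / discordant
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical disagreement counts between two news classifiers.
statistic, p_value = mcnemar_test(only_a_correct=10, only_b_correct=25)
```

With these made-up counts the difference would be significant at the 5% level but not at the 1% level, the two thresholds the abstract refers to.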


Subject(s)
Algorithms , Machine Learning , Social Media , Humans , Emotions