Results 1 - 20 of 185
1.
Neurosci Biobehav Rev ; : 105846, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39117132

ABSTRACT

The large number of different analytical choices used by researchers is partly responsible for the challenge of replication in neuroimaging studies. For an exhaustive robustness analysis, knowledge of the full space of analytical options is essential. We conducted a systematic literature review to identify the analytical decisions in functional neuroimaging data preprocessing and analysis in the emerging field of cognitive network neuroscience. We found 61 different steps, with 17 of them having debatable parameter choices. Scrubbing, global signal regression, and spatial smoothing are among the controversial steps. There is no standardized order in which different steps are applied, and the parameter settings within several steps vary widely across studies. By aggregating the pipelines across studies, we propose three taxonomic levels to categorize analytical choices: 1) inclusion or exclusion of specific steps, 2) parameter tuning within steps, and 3) distinct sequencing of steps. We have developed a decision support application with high educational value called METEOR to facilitate access to the data in order to design well-informed robustness (multiverse) analysis.
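
To make the three taxonomic levels concrete, the toy sketch below enumerates a small multiverse of preprocessing pipelines by crossing step inclusion, parameter settings, and step order; the step names and parameter values are hypothetical illustrations, not the options catalogued by METEOR.

```python
# Sketch only: enumerating a small "multiverse" of preprocessing pipelines across the
# three taxonomic levels described above. Step names and parameter values are
# hypothetical illustrations, not the options catalogued by METEOR.
from itertools import permutations, product

optional_steps = {                      # level 1: include or exclude a step
    "scrubbing": [True, False],
    "global_signal_regression": [True, False],
}
parameters = {                          # level 2: parameter tuning within a step
    "smoothing_fwhm_mm": [4, 6, 8],
}
orderable_steps = ["bandpass_filter", "nuisance_regression"]  # level 3: step order

pipelines = []
for scrub, gsr in product(*optional_steps.values()):
    for fwhm in parameters["smoothing_fwhm_mm"]:
        for order in permutations(orderable_steps):
            pipelines.append(
                {"scrubbing": scrub, "gsr": gsr, "fwhm": fwhm, "order": order}
            )

print(f"{len(pipelines)} candidate pipelines in this toy multiverse")  # 2*2*3*2 = 24
```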

2.
BMC Public Health ; 24(1): 1777, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961394

ABSTRACT

BACKGROUND: Dyslipidemia, characterized by variations in plasma lipid profiles, poses a global health threat linked to millions of deaths annually. OBJECTIVES: This study focuses on predicting dyslipidemia incidence using machine learning methods, addressing the crucial need for early identification and intervention. METHODS: The dataset, derived from the Lifestyle Promotion Project (LPP) in East Azerbaijan Province, Iran, underwent comprehensive preprocessing, merging, and null handling. Target selection involved five distinct dyslipidemia-related variables. Normalization techniques and three feature selection algorithms were applied to enhance predictive modeling. RESULTS: The results underscore the potential of the multi-layer perceptron neural network (MLP), which achieved higher accuracy, F1 score, sensitivity, and specificity than the other machine learning methods. Random Forest also showed strong accuracy and outperformed K-Nearest Neighbors (KNN) in precision, recall, and F1 score. The study's emphasis on feature selection revealed meaningful patterns shared across the five target variables, indicating fundamental commonalities among dyslipidemia-related factors. Features such as waist circumference, serum vitamin D, blood pressure, sex, age, diabetes, and physical activity were related to dyslipidemia. CONCLUSION: These results collectively highlight the complex nature of dyslipidemia and its connections with numerous factors, underscoring the importance of applying machine learning methods to understand and precisely predict its incidence.


Subjects
Dyslipidemias , Machine Learning , Humans , Dyslipidemias/epidemiology , Incidence , Iran/epidemiology , Male , Female , Life Style , Algorithms , Health Promotion/methods , Middle Aged , Adult
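
A minimal scikit-learn sketch of the workflow described above; the synthetic data and column names are hypothetical stand-ins, not the LPP dataset or the study's actual pipeline.

```python
# Minimal sketch: scaling, univariate feature selection, and comparison of MLP,
# Random Forest and KNN by cross-validated F1. Data and column names are synthetic.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "waist_circumference": rng.normal(95, 12, 300),
    "serum_vitamin_d": rng.normal(25, 8, 300),
    "systolic_bp": rng.normal(125, 15, 300),
    "age": rng.integers(20, 70, 300),
    "physical_activity": rng.integers(0, 3, 300),
})
y = (X["waist_circumference"] + rng.normal(0, 10, 300) > 100).astype(int)  # toy target

models = {
    "MLP": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=300, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=7),
}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=3), clf)
    f1 = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: mean F1 = {f1:.3f}")
```
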
3.
Methods Mol Biol ; 2836: 135-155, 2024.
Article in English | MEDLINE | ID: mdl-38995540

ABSTRACT

The increasing complexity and volume of mass spectrometry (MS) data have presented new challenges and opportunities for proteomics data analysis and interpretation. In this chapter, we provide a comprehensive guide to transforming MS data for machine learning (ML) training, inference, and applications. The chapter is organized into three parts. The first part describes the data analysis needed for MS-based experiments and a general introduction to our deep learning model SpeCollate-which we will use throughout the chapter for illustration. The second part of the chapter explores the transformation of MS data for inference, providing a step-by-step guide for users to deduce peptides from their MS data. This section aims to bridge the gap between data acquisition and practical applications by detailing the necessary steps for data preparation and interpretation. In the final part, we present a demonstrative example of SpeCollate, a deep learning-based peptide database search engine that overcomes the problems of simplistic simulation of theoretical spectra and heuristic scoring functions for peptide-spectrum matches by generating joint embeddings for spectra and peptides. SpeCollate is a user-friendly tool with an intuitive command-line interface to perform the search, showcasing the effectiveness of the techniques and methodologies discussed in the earlier sections and highlighting the potential of machine learning in the context of mass spectrometry data analysis. By offering a comprehensive overview of data transformation, inference, and ML model applications for mass spectrometry, this chapter aims to empower researchers and practitioners in leveraging the power of machine learning to unlock novel insights and drive innovation in the field of mass spectrometry-based omics.


Subjects
Mass Spectrometry , Proteomics , Software , Proteomics/methods , Mass Spectrometry/methods , Machine Learning , Peptides/chemistry , Humans , Databases, Protein , Deep Learning , Search Engine , Computational Biology/methods , Algorithms
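
As an illustration of preparing MS data for model inference, the sketch below bins a raw MS/MS spectrum into a fixed-length intensity vector, a common transformation; SpeCollate's own spectrum and peptide embeddings are learned and are not reproduced here.

```python
# Illustrative only: turning a raw MS/MS spectrum (m/z, intensity pairs) into a
# fixed-length vector suitable for a neural model. Not SpeCollate's actual encoding.
import numpy as np

def bin_spectrum(mz, intensity, max_mz=2000.0, bin_size=1.0):
    """Sum intensities into fixed-width m/z bins and L2-normalize the result."""
    n_bins = int(max_mz / bin_size)
    vec = np.zeros(n_bins)
    idx = np.clip((np.asarray(mz) / bin_size).astype(int), 0, n_bins - 1)
    np.add.at(vec, idx, intensity)            # accumulate peaks falling in the same bin
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

spectrum_vec = bin_spectrum(mz=[175.12, 305.16, 432.23], intensity=[1e4, 3e4, 2e4])
print(spectrum_vec.shape)                     # (2000,)
```
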
4.
Biotechnol Bioeng ; 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39044472

ABSTRACT

In the burgeoning field of proteins, the effective analysis of intricate protein data remains a formidable challenge, necessitating advanced computational tools for data processing, feature extraction, and interpretation. This study introduces ProteinFlow, an innovative framework designed to revolutionize feature engineering in protein data analysis. ProteinFlow stands out by offering enhanced efficiency in data collection and preprocessing, along with advanced capabilities in feature extraction, directly addressing the complexities inherent in multidimensional protein data sets. Through a comparative analysis, ProteinFlow demonstrated a significant improvement over traditional methods, notably reducing data preprocessing time and expanding the scope of biologically significant features identified. The framework's parallel data processing strategy and advanced algorithms ensure not only rapid data handling but also the extraction of comprehensive, meaningful insights from protein sequences, structures, and interactions. Furthermore, ProteinFlow exhibits remarkable scalability, adeptly managing large-scale data sets without compromising performance, a crucial attribute in the era of big data.
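
The parallel feature-extraction pattern mentioned above can be sketched as follows: a simple amino-acid-composition feature computed over many sequences with a process pool. This illustrates the pattern only and is not ProteinFlow's API.

```python
# Sketch of parallel feature extraction from protein sequences with a process pool.
# The composition feature and sequences are toy examples, not ProteinFlow's methods.
from multiprocessing import Pool

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in a protein sequence."""
    n = max(len(seq), 1)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]

if __name__ == "__main__":
    sequences = ["MKTAYIAKQR", "GAVLIPFMWST", "MEEPQSDPSV"]   # toy sequences
    with Pool(processes=4) as pool:
        features = pool.map(aa_composition, sequences)
    print(len(features), len(features[0]))                    # 3 20
```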

5.
Environ Monit Assess ; 196(8): 724, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38990407

ABSTRACT

Analysis of changes in groundwater used as a drinking and irrigation water source is of critical importance for monitoring aquifers, planning water resources, energy production, combating climate change, and agricultural production. Therefore, it is necessary to model groundwater level (GWL) fluctuations to monitor and predict groundwater storage. Artificial intelligence-based models in water resource management have become prevalent due to their proven success in hydrological studies. This study proposed a hybrid model that combines the artificial neural network (ANN) and the artificial bee colony optimization (ABC) algorithm, along with the ensemble empirical mode decomposition (EEMD) and the local mean decomposition (LMD) techniques, to model groundwater levels in Erzurum province, Türkiye. GWL estimation results were evaluated with mean square error (MSE), coefficient of determination (R2), and residual sum of squares (RSS), and visually with violin, scatter, and time series plots. The study results indicated that the EEMD-ABC-ANN hybrid model was superior to other models in estimating GWL, with R2 values ranging from 0.91 to 0.99 and MSE values ranging from 0.004 to 0.07. It has also been revealed that promising GWL predictions can be made with previous GWL data.


Subjects
Environmental Monitoring , Groundwater , Neural Networks, Computer , Groundwater/chemistry , Bees , Animals , Environmental Monitoring/methods , Algorithms
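
The decompose-predict-recombine pattern behind the hybrid model can be sketched as below; for brevity, a moving-average trend/residual split stands in for EEMD/LMD and a plain MLPRegressor stands in for the ABC-optimized ANN, so this is not the paper's model.

```python
# Schematic of the decompose-predict-recombine pattern: split the series into
# components, forecast each with a neural model, and sum the component forecasts.
import numpy as np
from sklearn.neural_network import MLPRegressor

def lag_matrix(series, n_lags=6):
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

gwl = np.sin(np.linspace(0, 20, 400)) + 0.1 * np.random.randn(400)   # toy GWL series
trend = np.convolve(gwl, np.ones(12) / 12, mode="same")              # "component" 1
residual = gwl - trend                                               # "component" 2

prediction = 0.0
for component in (trend, residual):
    X, y = lag_matrix(component)
    model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, y)
    prediction += model.predict(X[-1:])[0]                            # one-step forecast

print(f"Next-step GWL estimate: {prediction:.3f}")
```
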
6.
Heliyon ; 10(12): e33328, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-39021980

ABSTRACT

This review paper addresses the critical need for advanced rice disease detection methods by integrating artificial intelligence, specifically convolutional neural networks (CNNs). Rice, being a staple food for a large part of the global population, is susceptible to various diseases that threaten food security and agricultural sustainability. This research is significant as it leverages technological advancements to tackle these challenges effectively. Drawing upon diverse datasets collected across regions including India, Bangladesh, Türkiye, China, and Pakistan, this paper offers a comprehensive analysis of global research efforts in rice disease detection using CNNs. While some rice diseases are universally prevalent, many vary significantly by growing region due to differences in climate, soil conditions, and agricultural practices. The primary objective is to explore the application of AI, particularly CNNs, for precise and early identification of rice diseases. The literature review includes a detailed examination of data sources, datasets, and preprocessing strategies, shedding light on the geographic distribution of data collection and the profiles of contributing researchers. Additionally, the review synthesizes information on various algorithms and models employed in rice disease detection, highlighting their effectiveness in addressing diverse data complexities. The paper thoroughly evaluates hyperparameter optimization techniques and their impact on model performance, emphasizing the importance of fine-tuning for optimal results. Performance metrics such as accuracy, precision, recall, and F1 score are rigorously analyzed to assess model effectiveness. Furthermore, the discussion section critically examines challenges associated with current methodologies, identifies opportunities for improvement, and outlines future research directions at the intersection of machine learning and rice disease detection. This comprehensive review, analyzing a total of 121 papers, underscores the significance of ongoing interdisciplinary research to meet evolving agricultural technology needs and enhance global food security.
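
A generic sketch of computing the performance metrics the review analyzes (accuracy, precision, recall, F1) for a multi-class rice-disease classifier; the labels are toy values unrelated to any reviewed dataset.

```python
# Generic evaluation-metric sketch for a multi-class rice-disease classifier.
# Labels here are toy values, not results from any paper in the review.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["blast", "blight", "healthy", "blast", "healthy", "blight"]
y_pred = ["blast", "healthy", "healthy", "blast", "healthy", "blight"]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy_score(y_true, y_pred):.2f}  "
      f"precision={precision:.2f}  recall={recall:.2f}  F1={f1:.2f}")
```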

7.
J Cheminform ; 16(1): 74, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937840

ABSTRACT

This paper presents AutoTemplate, an innovative data preprocessing protocol, addressing the crucial need for high-quality chemical reaction datasets in the realm of machine learning applications in organic chemistry. Recent advances in artificial intelligence have expanded the application of machine learning in chemistry, particularly in yield prediction, retrosynthesis, and reaction condition prediction. However, the effectiveness of these models hinges on the integrity of chemical reaction datasets, which are often plagued by inconsistencies like missing reactants, incorrect atom mappings, and outright erroneous reactions. AutoTemplate introduces a two-stage approach to refine these datasets. The first stage involves extracting meaningful reaction transformation rules and formulating generic reaction templates using a simplified SMARTS representation. This simplification broadens the applicability of templates across various chemical reactions. The second stage is template-guided reaction curation, where these templates are systematically applied to validate and correct the reaction data. This process effectively amends missing reactant information, rectifies atom-mapping errors, and eliminates incorrect data entries. A standout feature of AutoTemplate is its capability to concurrently identify and correct false chemical reactions. It operates on the premise that most reactions in datasets are accurate, using these as templates to guide the correction of flawed entries. The protocol demonstrates its efficacy across a range of chemical reactions, significantly enhancing dataset quality. This advancement provides a more robust foundation for developing reliable machine learning models in chemistry, thereby improving the accuracy of forward and retrosynthetic predictions. AutoTemplate marks a significant progression in the preprocessing of chemical reaction datasets, bridging a vital gap and facilitating more precise and efficient machine learning applications in organic synthesis. SCIENTIFIC CONTRIBUTION: The proposed automated preprocessing tool for chemical reaction data aims to identify errors within chemical databases. Specifically, if the errors involve atom mapping or the absence of reactant types, corrections can be systematically applied using reaction templates, ultimately elevating the overall quality of the database.
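
The template-guided idea can be illustrated with RDKit: a generic reaction template written in SMARTS is applied to reactants to generate the expected product. The amide-coupling template below is a toy example, not a template extracted by AutoTemplate.

```python
# Applying a generic SMARTS reaction template to reactants with RDKit.
# The template and molecules are toy examples, not AutoTemplate's templates.
from rdkit import Chem
from rdkit.Chem import AllChem

template = AllChem.ReactionFromSmarts("[C:1](=[O:2])O.[N:3]>>[C:1](=[O:2])[N:3]")
acid = Chem.MolFromSmiles("CC(=O)O")       # acetic acid
amine = Chem.MolFromSmiles("NCC")          # ethylamine

products = template.RunReactants((acid, amine))
for (prod,) in products:
    print(Chem.MolToSmiles(prod))          # expected amide: CCNC(C)=O
```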

8.
Sensors (Basel) ; 24(11), 2024 May 31.
Article in English | MEDLINE | ID: mdl-38894355

ABSTRACT

This paper presents the results of a study on data preprocessing and modeling for predicting corrosion in water pipelines of a steel industrial plant. The use case is a cooling circuit consisting of both direct and indirect cooling. In the direct cooling circuit, water comes into direct contact with the product, whereas in the indirect one, it does not. In this study, advanced machine learning techniques, such as extreme gradient boosting and deep neural networks, have been employed for two distinct applications. Firstly, a virtual sensor was created to estimate the corrosion rate based on influencing process variables, such as pH and temperature. Secondly, a predictive tool was designed to foresee the future evolution of the corrosion rate, considering past values of both influencing variables and the corrosion rate. The results show that the most suitable algorithm for the virtual sensor approach is the dense neural network, with MAPE values of (25 ± 4)% and (11 ± 4)% for the direct and indirect circuits, respectively. In contrast, different results are obtained for the two circuits when following the predictive tool approach. For the primary circuit, the convolutional neural network yields the best results, with MAPE = 4% on the testing set, whereas for the secondary circuit, the LSTM recurrent network shows the highest prediction accuracy, with MAPE = 9%. In general, models employing temporal windows have emerged as more suitable for corrosion prediction, with model performance significantly improving with a larger dataset.
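
Two generic ingredients of the predictive-tool approach, temporal-window construction and MAPE scoring, can be sketched as follows; the variable names are hypothetical and this is not the plant's actual model.

```python
# Sketch of temporal-window construction and MAPE scoring for corrosion forecasting.
# Variables and data are synthetic stand-ins for pH, temperature and corrosion rate.
import numpy as np

def make_windows(features, target, window=24):
    """Stack the last `window` time steps of features + target as one input row."""
    X, y = [], []
    for t in range(window, len(target)):
        past = np.concatenate([features[t - window:t].ravel(), target[t - window:t]])
        X.append(past)
        y.append(target[t])
    return np.array(X), np.array(y)

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

rng = np.random.default_rng(0)
ph_temp = rng.normal(size=(200, 2))               # e.g. pH and temperature
corrosion = np.abs(rng.normal(loc=1.0, size=200))
X, y = make_windows(ph_temp, corrosion)
print(X.shape, f"MAPE of a naive last-value predictor: {mape(y, X[:, -1]):.1f}%")
```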

9.
Ecol Evol ; 14(5): e11292, 2024 May.
Article in English | MEDLINE | ID: mdl-38725827

ABSTRACT

Plant trait data are used to quantify how plants respond to environmental factors and can act as indicators of ecosystem function. Measured trait values are influenced by genetics, trade-offs, competition, environmental conditions, and phenology. These interacting effects on traits are poorly characterized across taxa, and for many traits, measurement protocols are not standardized. As a result, ancillary information about growth and measurement conditions can be highly variable, requiring a flexible data structure. In 2007, the TRY initiative was founded as an integrated database of plant trait data, including ancillary attributes relevant to understanding and interpreting the trait values. The TRY database now integrates around 700 original and collective datasets and has become a central resource of plant trait data. These data are provided in a generic long-table format, where a unique identifier links different trait records and ancillary data measured on the same entity. Due to the high number of trait records, plant taxa, and types of traits and ancillary data released from the TRY database, data preprocessing is necessary but not straightforward. Here, we present the 'rtry' R package, specifically designed to support plant trait data exploration and filtering. By integrating a subset of existing R functions essential for preprocessing, 'rtry' avoids the need for users to navigate the extensive R ecosystem and provides the functions under a consistent syntax. 'rtry' is therefore easy to use even for beginners in R. Notably, 'rtry' does not support data retrieval or analysis; rather, it focuses on the preprocessing tasks to optimize data quality. While 'rtry' primarily targets TRY data, its utility extends to data from other sources, such as the National Ecological Observatory Network (NEON). The 'rtry' package is available on the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/package=rtry) and the GitHub Wiki (https://github.com/MPI-BGC-Functional-Biogeography/rtry/wiki) along with comprehensive documentation and vignettes describing detailed data preprocessing workflows.
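
rtry itself is an R package; as a language-neutral illustration of the long-table structure it operates on, the pandas sketch below filters whole observations via the shared identifier that links trait and ancillary records. Column names only approximate the TRY release format.

```python
# Illustration of the long-table idea: trait records and ancillary records share an
# observation identifier, so filtering one implies filtering the other.
# Column names only approximate the TRY release format; this is not the rtry package.
import pandas as pd

long_table = pd.DataFrame({
    "ObservationID": [1, 1, 2, 2],
    "TraitName":     ["Leaf area", None, "Leaf area", None],   # None = ancillary row
    "DataName":      ["Leaf area", "Exposition", "Leaf area", "Exposition"],
    "StdValue":      [12.3, None, 250.0, None],
    "OrigValueStr":  ["12.3", "sun", "250", "shade"],
})

# Drop whole observations whose ancillary "Exposition" record is "shade".
shaded = long_table.loc[
    (long_table["DataName"] == "Exposition") & (long_table["OrigValueStr"] == "shade"),
    "ObservationID",
]
filtered = long_table[~long_table["ObservationID"].isin(shaded)]
print(filtered)
```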

10.
J Environ Manage ; 360: 121097, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38733844

ABSTRACT

With high-frequency data on nitrate (NO3-N) concentrations in waters becoming increasingly important for understanding watershed system behavior and ecosystem management, the accurate and economical acquisition of high-frequency NO3-N concentration data has become a key issue. This study used coupled deep learning neural networks and routinely monitored data to predict hourly NO3-N concentrations in a river. The hourly NO3-N concentration at the outlet of the Oyster River watershed in New Hampshire, USA, was predicted with a hybrid model architecture coupling a Convolutional Neural Network and a Long Short-Term Memory model (CNN-LSTM). The routinely monitored data (river depth, water temperature, air temperature, precipitation, specific conductivity, pH, and dissolved oxygen concentration) used for model training were collected from a nested high-frequency monitoring network, while the high-frequency NO3-N concentration data obtained at the outlet were not included as inputs. The whole dataset was split into training, validation, and testing sets at a ratio of 5:3:2. The hybrid CNN-LSTM model with different input lengths (1 d, 3 d, 7 d, 15 d, 30 d) displayed comparable or even better performance than studies at lower frequencies, with mean Nash-Sutcliffe Efficiency values of 0.60-0.83. Models with shorter input lengths demonstrated both higher modeling accuracy and greater stability. Water level, water temperature, and pH at the monitoring sites were the main factors controlling forecasting performance. This study provides new insight into using deep learning networks with a coupled architecture and routinely monitored data for high-frequency riverine NO3-N concentration forecasting, along with suggestions on variable and input-length selection during preprocessing of input data.


Subjects
Deep Learning , Neural Networks, Computer , Nitrates , Rivers , Nitrates/analysis , Rivers/chemistry , Environmental Monitoring/methods , Water Pollutants, Chemical/analysis , New Hampshire
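
A minimal Keras sketch of a CNN-LSTM stack of the kind described above, mapping a window of routinely monitored variables to one hourly NO3-N value; layer sizes and the 7-day window are illustrative choices, not the paper's configuration.

```python
# Minimal CNN-LSTM sketch: a window of routinely monitored variables in, one
# NO3-N value out. Layer sizes and window length are illustrative only.
import tensorflow as tf

WINDOW, N_FEATURES = 7 * 24, 7   # e.g. depth, water/air temperature, precipitation,
                                 # specific conductivity, pH, dissolved oxygen
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu",
                           input_shape=(WINDOW, N_FEATURES)),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),    # predicted hourly NO3-N concentration
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```
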
11.
BMC Genomics ; 25(1): 361, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38609853

ABSTRACT

BACKGROUND: Single-cell sequencing techniques are revolutionizing every field of biology by providing the ability to measure the abundance of biological molecules at a single-cell resolution. Although single-cell sequencing approaches have been developed for several molecular modalities, single-cell transcriptome sequencing is the most prevalent and widely applied technique. SPLiT-seq (split-pool ligation-based transcriptome sequencing) is one of these single-cell transcriptome techniques that applies a unique combinatorial-barcoding approach by splitting and pooling cells into multi-well plates containing barcodes. This unique approach required the development of dedicated computational tools to preprocess the data and extract the count matrices. Here we compare eight bioinformatic pipelines (alevin-fry splitp, LR-splitpipe, SCSit, splitpipe, splitpipeline, SPLiTseq-demultiplex, STARsolo and zUMI) that have been developed to process SPLiT-seq data. We provide an overview of the tools, their computational performance, functionality and impact on downstream processing of the single-cell data, which vary greatly depending on the tool used. RESULTS: We show that STARsolo, splitpipe and alevin-fry splitp can all handle large amounts of data within a reasonable time. In contrast, the other five pipelines are slow when handling large datasets. When using a smaller dataset, cell barcode results are similar, with the exception of SPLiTseq-demultiplex and splitpipeline. LR-splitpipe, which was originally designed for processing long-read sequencing data, is the slowest of all pipelines. Alevin-fry produced different downstream results that are difficult to interpret. STARsolo functions nearly identically to splitpipe, and the two produce highly similar results. However, STARsolo lacks the function to collapse random hexamer reads, for which some additional coding is required. CONCLUSION: Our comprehensive comparative analysis aids users in selecting the most suitable analysis tool for efficient SPLiT-seq data processing, while also detailing the specific prerequisites for each of these pipelines. From the available pipelines, we recommend splitpipe or STARsolo for SPLiT-seq data analysis.


Subjects
Computational Biology , Transcriptome , Data Analysis
12.
Heliyon ; 10(6): e27752, 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38560675

ABSTRACT

This study worked with Chunghwa Telecom to collect data from 17 rooftop solar photovoltaic plants installed on top of office buildings, warehouses, and computer rooms in northern, central and southern Taiwan from January 2021 to June 2023. A data pre-processing method combining linear regression and K Nearest Neighbor (k-NN) was proposed to estimate missing values for weather and power generation data. Outliers were processed using historical data and parameters highly correlated with power generation volumes were used to train an artificial intelligence (AI) model. To verify the reliability of this data pre-processing method, this study developed multilayer perceptron (MLP) and long short-term memory (LSTM) models to make short-term and medium-term power generation forecasts for the 17 solar photovoltaic plants. Study results showed that the proposed data pre-processing method reduced normalized root mean square error (nRMSE) for short- and medium-term forecasts in the MLP model by 17.47% and 11.06%, respectively, and also reduced the nRMSE for short- and medium-term forecasts in the LSTM model by 20.20% and 8.03%, respectively.
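
A hedged scikit-learn sketch of combining k-NN imputation with a simple regression check for missing weather and power records, plus the nRMSE metric; the column names are hypothetical and this is not the proposed pre-processing method itself.

```python
# Sketch of k-NN imputation for missing weather/power records with a regression
# sanity check and the nRMSE metric. Columns and values are synthetic stand-ins.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "irradiance":  [820, 640, np.nan, 910, 150],
    "module_temp": [41.0, 37.5, 33.2, np.nan, 21.0],
    "power_kw":    [3.9, 3.0, 2.1, 4.4, np.nan],
})

imputed = pd.DataFrame(KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns)

# Sanity-check the imputations with a simple linear fit of power on weather variables.
reg = LinearRegression().fit(imputed[["irradiance", "module_temp"]], imputed["power_kw"])
pred = reg.predict(imputed[["irradiance", "module_temp"]])

nrmse = np.sqrt(np.mean((imputed["power_kw"] - pred) ** 2)) / imputed["power_kw"].mean()
print(f"nRMSE = {100 * nrmse:.1f}%")
```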

13.
Sci Rep ; 14(1): 9152, 2024 Apr 21.
Article in English | MEDLINE | ID: mdl-38644408

ABSTRACT

Air pollution stands as a significant modern-day challenge impacting life quality, the environment, and the economy. It comprises various pollutants like gases, particulate matter, biological molecules, and more, stemming from sources such as vehicle emissions, industrial operations, agriculture, and natural events. Nitrogen dioxide (NO2), among these harmful gases, is notably prevalent in densely populated urban regions. Given its adverse effects on health and the environment, accurate monitoring of NO2 levels becomes imperative for devising effective risk mitigation strategies. However, the precise measurement of NO2 poses challenges as it traditionally relies on costly and bulky equipment. This has prompted the development of more affordable alternatives, although their reliability is often questionable. The aim of this article is to introduce a groundbreaking method for precisely calibrating cost-effective NO2 sensors. This technique involves statistical preprocessing of low-cost sensor readings, aligning their distribution with reference data. Central to this calibration is an artificial neural network (ANN) surrogate designed to predict sensor correction coefficients. It utilizes environmental variables (temperature, humidity, atmospheric pressure), cross-references auxiliary NO2 sensors, and incorporates short time series of previous readings from the primary sensor. These methods are complemented by global data scaling. Demonstrated using a custom-designed cost-effective monitoring platform and high-precision public reference station data collected over 5 months, every component of our calibration framework proves crucial, contributing to its exceptional accuracy (with a correlation coefficient near 0.95 concerning the reference data and an RMSE below 2.4 µg/m3). This level of performance positions the calibrated sensor as a viable, cost-effective alternative to traditional monitoring approaches.
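
The distribution-alignment step can be sketched with quantile mapping of low-cost readings onto the reference distribution, followed by an RMSE check; the ANN correction surrogate is omitted and the values below are synthetic.

```python
# Sketch of distribution alignment via quantile mapping of low-cost NO2 readings
# onto a reference distribution, with RMSE against the reference. Synthetic data;
# the ANN correction surrogate described above is not shown.
import numpy as np

rng = np.random.default_rng(1)
reference = rng.gamma(shape=4.0, scale=5.0, size=5000)        # reference NO2, µg/m³
low_cost = 0.6 * reference + rng.normal(0, 3.0, size=5000)    # biased, noisy sensor

quantiles = np.linspace(0, 100, 101)
src_q = np.percentile(low_cost, quantiles)
ref_q = np.percentile(reference, quantiles)
calibrated = np.interp(low_cost, src_q, ref_q)                # map onto reference CDF

rmse = np.sqrt(np.mean((calibrated - reference) ** 2))
print(f"RMSE after quantile mapping: {rmse:.2f} µg/m³")
```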

14.
Sensors (Basel) ; 24(8), 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38676138

ABSTRACT

Soft sensors have been extensively utilized to approximate real-time power in wind power generation, which is challenging to measure instantaneously. Short-term wind power forecasting aims to provide a reference for intraday power grid dispatch. This study proposes a soft sensor model based on the Long Short-Term Memory (LSTM) network, combined with data preprocessing by Variational Modal Decomposition (VMD), to improve wind power prediction accuracy. The isolation forest algorithm is adopted for anomaly detection in the original wind power series, and missing data are handled by multiple imputation. Based on the processed data samples, VMD is used to decompose and denoise the power data. The LSTM network is introduced to predict each modal component separately, and the component predictions are then summed to reconstruct the final wind power prediction. The experimental results show that the LSTM network trained with the Adam optimizer has better convergence accuracy. The VMD method exhibited superior decomposition outcomes due to its inherent Wiener filter capabilities, which effectively mitigate noise and prevent modal aliasing. The Mean Absolute Percentage Error (MAPE) was reduced by 9.3508%, indicating that the LSTM network combined with the VMD method has better prediction accuracy.
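
A sketch of the preprocessing stage described above: isolation-forest anomaly detection on the raw power series, with flagged and missing points then imputed; the VMD decomposition and the LSTM forecaster are not shown, and simple interpolation stands in for multiple imputation.

```python
# Preprocessing sketch: isolation-forest anomaly detection on a toy wind-power
# series, then imputation of flagged/missing points. VMD and LSTM are omitted;
# interpolation stands in for multiple imputation.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
power = pd.Series(np.clip(rng.normal(2.0, 0.5, 500), 0, None))   # toy MW series
power.iloc[[50, 300]] = [15.0, -4.0]                              # injected anomalies
power.iloc[120:125] = np.nan                                      # missing block

mask = power.notna()
flags = IsolationForest(contamination=0.01, random_state=0).fit_predict(
    power[mask].to_numpy().reshape(-1, 1)
)
power.loc[power[mask].index[flags == -1]] = np.nan    # treat anomalies as missing

power_clean = power.interpolate(limit_direction="both")
print(power_clean.isna().sum(), "missing values remain")
```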

15.
Genes Dis ; 11(3): 100979, 2024 May.
Article in English | MEDLINE | ID: mdl-38299197

ABSTRACT

Metabolomics, as a research field and a set of techniques, studies the entire complement of small molecules in biological samples. Metabolomics is emerging as a powerful tool for precision medicine. In particular, integration of the microbiome and the metabolome has revealed the mechanisms and functionality of the microbiome in human health and disease. However, metabolomics data are very complicated. Preprocessing/pretreatment and normalization procedures are usually required before statistical analysis of metabolomics data. In this review article, we comprehensively review the various methods used to preprocess and pretreat metabolomics data, including MS-based and NMR-based data preprocessing, dealing with zero and/or missing values and detecting outliers, data normalization, data centering and scaling, and data transformation. We discuss the advantages and limitations of each method. The choice of a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the dataset, and the selected statistical analysis method. We then provide a perspective on their applications in microbiome and metabolome research.
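
Three of the reviewed steps, half-minimum imputation of missing values, log transformation, and Pareto scaling, applied to a toy samples-by-metabolites table; these are example choices among the many alternatives the review discusses.

```python
# Sketch of three common metabolomics preprocessing steps on a toy feature table
# (samples x metabolites): half-minimum imputation, log transform, Pareto scaling.
# Example choices only; the review covers many alternatives.
import numpy as np
import pandas as pd

X = pd.DataFrame({"met_A": [1200.0, np.nan, 980.0, 1500.0],
                  "met_B": [15.0, 22.0, np.nan, 18.0]})

X = X.fillna(X.min() / 2)                 # half-minimum imputation per metabolite
X = np.log2(X)                            # variance-stabilizing log transform
X = (X - X.mean()) / np.sqrt(X.std())     # Pareto scaling: centre, divide by sqrt(SD)

print(X.round(3))
```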

16.
Spectrochim Acta A Mol Biomol Spectrosc ; 310: 123866, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38219612

ABSTRACT

We have developed a novel 3D asynchronous correlation method (3D-ACM) designed for the classification and identification of Chinese handmade paper samples using Raman spectra and machine learning. The 3D-ACM approach involves two rounds of tensor product and Hilbert transform operations. In the tensor product process, the outer product of the spectral data from different samples within the same category is computed, establishing inner connections among all samples within that category. The Hilbert transform introduces a 90-degree phase shift, resulting in a true three-dimensional spectral data structure. This expansion significantly increases the number of equivalent frequency points and samples within each category. This enhancement substantially boosts spectral resolution and reveals more hidden information within the spectral data. To maximize the potential of 3D-ACM, we employed six machine learning models: principal component analysis (PCA) with linear regression (LR), support vector machine (SVM) with LR, k-Nearest Neighbors (KNN), random forest (RF), and convolutional neural network (CNN). When the 3D-ACM data preprocessing method was applied, the R-squared values of the PLS-LR, KNN, RF, and CNN supervised models approached or equaled 1. This indicates exceptional performance comparable to unsupervised models like PCA. 3D-ACM stands as a versatile mathematical technique not confined to spectral data. It also eliminates the necessity for additional experimental setups or external control conditions, distinct from traditional two-dimensional correlation spectroscopy. Moreover, it preserves the original experimental data, setting it apart from conventional data preprocessing methods. This positions 3D-ACM as a promising tool for future material classification and identification in conjunction with machine learning.
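
A schematic of the two operations named above, an outer (tensor) product between spectra of the same class and a Hilbert transform introducing the 90-degree phase shift, using synthetic Raman-like spectra; this is only a sketch of the ingredients, not the published 3D-ACM algorithm.

```python
# Schematic of the two core operations: outer product of two same-class spectra,
# then a Hilbert transform along one axis for the 90-degree phase-shifted part.
# Synthetic spectra; not the published 3D-ACM algorithm.
import numpy as np
from scipy.signal import hilbert

wavenumbers = np.linspace(200, 1800, 512)
spec_a = np.exp(-((wavenumbers - 1090) / 15) ** 2)     # toy spectra of one class
spec_b = np.exp(-((wavenumbers - 1095) / 18) ** 2)

sync = np.outer(spec_a, spec_b)                        # synchronous-like 2D map
phase_shifted = np.imag(hilbert(sync, axis=1))         # 90-degree shifted component

print(sync.shape, phase_shifted.shape)                 # (512, 512) (512, 512)
```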

17.
Water Environ Res ; 96(1): e10960, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38168046

ABSTRACT

As an emerging desalination technology, forward osmosis (FO) can potentially become a reliable method to help remedy the current water crisis. Introducing uncomplicated and precise models could help optimize FO systems. This paper presents the prediction and evaluation of FO systems' membrane flux using various artificial intelligence-based models. Detailed data gathering and cleaning were emphasized because appropriate modeling requires precise inputs. Accumulating data from the original sources, followed by duplicate removal, outlier detection, and feature selection, paved the way to begin modeling. Six models were executed for the prediction task, among which two are tree-based models, two are deep learning models, and two are miscellaneous models. The calculated coefficient of determination (R2) of our best model (XGBoost) was 0.992. In conclusion, tree-based models (XGBoost and CatBoost) show more accurate performance than neural networks. Furthermore, in the sensitivity analysis, feed solution (FS) and draw solution (DS) concentrations showed a strong correlation with membrane flux. PRACTITIONER POINTS: The FO membrane flux was predicted using a variety of machine-learning models. Thorough data preprocessing was executed. The XGBoost model showed the best performance, with an R2 of 0.992. Tree-based models outperformed neural networks and other models.


Subjects
Artificial Intelligence , Water Purification , Water Purification/methods , Membranes, Artificial , Osmosis , Water
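
A hedged XGBoost sketch of the tree-based approach reported above, predicting membrane flux from feed/draw solution properties and scoring with R²; the features and data are synthetic, not the study's dataset.

```python
# Sketch of an XGBoost regressor for membrane flux from feed/draw solution
# properties, scored with R². Features and data are synthetic stand-ins.
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
fs_conc = rng.uniform(0.1, 1.0, 400)          # feed solution concentration, M
ds_conc = rng.uniform(0.5, 4.0, 400)          # draw solution concentration, M
temperature = rng.uniform(20, 40, 400)        # degrees C
flux = 8.0 * np.log(ds_conc / fs_conc) + 0.1 * temperature + rng.normal(0, 0.5, 400)

X = np.column_stack([fs_conc, ds_conc, temperature])
X_train, X_test, y_train, y_test = train_test_split(X, flux, random_state=0)

model = XGBRegressor(n_estimators=400, learning_rate=0.05, random_state=0)
model.fit(X_train, y_train)
print(f"R² on held-out data: {r2_score(y_test, model.predict(X_test)):.3f}")
```
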
18.
Sensors (Basel) ; 24(2), 2024 Jan 06.
Article in English | MEDLINE | ID: mdl-38257441

ABSTRACT

Hand gesture recognition, one of the fields of human-computer interaction (HCI) research, extracts the user's gesture patterns using sensors. Radio detection and ranging (RADAR) sensors are robust in severe environments and convenient to use for hand gestures. Existing studies have mostly adopted continuous-wave (CW) radar, which performs well only at a fixed distance because it cannot measure range. This paper proposes a hand gesture recognition system that utilizes frequency-shift keying (FSK) radar, allowing recognition at various distances between the radar sensor and the user. The proposed system adopts a convolutional neural network (CNN) model for recognition. In the experimental results, the proposed recognition system covers the range from 30 cm to 180 cm and shows an accuracy of 93.67% over the entire range.

19.
Sensors (Basel) ; 24(2), 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38257469

ABSTRACT

Environment perception plays a crucial role in autonomous driving technology. However, various factors such as adverse weather conditions and limitations in sensing equipment contribute to low perception accuracy and a restricted field of view. As a result, intelligent connected vehicles (ICVs) are currently only capable of achieving autonomous driving in specific scenarios. This paper analyzes current studies on image and point cloud processing and cooperative perception, and summarizes three key aspects: data pre-processing methods, multi-sensor data fusion methods, and vehicle-infrastructure cooperative perception methods. The data pre-processing section summarizes the processing of point cloud and image data in snow, rain, and fog. The multi-sensor data fusion section analyzes studies on image fusion, point cloud fusion, and image-point cloud fusion. Because communication channel resources are limited, the vehicle-infrastructure cooperative perception section discusses fusion and sharing strategies for cooperative perception information to expand the perception range of ICVs and achieve an optimal distribution of perception information. Finally, based on the analysis of existing studies, the paper proposes future research directions for cooperative perception in adverse weather conditions.

20.
Brief Bioinform ; 25(1), 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38113078

ABSTRACT

Single-cell chromatin accessibility sequencing (scCAS) technologies have enabled characterizing the epigenomic heterogeneity of individual cells. However, the identification of features of scCAS data that are relevant to underlying biological processes remains a significant gap. Here, we introduce a novel method, Cofea, to fill this gap. Through comprehensive experiments on 5 simulated and 54 real datasets, Cofea demonstrates its superiority in capturing cellular heterogeneity and facilitating downstream analysis. Applying this method to the identification of cell type-specific peaks and candidate enhancers, as well as pathway enrichment analysis and partitioned heritability analysis, we illustrate the potential of Cofea to uncover functional biological processes.


Subjects
Chromatin , Regulatory Sequences, Nucleic Acid , Chromatin/genetics