Results 1 - 20 of 33
1.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36592062

ABSTRACT

Recent studies have revealed that long noncoding RNAs (lncRNAs) are closely linked to several human diseases, providing new opportunities for their use in detection and therapy. Many graph propagation and similarity fusion approaches can be used for predicting potential lncRNA-disease associations. However, existing similarity fusion approaches suffer from noise and self-similarity loss in the fusion process. To address these problems, this paper proposes a new prediction approach, termed SSMF-BLNP, that combines selective similarity matrix fusion (SSMF) and bidirectional linear neighborhood label propagation (BLNP) to predict lncRNA-disease associations. In SSMF, self-similarity networks of lncRNAs and diseases are obtained by selective preprocessing and nonlinear iterative fusion. The fusion process assigns weights to each initial similarity network and introduces a unit matrix that can reduce noise and compensate for the loss of self-similarity. In BLNP, the initial lncRNA-disease associations are employed in both lncRNA and disease directions as label information for linear neighborhood label propagation. The propagation is then performed on the self-similarity network obtained from SSMF to derive the scoring matrix for predicting the relationships between lncRNAs and diseases. Experimental results showed that SSMF-BLNP performed better than seven other state-of-the-art approaches. Furthermore, a case study demonstrated up to 100% and 80% accuracy in 10 lncRNAs associated with hepatocellular carcinoma and 10 lncRNAs associated with renal cell carcinoma, respectively. The source code and datasets used in this paper are available at: https://github.com/RuiBingo/SSMF-BLNP.
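The bidirectional propagation step can be illustrated with a minimal sketch. It is not the SSMF-BLNP implementation (see the authors' repository for that): it assumes a generic label propagation update F <- alpha * W * F + (1 - alpha) * Y with row-normalized similarities standing in for the linear neighborhood weights, and the function names, the value of alpha, and the toy matrices are all hypothetical.

```python
def row_normalize(sim):
    """Scale each row to sum to 1 (assumed stand-in for neighborhood weights)."""
    out = []
    for row in sim:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def propagate(sim, labels, alpha=0.5, iters=50):
    """Iterate F <- alpha * W F + (1 - alpha) * Y for a fixed number of steps."""
    w = row_normalize(sim)
    f = [row[:] for row in labels]
    for _ in range(iters):
        wf = matmul(w, f)
        f = [[alpha * wf[i][j] + (1 - alpha) * labels[i][j]
              for j in range(len(labels[0]))] for i in range(len(labels))]
    return f


def bidirectional_scores(sim_rows, sim_cols, assoc, alpha=0.5):
    """Average the scores propagated from the row side and the column side."""
    from_rows = propagate(sim_rows, assoc, alpha)
    from_cols = transpose(propagate(sim_cols, transpose(assoc), alpha))
    return [[(a + b) / 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(from_rows, from_cols)]


# Toy data (hypothetical): 3 lncRNAs, 2 diseases.
lnc_sim = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.1], [0.1, 0.1, 1.0]]
dis_sim = [[1.0, 0.2], [0.2, 1.0]]
assoc = [[1, 0], [0, 0], [0, 1]]  # known lncRNA-disease associations
scores = bidirectional_scores(lnc_sim, dis_sim, assoc)
```

In this toy case the second lncRNA has no known association, but it is most similar to the first one, so it scores higher for the first disease than for the second.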


Subjects
Long Noncoding RNA, Humans, Algorithms, Computational Biology/methods, Long Noncoding RNA/genetics, Software, Hepatocellular Carcinoma/genetics, Renal Cell Carcinoma/genetics, Liver Neoplasms/genetics, Kidney Neoplasms/genetics
2.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36585781

ABSTRACT

Genetic similarity matrices are commonly used to assess population substructure (PS) in genetic studies. Through simulation studies and by the application to whole-genome sequencing (WGS) data, we evaluate the performance of three genetic similarity matrices: the unweighted and weighted Jaccard similarity matrices and the genetic relationship matrix. We describe different scenarios that can create numerical pitfalls and lead to incorrect conclusions in some instances. We consider scenarios in which PS is assessed based on loci that are located across the genome ('globally') and based on loci from a specific genomic region ('locally'). We also compare scenarios in which PS is evaluated based on loci from different minor allele frequency bins: common (>5%), low-frequency (5-0.5%) and rare (<0.5%) single-nucleotide variations (SNVs). Overall, we observe that all approaches provide the best clustering performance when computed based on rare SNVs. The performance of the similarity matrices is very similar for common and low-frequency variants, but for rare variants, the unweighted Jaccard matrix provides preferable clustering features. Based on visual inspection and in terms of standard clustering metrics, its clusters are the densest and the best separated in the principal component analysis of variants with rare SNVs compared with the other methods and different allele frequency cutoffs. In an application, we assessed the role of rare variants on local and global PS, using WGS data from multiethnic Alzheimer's disease data sets and European or East Asian populations from the 1000 Genomes Project.
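As an illustration of the unweighted Jaccard similarity matrix named above, here is a minimal sketch that assumes each subject is encoded as a row of 0/1 flags marking carrier status at each locus. The function name, the toy genotypes, and the convention that two subjects without any variant get similarity 0 are assumptions, not details taken from the paper.

```python
def jaccard_similarity_matrix(genotypes):
    """Unweighted Jaccard similarity between subjects.

    genotypes: one row per subject, one 0/1 entry per locus
    (1 = subject carries the minor allele at that locus).
    """
    carriers = [{j for j, g in enumerate(row) if g} for row in genotypes]
    n = len(carriers)
    sim = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            union = carriers[a] | carriers[b]
            # Assumed convention: an empty union yields similarity 0.
            value = len(carriers[a] & carriers[b]) / len(union) if union else 0.0
            sim[a][b] = sim[b][a] = value
    return sim


# Toy data (hypothetical): 3 subjects, 4 rare-variant loci.
toy_genotypes = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
jaccard = jaccard_similarity_matrix(toy_genotypes)
```

The first two subjects share one of two carried variants (similarity 0.5), while the third shares none with either. A principal component analysis of such a matrix is what the abstract uses to inspect clustering.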


Subjects
Genome, Genomics, Principal Component Analysis, Gene Frequency, Computer Simulation, Genome-Wide Association Study, Single Nucleotide Polymorphism
3.
BMC Genomics ; 25(1): 875, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39294558

ABSTRACT

BACKGROUND: The widely adopted bulk RNA-seq measures the gene expression average of cells, masking cell type heterogeneity, which confounds downstream analyses. Therefore, identifying the cellular composition and cell type-specific gene expression profiles (GEPs) facilitates the study of the underlying mechanisms of various biological processes. Although single-cell RNA-seq focuses on cell type heterogeneity in gene expression, it requires specialized and expensive resources and currently is not practical for a large number of samples or a routine clinical setting. Recently, computational deconvolution methodologies have been developed, while many of them only estimate cell type composition or cell type-specific GEPs by requiring the other as input. The development of more accurate deconvolution methods to infer cell type abundance and cell type-specific GEPs is still essential. RESULTS: We propose a new deconvolution algorithm, DSSC, which infers cell type-specific gene expression and cell type proportions of heterogeneous samples simultaneously by leveraging gene-gene and sample-sample similarities in bulk expression and single-cell RNA-seq data. Through comparisons with the other existing methods, we demonstrate that DSSC is effective in inferring both cell type proportions and cell type-specific GEPs across simulated pseudo-bulk data (including intra-dataset and inter-dataset simulations) and experimental bulk data (including mixture data and real experimental data). DSSC shows robustness to the change of marker gene number and sample size and also has cost and time efficiencies. CONCLUSIONS: DSSC provides a practical and promising alternative to the experimental techniques to characterize cellular composition and heterogeneity in the gene expression of heterogeneous samples.


Subjects
Algorithms, Gene Expression Profiling, RNA-Seq, Single-Cell Analysis, Single-Cell Analysis/methods, Humans, RNA-Seq/methods, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Computational Biology/methods, Transcriptome, Single-Cell Gene Expression Analysis
4.
BMC Med Res Methodol ; 24(1): 212, 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39300394

ABSTRACT

BACKGROUND: In longitudinal health services research, hospitals are usually identified by an ID code, often supplemented with several additional variables, but the representativeness of this approach and the influence of each variable remain unclear. This study presents an operational method for hospital identity delimitation and a novel longitudinal identification approach, demonstrated using a case study. METHODS: The conceptualisation considers hospitals as evolving entities, identifying "similar enough" pairs across two time points using an automated similarity matrix. This method comprises key variable selection, similarity scoring, and tolerance threshold definition, tailored to data source characteristics and clinical relevance. The linking method is tested by identifying German hospitals subject to minimum caseload requirements, using German Hospital Quality Reports (GHQR) 2016-2020. RESULTS: The method achieved a success rate (min: 97.9% - max: 100%, mean: 99.9%) surpassing traditional hospital ID-code linkage (min: 91.5% - max: 98.8%, mean: 96.6%), with a 99% reduction in manual work through automation. CONCLUSIONS: This method, rooted in a comprehensive understanding of hospital identities, offers an operational, automated, and customisable process serving diverse clinical topics. This approach has the advantage of simultaneously considering multiple variables and systematically observing temporal changes in hospitals. It also enhances the precision and efficiency of longitudinal hospital identification in health services research.


Subjects
Hospitals, Humans, Germany, Hospitals/statistics & numerical data, Hospitals/standards, Longitudinal Studies, Health Services Research/statistics & numerical data, Quality of Health Care/statistics & numerical data, Quality of Health Care/standards
5.
Sensors (Basel) ; 23(2), 2023 Jan 09.
Article in English | MEDLINE | ID: mdl-36679566

ABSTRACT

To extract the phase information from multiple receivers, the conventional sound source localization system involves substantial complexity in software and hardware. Along with the algorithm complexity, the dedicated communication channel and individual analog-to-digital conversions prevent an increase in the system's capability due to feasibility. The previous study suggested and verified the single-channel sound source localization system, which aggregates the receivers on the single analog network for the single digital converter. This paper proposes the improved algorithm for the single-channel sound source localization system based on the Gaussian process regression with the novel feature extraction method. The proposed system consists of three computational stages: homomorphic deconvolution, feature extraction, and Gaussian process regression in cascade. The individual stages represent time delay extraction, data arrangement, and machine prediction, respectively. The optimal receiver configuration for the three-receiver structure is derived from the novel similarity matrix analysis based on the time delay pattern diversity. The simulations and experiments present precise predictions with proper model order and ensemble average length. The nonparametric method, with the rational quadratic kernel, shows consistent performance on trained angles. The Steiglitz-McBride model with the exponential kernel delivers the best predictions for trained and untrained angles with low bias and low variance in statistics.


Subjects
Software, Sound Localization, Algorithms
6.
BMC Bioinformatics ; 23(1): 553, 2022 Dec 19.
Article in English | MEDLINE | ID: mdl-36536289

ABSTRACT

BACKGROUND: As a highly aggressive disease, cancer is becoming the leading cause of death around the world. Accurate prediction of survival expectancy for cancer patients is important because it can help clinicians choose appropriate therapeutic schemes. As high-throughput sequencing technology becomes more cost-effective, integrating multi-type genome-wide data has become a promising approach to cancer survival prediction. Based on these genomic data, some data-integration methods for cancer survival prediction have been proposed. However, existing methods fail to simultaneously utilize feature information and structure information of multi-type genome-wide data. RESULTS: We propose a Multi-type Data Joint Learning (MDJL) approach based on multi-type genome-wide data, which comprehensively exploits feature information and structure information. Specifically, MDJL exploits correlation representations between any two data types by cross-correlation calculation for learning discriminant features. Moreover, based on the learned multiple correlation representations, MDJL constructs sample similarity matrices for capturing global and local structures across different data types. With the learned discriminant representation matrix and fused similarity matrix, MDJL constructs a graph convolutional network with Cox loss for survival prediction. CONCLUSIONS: Experimental results demonstrate that our approach substantially outperforms established integrative methods and is effective for cancer survival prediction.
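The "Cox loss" mentioned above is commonly taken to be the negative Cox partial log-likelihood. The sketch below shows that standard quantity only. The exact formulation used in MDJL is not given in the abstract, so the averaging over events, the naive handling of tied event times, and all names here are assumptions.

```python
import math


def cox_partial_likelihood_loss(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores: model outputs (higher = higher predicted hazard)
    times:       follow-up time per sample
    events:      1 if the event was observed, 0 if censored
    """
    loss = 0.0
    n_events = 0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored samples only contribute through risk sets
        # Risk set: every sample still under observation at time t_i.
        log_risk = math.log(sum(
            math.exp(risk_scores[j]) for j, t_j in enumerate(times) if t_j >= t_i
        ))
        loss -= risk_scores[i] - log_risk
        n_events += 1
    return loss / n_events if n_events else 0.0


# Toy data (hypothetical): three patients, the second one censored.
baseline_loss = cox_partial_likelihood_loss([0.0, 0.0, 0.0], [1, 2, 3], [1, 0, 1])
# Scores that rank early events as high risk should give a lower loss.
good_loss = cox_partial_likelihood_loss([2.0, 0.0, -2.0], [1, 2, 3], [1, 1, 1])
bad_loss = cox_partial_likelihood_loss([-2.0, 0.0, 2.0], [1, 2, 3], [1, 1, 1])
```

In a network such as the one described, a loss of this kind would be minimized with respect to the risk scores produced by the final layer.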


Subjects
Neoplasms, Humans, Neoplasms/genetics, Genomics/methods, Genome, High-Throughput Nucleotide Sequencing
7.
Genet Epidemiol ; 45(1): 82-98, 2021 Feb.
Article in English | MEDLINE | ID: mdl-32929743

ABSTRACT

locStra is an R package for the analysis of regional and global population stratification in whole-genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region of the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding window approach, the regional similarity matrices are compared with the global ones, based on user-defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for the specification of the window size is provided. As the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on single cores, for realistic study sizes (several thousand subjects, several million rare variants per subject), the runtime for the genome-wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.


Subjects
Genome-Wide Association Study, Genome, Algorithms, Genomics, Humans, Single Nucleotide Polymorphism, Whole Genome Sequencing
8.
BMC Bioinformatics ; 22(1): 307, 2021 Jun 08.
Article in English | MEDLINE | ID: mdl-34103016

ABSTRACT

BACKGROUND: Circular RNAs (circRNAs) are a class of single-stranded RNA molecules with a closed-loop structure. A growing body of research has shown that circRNAs are closely related to the development of diseases. Because biological experiments to verify circRNA-disease associations are time-consuming and wasteful of resources, it is necessary to propose a reliable computational method to predict the potential candidate circRNA-disease associations for biological experiments to make them more efficient. RESULTS: In this paper, we propose a double matrix completion method (DMCCDA) for predicting potential circRNA-disease associations. First, we constructed a similarity matrix of circRNA and disease according to circRNA sequence information and semantic disease information. We also built a Gauss interaction profile similarity matrix for circRNA and disease based on experimentally verified circRNA-disease associations. Then, the corresponding circRNA sequence similarity and semantic similarity of disease are used to update the association matrix from the perspective of circRNA and disease, respectively, by matrix multiplication. Finally, from the perspective of circRNA and disease, matrix completion is used to update the matrix block, which is formed by splicing the association matrix obtained in the previous step with the corresponding Gaussian similarity matrix. Compared with other approaches, the model of DMCCDA has a relatively good result in leave-one-out cross-validation and five-fold cross-validation. Additionally, the results of the case studies illustrate the effectiveness of the DMCCDA model. CONCLUSION: The results show that our method works well for recommending the potential circRNAs for a disease for biological experiments.
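The Gaussian interaction profile similarity mentioned above can be sketched as follows. This uses the widely used kernel K(i, j) = exp(-gamma * ||A_i - A_j||^2), with gamma set to a bandwidth parameter divided by the mean squared norm of the interaction profiles. Whether DMCCDA uses exactly these settings is an assumption, and the function name, the default bandwidth of 1, and the toy matrix are hypothetical.

```python
import math


def gip_similarity(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `assoc`.

    assoc: binary association matrix; row i is the interaction profile of
    entity i (for example, one circRNA against all diseases).
    """
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm  # normalized bandwidth
    return [
        [math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(assoc[i], assoc[j])))
         for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): 3 circRNAs, 3 diseases.
toy_assoc = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
circ_gip = gip_similarity(toy_assoc)
# Transposing the matrix gives the disease-side kernel.
disease_gip = gip_similarity([list(col) for col in zip(*toy_assoc)])
```

Here the mean squared profile norm is 4/3, so gamma is 0.75, and the first two circRNAs, whose profiles differ in one entry, get similarity exp(-0.75).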


Subjects
Circular RNA, RNA, Normal Distribution, RNA/genetics
9.
Entropy (Basel) ; 23(5), 2021 Apr 29.
Article in English | MEDLINE | ID: mdl-33947081

ABSTRACT

Clustering algorithms for multi-database mining (MDM) rely on computing $(n^2-n)/2$ pairwise similarities between $n$ databases to generate and evaluate $m \in [1, (n^2-n)/2]$ candidate clusterings in order to select the ideal partitioning that optimizes a predefined goodness measure. However, when these pairwise similarities are distributed around the mean value, the clustering algorithm becomes indecisive when choosing which database pairs are eligible to be grouped together. Consequently, a trivial result is produced by putting all $n$ databases in one cluster or by returning $n$ singleton clusters. To tackle this problem, we propose a learning algorithm that reduces the fuzziness of the similarity matrix by minimizing a weighted binary entropy loss function via gradient descent and back-propagation. As a result, the learned model improves the certainty of the clustering algorithm by correctly identifying the optimal database clusters. Additionally, in contrast to gradient-based clustering algorithms, which are sensitive to the choice of the learning rate and require more iterations to converge, we propose a learning-rate-free algorithm to assess the candidate clusterings generated on the fly in fewer upper-bounded iterations. To achieve our goal, we use coordinate descent (CD) and back-propagation to search for the optimal clustering of the $n$ databases in a way that minimizes a convex clustering quality measure $L(\theta)$ in fewer than $(n^2-n)/2$ iterations. By using a max-heap data structure within our CD algorithm, we optimally choose the largest weight variable $\theta_{p,q}^{(i)}$ at each iteration $i$ such that taking the partial derivative of $L(\theta)$ with respect to $\theta_{p,q}^{(i)}$ allows us to attain the next steepest descent minimizing $L(\theta)$ without using a learning rate. Through a series of experiments on multiple database samples, we show that our algorithm outperforms the existing clustering algorithms for MDM.
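Two quantities from this abstract are easy to make concrete: the number of database pairs, (n^2 - n)/2, and the binary entropy that measures how "fuzzy" a similarity value is. The sketch below uses the plain, unweighted binary entropy averaged over all pairs. The paper's weighting scheme and its learning procedure are not reproduced, and every name and toy matrix is hypothetical.

```python
import math


def pair_count(n):
    """Number of unordered pairs among n databases: (n^2 - n) / 2."""
    return (n * n - n) // 2


def binary_entropy(p):
    """Entropy in bits of a similarity p in [0, 1]; it peaks at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def mean_fuzziness(sim):
    """Average binary entropy over the upper triangle of a similarity matrix."""
    n = len(sim)
    values = [binary_entropy(sim[i][j]) for i in range(n) for j in range(i + 1, n)]
    return sum(values) / len(values)


# Toy matrices (hypothetical): similarities near 0.5 are indecisive,
# similarities near 0 or 1 give a crisp grouping.
fuzzy = [[1.0, 0.5, 0.45], [0.5, 1.0, 0.55], [0.45, 0.55, 1.0]]
crisp = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fuzzy_score = mean_fuzziness(fuzzy)
crisp_score = mean_fuzziness(crisp)
```

A matrix whose entries sit near 0.5 scores close to 1 bit per pair, which is the indecisive situation the abstract describes, while a crisp matrix scores 0.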

10.
J Comput Chem ; 35(18): 1395-409, 2014 Jul 05.
Article in English | MEDLINE | ID: mdl-24889018

ABSTRACT

The present report introduces the QuBiLS-MIDAS software belonging to the ToMoCoMD-CARDD suite for the calculation of three-dimensional molecular descriptors (MDs) based on the two-linear (bilinear), three-linear, and four-linear (multilinear or N-linear) algebraic forms. Thus, it is a unique piece of software that computes these tensor-based indices. These descriptors establish relations for two, three, and four atoms by using several (dis-)similarity metrics or multimetrics, matrix transformations, cutoffs, local calculations and aggregation operators. The theoretical background of these N-linear indices is also presented. The QuBiLS-MIDAS software was developed in the Java programming language and employs the Chemistry Development Kit library for the manipulation of the chemical structures and the calculation of the atomic properties. The software is composed of a user-friendly desktop interface and an Abstract Programming Interface library. The former was created to simplify the configuration of the different options of the MDs, whereas the library was designed to allow easy integration into other software for chemoinformatics applications. The program provides functionalities for data cleaning tasks and for batch processing of the molecular indices. In addition, it offers parallel calculation of the MDs through the use of all available processors in current computers. Complexity studies of the main algorithms demonstrate that they were implemented efficiently compared with their trivial implementation. Lastly, the performance tests reveal that the software behaves suitably when the number of processors is increased. Therefore, the QuBiLS-MIDAS software constitutes a useful application for the computation of the molecular indices based on N-linear algebraic maps, and it can be used freely to perform chemoinformatics studies.


Subjects
Algorithms, Computational Biology/methods, Software
11.
PeerJ ; 11: e15899, 2023.
Article in English | MEDLINE | ID: mdl-37719113

ABSTRACT

Numerous studies have focused on the classification of N6-methyladenosine (m6A) modification sites in RNA sequences, treating it as a multi-feature extraction task. In these studies, the incorporation of physicochemical properties of nucleotides has been applied to enhance recognition efficacy. However, the introduction of excessive supplementary information may introduce noise to the RNA sequence features, and the utilization of sequence similarity information remains underexplored. In this research, we present a novel method for RNA m6A modification site recognition called M6ATMR. Our approach relies solely on sequence information, leveraging Transformer to guide the reconstruction of the sequence similarity matrix, thereby enhancing feature representation. Initially, M6ATMR encodes RNA sequences using 3-mers to generate the sequence similarity matrix. Meanwhile, Transformer is applied to extract sequence structure graphs for each RNA sequence. Subsequently, to capture low-dimensional representations of similarity matrices and structure graphs, we introduce a graph self-correlation convolution block. These representations are then fused and reconstructed through the local-global fusion block. Notably, we adopt iteratively updated sequence structure graphs to continuously optimize the similarity matrix, thereby constraining the end-to-end feature extraction process. Finally, we employ the random forest (RF) algorithm for identifying m6A modification sites based on the reconstructed features. Experimental results demonstrate that M6ATMR achieves promising performance by solely utilizing RNA sequences for m6A modification site identification. Our proposed method can be considered an effective complement to existing RNA m6A modification site recognition approaches.


Subjects
Adenosine, Nucleotides, Base Sequence, RNA/genetics
12.
Comput Biol Med ; 163: 107179, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37354820

ABSTRACT

In an imbalanced dataset, a machine learning classifier using traditional imbalance handling methods may achieve good accuracy, but in highly imbalanced datasets, it may over-predict the majority class and ignore the minority class. In the medical domain, failing to correctly estimate the minority class might lead to a false negative, which is concerning in cases of life-threatening illnesses and infectious diseases like Covid-19. Currently, classification in deep learning uses a single-layered architecture in which one neural network is employed. This paper proposes a multilayer design entitled LayNet to address this issue. LayNet aims to lessen the class imbalance by dividing the classes among layers and achieving a balanced class distribution at each layer. To ensure that all the classes are classified, minor classes are combined to form a single new 'hybrid' class at higher layers. The final layer has no hybrid class and only singleton (distinct) classes. Each layer of the architecture includes a separate model that determines whether an input belongs to a singleton class or to the hybrid class. If it falls into the hybrid class, it advances to the following layer, where it is further categorized within the hybrid class. The method for dividing the classes among the architectural levels is also introduced in this paper. The Ocular Disease Intelligent Recognition Dataset, Covid-19 Radiography Dataset, and Retinal OCT Dataset are used to evaluate this methodology. When the traditional single-layer architecture and the proposed multilayered architecture are compared, LayNet performs better on these datasets.


Subjects
COVID-19, Humans, COVID-19/diagnostic imaging, Computer Neural Networks, Machine Learning, Radiography
13.
Front Pharmacol ; 14: 1132012, 2023.
Article in English | MEDLINE | ID: mdl-36817132

ABSTRACT

Increasing evidence suggests that miRNAs play a key role in the occurrence and progression of many complex human diseases. Therefore, targeting dysregulated miRNAs with small-molecule drugs in the clinic has become a new treatment approach. Nevertheless, identifying drug-targeted miRNAs through biological experiments is costly and time-consuming. Thus, more reliable computational methods for identifying associations between drugs and miRNAs urgently need to be developed. In this study, we propose an efficient method, called GNMFDMA, to predict potential drug-miRNA associations by combining graph Laplacian regularization with non-negative matrix factorization. We first calculated the overall similarity matrices of drugs and miRNAs from the different types of biological information collected. Subsequently, the drug-miRNA association adjacency matrix was reformulated based on the K nearest neighbor profiles so as to correct false negative associations. Finally, graph Laplacian regularized collaborative non-negative matrix factorization was used to calculate the association scores of drugs with miRNAs. In cross-validation, GNMFDMA obtained an AUC of 0.9193, outperforming the existing methods. In addition, in case studies on three common drugs (i.e., 5-Aza-CdR, 5-FU and Gemcitabine), 30, 31 and 34 of the top-50 associations inferred by GNMFDMA were verified, respectively. These results reveal that GNMFDMA is a reliable and efficient computational approach for identifying potential drug-miRNA associations.

14.
Front Neurosci ; 17: 1154252, 2023.
Article in English | MEDLINE | ID: mdl-37284658

ABSTRACT

Although there is a plethora of modeling literature dedicated to the object recognition processes of the ventral ("what") pathway of primate visual systems, modeling studies on motion-sensitive regions like the medial superior temporal area (MST) of the dorsal ("where") pathway are relatively scarce. Neurons in the MST area of the macaque monkey respond selectively to different types of optic flow sequences such as radial and rotational flows. We present three models that are designed to simulate the computation of optic flow performed by the MST neurons. Model-1 and model-2 are each composed of three stages: the Direction Selective Mosaic Network (DSMN), the Cell Plane Network (CPNW) or the Hebbian Network (HBNW), and the Optic flow network (OF). The three stages roughly correspond to the V1-MT-MST areas, respectively, in the primate motion pathway. Both models are trained stage by stage using a biologically plausible variation of the Hebbian rule. The simulation results show that neurons in model-1 and model-2 (which are trained on translational, radial, and rotational sequences) develop responses that could account for MSTd cell properties found neurobiologically. Model-3, on the other hand, consists of the Velocity Selective Mosaic Network (VSMN) followed by a convolutional neural network (CNN), which is trained on radial and rotational sequences using a supervised backpropagation algorithm. The quantitative comparison of response similarity matrices (RSMs), computed from the convolution layer and last hidden layer responses, shows that model-3 neuron responses are consistent with the idea of functional hierarchy in the macaque motion pathway. These results also suggest that deep learning models could offer a computationally elegant and biologically plausible solution for simulating the development of cortical responses of the primate motion pathway.

15.
J Anim Sci ; 100(9), 2022 Sep 01.
Article in English | MEDLINE | ID: mdl-35775583

ABSTRACT

The microbial composition resemblance among individuals in a group can be summarized in a square covariance matrix and fitted in linear models. We investigated eight approaches to create the matrix that quantified the resemblance between animals based on the gut microbiota composition. We aimed to compare the performance of different methods in estimating trait microbiability and predicting growth and body composition traits in three pig breeds. This study included 651 purebred boars from one of three breeds: Duroc (n = 205), Landrace (n = 226), and Large White (n = 220). Growth and body composition traits, including body weight (BW), ultrasound backfat thickness (BF), ultrasound loin depth (LD), and ultrasound intramuscular fat (IMF) content, were measured on live animals at the market weight (156 ± 2.5 d of age). Rectal swabs were taken from each animal at 158 ± 4 d of age and subjected to 16S rRNA gene sequencing. Eight methods were used to create the microbial similarity matrices, including 4 kernel functions (Linear Kernel, LK; Polynomial Kernel, PK; Gaussian Kernel, GK; Arc-cosine Kernel with one hidden layer, AK1), 2 dissimilarity methods (Bray-Curtis, BC; Jaccard, JA), and 2 ordination methods (Metric Multidimensional Scaling, MDS; Detrended Correspondence analysis, DCA). Based on the matrix used, microbiability estimates ranged from 0.07 to 0.21 and 0.12 to 0.53 for Duroc, 0.03 to 0.21 and 0.05 to 0.44 for Landrace, and 0.02 to 0.24 and 0.05 to 0.52 for Large White pigs averaged over traits in the model with sire, pen, and microbiome, and the model with only the microbiome, respectively. The GK, JA, BC, and AK1 obtained greater microbiability estimates than the remaining methods across traits and breeds. Predictions were made within each breed group using four-fold cross-validation based on the relatedness of sires in each breed group.
The prediction accuracy ranged from 0.03 to 0.18 for BW, 0.08 to 0.31 for BF, 0.21 to 0.48 for LD, and 0.04 to 0.16 for IMF when averaged across breeds. The BC, MDS, LK, and JA achieved better accuracy than other methods in most predictions. Overall, the PK and DCA exhibited the worst performance compared to other microbiability estimation and prediction methods. The current study shows how alternative approaches summarized the resemblance of gut microbiota composition among animals and contributed this information to variance component estimation and phenotypic prediction in swine.
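One of the eight matrices, the Bray-Curtis one, can be sketched from abundance counts alone. The example computes the Bray-Curtis dissimilarity, sum(|x - y|) / sum(x + y), and converts it to a similarity as 1 minus the dissimilarity. That conversion, the function name, and the toy counts are assumptions for illustration, and the study's own preprocessing is not reproduced.

```python
def bray_curtis_similarity(counts):
    """Pairwise similarity (1 - Bray-Curtis dissimilarity) between animals.

    counts: one row per animal, one non-negative abundance per taxon.
    """
    n = len(counts)
    sim = [[1.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            diff = sum(abs(x - y) for x, y in zip(counts[a], counts[b]))
            total = sum(counts[a]) + sum(counts[b])
            dissimilarity = diff / total if total else 0.0
            sim[a][b] = sim[b][a] = 1.0 - dissimilarity
    return sim


# Toy data (hypothetical): 3 animals, 3 taxa.
toy_counts = [
    [10, 0, 5],
    [5, 5, 5],
    [0, 20, 0],
]
microbial_sim = bray_curtis_similarity(toy_counts)
```

The first and third animals share no taxa, so their similarity is 0. A matrix of this kind would then play the role of the covariance structure in the linear models the abstract describes.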


Gut microbiota has received significant research attention in farm animals because of its close relationship with host performance. We chose eight approaches to create a square covariance matrix that characterizes the relationship among animals based on their gut microbiota composition. Then, we fitted this information with linear models to evaluate the proportion of phenotypic variance explained by gut microbiota composition and predict host growth and body composition traits in three pig breeds. We found that different matrices had varying performance in predicting host phenotypes, but the results highly depended on the trait and breed considered in the prediction. Our findings highlight possible alternative approaches to incorporate gut microbiome data in regression models and emphasize the value of gut microbiome data in better understanding complex traits in pigs with diverse genetic backgrounds.


Subjects
Gastrointestinal Microbiome, Animals, Body Composition/genetics, Male, Phenotype, 16S Ribosomal RNA/genetics, Swine
16.
Biosensors (Basel) ; 12(12), 2022 Dec 19.
Article in English | MEDLINE | ID: mdl-36551149

ABSTRACT

Biosignal-based technology has become increasingly available in daily life and is a critical information source. Wearable biosensors have been widely applied in, among others, biometrics, sports, health care, rehabilitation assistance, and edutainment. Continuous data collection from biodevices provides a valuable volume of information, which needs to be curated and prepared before serving machine learning applications. One of the universal preparation steps is data segmentation and labelling/annotation. This work proposes a practical and manageable way to automatically segment and label single-channel or multimodal biosignal data using a self-similarity matrix (SSM) computed from a feature-based representation of the signals. Applied to public biosignal datasets and a benchmark for change point detection, the proposed approach delivered clear visual support for interpreting the biosignals with the SSM, performed accurate automatic segmentation of biosignals with the help of the novelty function, and associated segments based on their similarity measures using similarity profiles. The proposed method outperformed other algorithms in most cases across a series of automatic biosignal segmentation tasks; equally appealing, it provides an intuitive visualization for information retrieval of multimodal biosignals.
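The link between a self-similarity matrix and a novelty function can be sketched as follows. This assumes cosine similarity between per-window feature vectors and a plain checkerboard kernel slid along the main diagonal, which is a common way to compute novelty. The paper's own features and kernel are not specified in the abstract, so all names, the kernel size, and the toy features are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def self_similarity_matrix(features):
    """SSM: similarity of every feature window with every other window."""
    return [[cosine(a, b) for b in features] for a in features]


def novelty(ssm, half_width=2):
    """Correlate a checkerboard kernel along the diagonal of the SSM.

    Within-segment blocks (past-past, future-future) count positively and
    cross-segment blocks negatively, so peaks mark segment boundaries.
    """
    n = len(ssm)
    scores = [0.0] * n
    for t in range(half_width, n - half_width + 1):
        total = 0.0
        for i in range(-half_width, half_width):
            for j in range(-half_width, half_width):
                sign = 1.0 if (i < 0) == (j < 0) else -1.0
                total += sign * ssm[t + i][t + j]
        scores[t] = total
    return scores


# Toy features (hypothetical): four windows of one regime, then four of another.
toy_features = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
ssm = self_similarity_matrix(toy_features)
novelty_scores = novelty(ssm)
boundary = max(range(len(novelty_scores)), key=novelty_scores.__getitem__)
```

With the toy input the novelty peaks at window 4, the first window of the second regime. On real biosignals the peaks would normally be picked with a threshold or a peak detector.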


Subjects
Algorithms, Medicine, Machine Learning, Information Storage and Retrieval
17.
Proc Inst Mech Eng H ; 236(10): 1492-1501, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35978493

ABSTRACT

Interstitial lung disease (ILD), a collection of disorders, is considered among the deadliest and raises human mortality. In this paper, an automated scheme for detection and classification of ILD patterns is presented, which addresses the low inter-class feature variation and high intra-class feature variation in patterns caused by translation and illumination effects. A novel and efficient feature extraction method named Template-Matching Combined Sparse Coding (TMCSC) is proposed, which extracts features invariant to translation and illumination effects from defined regions of interest (ROIs) within the lung parenchyma. Each translated image patch is compared with all possible templates of the image using a template matching process. The corresponding sparse matrix for the set of translated image patches and their nearest templates is obtained by minimizing the objective function of the similarity matrix of the translated image patch and the template. A novel Blended-Multi Class Support Vector Machine (B-MCSVM) is designed to tackle the high intra-class feature variation problem and provides improved classification accuracy. ROIs of five lung tissue patterns (healthy, emphysema, ground glass, micronodule, and fibrosis), selected from an internal multimedia database that contains high-resolution computed tomography (HRCT) image series, are identified and utilized in this work. The proposed scheme outperforms most state-of-the-art multi-class classification algorithms.


Subjects
Interstitial Lung Diseases , Support Vector Machine , Algorithms , Humans , Lung/diagnostic imaging , Interstitial Lung Diseases/diagnostic imaging , X-Ray Computed Tomography
18.
Environ Sci Pollut Res Int ; 28(30): 40746-40755, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32632685

ABSTRACT

Air pollution can have severe effects on human health. Because it contributes to serious respiratory and other lung diseases, studying air pollution is important. One way to address this issue is to apply clustering techniques. Clustering algorithms face three main problems. First, the exact shape of the clusters and the number of clusters that the input data can produce are unknown. Second, it is not clear how to choose an appropriate algorithm for a particular problem. Finally, multiple runs of the same algorithm lead to alternative solutions because of factors such as random initialization of cluster heads. Ensemble algorithms can handle these problems and reduce the bias and variance of the traditional clustering process. Ensemble approaches have not been adequately studied, particularly for clustering. In this paper, we use an enhanced ensemble clustering method to cluster pollution data levels. This study helps to identify the preventive measures needed to control further contamination and reduce alarming levels, and to analyze the results to find healthy and unhealthy regions in a given area. The ensemble technique also explains uncertain objects found in clustering. A distinct advantage of this algorithm is that it requires no prior information about the data. The experiment shows that the implemented ensemble consensus clustering performs better than basic clustering algorithms.


Subjects
Air Pollution , Algorithms , Cluster Analysis , Environmental Pollution , Humans
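
The abstract above does not describe its "enhanced" ensemble method, so the sketch below shows only a generic consensus-clustering baseline, under stated assumptions: several base clusterings are summarized in a co-association matrix (the fraction of runs in which two objects share a cluster), and objects are merged whenever that fraction exceeds a threshold. The names `co_association`, `consensus_clusters`, and `threshold` are hypothetical.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of objects is co-clustered.

    labelings: list of label lists, one per clustering run, all the same length.
    Label values need not agree across runs; only co-membership matters.
    """
    n = len(labelings[0])
    runs = len(labelings)
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / runs for j in range(n)]
        for i in range(n)
    ]


def consensus_clusters(coassoc, threshold=0.5):
    """Connected components of the graph linking pairs with coassoc > threshold."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Because only co-membership is counted, the base runs may use different label names, different algorithms, or different random initializations. Pairs whose co-association stays near the threshold are one simple way to flag objects the ensemble is uncertain about.
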
19.
Insects ; 12(3)2021 Mar 10.
Article in English | MEDLINE | ID: mdl-33801793

ABSTRACT

Freshwater biodiversity is facing a severe crisis due to many human impacts, yet the diversity dynamics of freshwater communities and the possibilities for assessing them remain largely unexplored. We aimed to emphasize different aspects of portraying the diversity of a species-rich aquatic insect group (caddisflies; Trichoptera) across four different habitats in an anthropogenically unimpacted, connected karst barrage lake/riverine system. To define diversity, we used common indices with a pre-set sensitivity to species abundance/dominance, i.e., a fixed sensitivity parameter (species richness, Shannon, Simpson, Berger-Parker), and diversity profiles based on continuous gradients of this sensitivity parameter: the naïve and non-naïve diversity profiles developed by Leinster and Cobbold. The non-naïve diversity profiles account for the similarity among species in terms of ecological traits and preferences, whereas the naïve diversity profile is called mathematically "naïve" because it assumes absolute dissimilarity between species, which is almost never true. The commonly used indices and the naïve diversity profile both ranked the springs as least diverse and tufa barriers as most diverse. The non-naïve diversity profiles based on similarity matrices (using feeding behavior and stream zonation preferences of species) showed even greater differences between these habitats, while ranking stream habitats close together, regardless of their longitudinal position. We constructed the Climate Score index (CSI) in order to assess how diversity and species' vulnerability project the community's resistance and/or resilience to climate change. The CSI ranked the springs as most vulnerable, followed by all habitats longitudinally placed below them. We highlight the importance of integrating ecological information into biodiversity and vulnerability assessment of freshwater communities.
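
For readers unfamiliar with the Leinster and Cobbold profiles mentioned above, the following minimal sketch computes the similarity-sensitive diversity of order q, (sum_i p_i (Zp)_i^(q-1))^(1/(1-q)), with the usual limits at q = 1 and q = infinity. With Z set to the identity matrix it reduces to the naïve profile, which yields species richness at q = 0, the exponential of Shannon entropy at q = 1, the inverse Simpson index at q = 2, and the reciprocal Berger-Parker index at q = infinity. The function names and the toy similarity values are illustrative assumptions, not data from the study.

```python
import math


def identity(n):
    """Naive similarity matrix: every species is totally dissimilar from the others."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def diversity(p, Z, q):
    """Similarity-sensitive diversity of order q (Leinster-Cobbold form).

    p: relative abundances summing to 1.
    Z: species similarity matrix with entries in [0, 1] and ones on the diagonal.
    q: sensitivity parameter, q >= 0; use math.inf for the dominance limit.
    """
    n = len(p)
    # (Zp)_i is the "ordinariness" of species i: abundance of species similar to it.
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    support = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in support))
    if q == math.inf:
        return 1.0 / max(zp[i] for i in support)
    return sum(p[i] * zp[i] ** (q - 1) for i in support) ** (1.0 / (1.0 - q))
```

Evaluating `diversity` over a grid of q values for each habitat gives a diversity profile; replacing the identity matrix with a trait-based similarity matrix gives the non-naïve profile.
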

20.
Inform Med Unlocked ; 24: 100621, 2021.
Article in English | MEDLINE | ID: mdl-34075341

ABSTRACT

The novel coronavirus, with its highly transmissible characteristics, is spreading rapidly, endangering millions of human lives and the global economy. To break the chain of alteration and destructive expansion, early and effective diagnosis of infected patients is immensely important. Unfortunately, many countries lack testing equipment relative to the number of infected patients. It would be desirable to have a swift diagnosis that identifies COVID-19 from disease genes or from CT or X-ray images. COVID-19 causes flu-like illness, cough, pneumonia, and lung infection in patients, in whom massive alveolar damage and progressive respiratory failure can lead to death. This paper proposes two detection methods. The first is a gene-based screening method to detect coronavirus diseases (Middle East respiratory syndrome-related coronavirus, Severe acute respiratory syndrome coronavirus 2, and Human coronavirus HKU1) and differentiate them from pneumonia. This novel approach to healthcare utilizes disease genes to build functional semantic similarity among genes. Different machine learning algorithms (eXtreme Gradient Boosting, Naïve Bayes, Regularized Random Forest, Random Forest Rule-Based Model, Random Ferns, C5.0, and Multi-Layer Perceptron) are trained and tested on the semantic similarities to classify coronavirus and pneumonia diseases. The best-performing models are then ensembled, yielding an accuracy of nearly 93%. The second technique is an automated COVID-19 diagnostic method that uses chest X-ray images to classify Normal versus COVID-19 and Pneumonia versus COVID-19 images using a deep-CNN technique, achieving 99.87% and 99.48% test accuracy, respectively. Thus, this research can assist in providing better treatment against COVID-19.
