1.
PLoS One ; 19(4): e0301195, 2024.
Article in English | MEDLINE | ID: mdl-38574109

ABSTRACT

Understanding the evolution of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV-2) and its relationship to other coronaviruses in the wild is crucial for preventing future virus outbreaks. While the origin of the SARS-CoV-2 pandemic remains uncertain, mounting evidence suggests the direct involvement of the bat and pangolin coronaviruses in the evolution of the SARS-CoV-2 genome. To unravel the early days of a probable zoonotic spillover event, we analyzed genomic data from various coronavirus strains from both human and wild hosts. Bayesian phylogenetic analysis was performed using multiple datasets, using strict and relaxed clock evolutionary models to estimate the occurrence times of key speciation, gene transfer, and recombination events affecting the evolution of SARS-CoV-2 and its closest relatives. We found strong evidence supporting the presence of temporal structure in datasets containing SARS-CoV-2 variants, enabling us to estimate the time of SARS-CoV-2 zoonotic spillover between August and early October 2019. In contrast, datasets without SARS-CoV-2 variants provided mixed results in terms of temporal structure. However, they allowed us to establish that the presence of a statistically robust clade in the phylogenies of gene S and its receptor-binding (RBD) domain, including two bat (BANAL) and two Guangdong pangolin coronaviruses (CoVs), is due to the horizontal gene transfer of this gene from the bat CoV to the pangolin CoV that occurred in the middle of 2018. Importantly, this clade is closely located to SARS-CoV-2 in both phylogenies. This phylogenetic proximity had been explained by an RBD gene transfer from the Guangdong pangolin CoV to a very recent ancestor of SARS-CoV-2 in some earlier works in the field before the BANAL coronaviruses were discovered. Overall, our study provides valuable insights into the timeline and evolutionary dynamics of the SARS-CoV-2 pandemic.


Subjects
COVID-19 , Chiroptera , Animals , Humans , SARS-CoV-2/genetics , Phylogeny , Pangolins/genetics , COVID-19/epidemiology , Bayes Theorem , Zoonoses/epidemiology
2.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37080758

ABSTRACT

CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) is a popular and effective two-component technology used for targeted genetic manipulation. It is currently the most versatile and accurate method of gene and genome editing, which benefits from a large variety of practical applications. For example, in biomedicine, it has been used in research related to cancer, virus infections, pathogen detection, and genetic diseases. Current CRISPR/Cas9 research is based on data-driven models for on- and off-target prediction as a cleavage may occur at non-target sequence locations. Nowadays, conventional machine learning and deep learning methods are applied on a regular basis to accurately predict on-target knockout efficacy and off-target profile of given single-guide RNAs (sgRNAs). In this paper, we present an overview and a comparative analysis of traditional machine learning and deep learning models used in CRISPR/Cas9. We highlight the key research challenges and directions associated with target activity prediction. We discuss recent advances in the sgRNA-DNA sequence encoding used in state-of-the-art on- and off-target prediction models. Furthermore, we present the most popular deep learning neural network architectures used in CRISPR/Cas9 prediction models. Finally, we summarize the existing challenges and discuss possible future investigations in the field of on- and off-target prediction. Our paper provides valuable support for academic and industrial researchers interested in the application of machine learning methods in the field of CRISPR/Cas9 genome editing.


Subjects
CRISPR-Cas Systems , Deep Learning , Gene Editing/methods , Machine Learning
3.
Inf Fusion ; 90: 364-381, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36217534

ABSTRACT

The COVID-19 (Coronavirus disease 2019) pandemic has become a major global threat to human health and well-being. Thus, the development of computer-aided detection (CAD) systems that are capable of accurately distinguishing COVID-19 from other diseases using chest computed tomography (CT) and X-ray data is of immediate priority. Such automatic systems are usually based on traditional machine learning or deep learning methods. Unlike most of the existing studies, which used either CT scan or X-ray images in COVID-19 case classification, we present a new, simple but efficient deep learning feature fusion model, called UncertaintyFuseNet, which is able to accurately classify large datasets of both of these types of images. We argue that the uncertainty of the model's predictions should be taken into account in the learning process, even though most of the existing studies have overlooked it. We quantify the prediction uncertainty in our feature fusion model using the effective Ensemble Monte Carlo Dropout (EMCD) technique. A comprehensive simulation study has been conducted to compare the results of our new model to the existing approaches, evaluating the performance of competing models in terms of Precision, Recall, F-Measure, Accuracy and ROC curves. The obtained results prove the efficiency of our model, which provided the prediction accuracy of 99.08% and 96.35% for the considered CT scan and X-ray datasets, respectively. Moreover, our UncertaintyFuseNet model was generally robust to noise and performed well with previously unseen data. The source code of our implementation is freely available at: https://github.com/moloud1987/UncertaintyFuseNet-for-COVID-19-Classification.
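
The idea of Monte Carlo dropout mentioned in the abstract above can be illustrated with a minimal sketch. It is not the UncertaintyFuseNet code: the tiny two-layer network, the weight matrices `w1` and `w2`, and the function name `mc_dropout_predict` are assumptions made for illustration. Dropout is kept active at prediction time, several stochastic forward passes are run, and the spread of the resulting class probabilities serves as an uncertainty estimate. An ensemble variant would additionally average over several independently trained models.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, w1, w2, n_passes=50, drop_rate=0.5, seed=0):
    """Run n_passes stochastic forward passes of a toy two-layer network.

    Hypothetical helper: returns the mean class probabilities, their
    per-class variance, and the entropy of the mean prediction.
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_rate
    passes = []
    for _ in range(n_passes):
        hidden = np.maximum(x @ w1, 0.0)           # ReLU hidden layer
        mask = rng.random(hidden.shape) < keep     # fresh dropout mask per pass
        hidden = hidden * mask / keep              # inverted dropout scaling
        passes.append(softmax(hidden @ w2))
    probs = np.stack(passes)                       # (n_passes, n_samples, n_classes)
    mean = probs.mean(axis=0)
    var = probs.var(axis=0)
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=-1)
    return mean, var, entropy
```

A high variance or entropy for a given input suggests that the prediction should be treated with caution, for instance by deferring it to a human expert.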

4.
PLoS One ; 17(12): e0278364, 2022.
Article in English | MEDLINE | ID: mdl-36454766

ABSTRACT

Next basket recommendation is a critical task in market basket data analysis. It is particularly important in grocery shopping, where grocery lists are an essential part of shopping habits of many customers. In this work, we first present a new grocery Recommender System available on the MyGroceryTour platform. Our online system uses different traditional machine learning (ML) and deep learning (DL) algorithms, and provides recommendations to users in a real-time manner. It aims to help Canadian customers create their personalized intelligent weekly grocery lists based on their individual purchase histories, weekly specials offered in local stores, and product cost and availability information. We perform clustering analysis to partition given customer profiles into four non-overlapping clusters according to their grocery shopping habits. Then, we conduct computational experiments to compare several traditional ML algorithms and our new DL algorithm based on the use of a gated recurrent unit (GRU)-based recurrent neural network (RNN) architecture. Our DL algorithm can be viewed as an extension of DREAM (Dynamic REcurrent bAsket Model) adapted to multi-class (i.e. multi-store) classification, since a given user can purchase recommended products in different grocery stores in which these products are available. Among traditional ML algorithms, the highest average F-score of 0.516 for the considered data set of 831 customers was obtained using Random Forest, whereas our proposed DL algorithm yielded the average F-score of 0.559 for this data set. The main advantage of the presented Recommender System is that our intelligent recommendation is personalized, since a separate traditional ML or DL model is built for each customer considered. Such a personalized approach allows us to outperform the prediction results provided by general state-of-the-art DL models.


Subjects
Algorithms , Supervised Machine Learning , Canada , Cluster Analysis , Machine Learning
5.
Bioinformatics ; 38(13): 3367-3376, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35579343

ABSTRACT

MOTIVATION: Each gene has its own evolutionary history which can substantially differ from evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. However, the output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree. RESULTS: We present a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some remarkable properties of the Robinson and Foulds distance, can be used to partition a given set of trees into one (for homogeneous data) or multiple (for heterogeneous data) cluster(s) of trees. Moreover, we adapt the popular Calinski-Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is given to the relevant but very challenging problem of inferring alternative supertrees. The use of the Euclidean property of the objective function of the method makes it faster than the existing tree clustering techniques, and thus better suited for analyzing large evolutionary datasets. AVAILABILITY AND IMPLEMENTATION: Our KMeansSuperTreeClustering program along with its C++ source code is available at: https://github.com/TahiriNadia/KMeansSuperTreeClustering. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Software , Phylogeny , Consensus , Cluster Analysis
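
The Robinson and Foulds distance that underlies the tree clustering described in record 5 can be sketched as follows. This is an illustrative implementation only, not the KMeansSuperTreeClustering code: trees are assumed to be given as nested tuples of leaf names, and the function names are hypothetical. The distance counts the non-trivial bipartitions (splits of the leaf set induced by internal edges) that occur in exactly one of the two trees.

```python
def leaf_set(tree):
    # A leaf is a string; an internal node is a tuple of subtrees.
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def bipartitions(tree):
    """Collect the non-trivial bipartitions of a tree, viewed as unrooted.

    Each bipartition is stored as the side that does not contain a fixed
    reference leaf, so the same split always gets the same key.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if ref not in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits

def robinson_foulds(tree_a, tree_b):
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must be defined on the same set of leaves")
    # Size of the symmetric difference of the two bipartition sets.
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))
```

For example, `((("a", "b"), "c"), ("d", "e"))` and `((("a", "c"), "b"), ("d", "e"))` share the split separating `d` and `e` from the rest but disagree on the other internal edge, so their distance is 2.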
6.
Bioinformatics ; 38(11): 3118-3120, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35451456

ABSTRACT

MOTIVATION: Accurate detection of sequence similarity and homologous recombination are essential parts of many evolutionary analyses. RESULTS: We have developed SimPlot++, an open-source multiplatform application implemented in Python, which can be used to produce publication quality sequence similarity plots using 63 nucleotide and 20 amino acid distance models, to detect intergenic and intragenic recombination events using Φ, Max-χ2, NSS or proportion tests, and to generate and analyze interactive sequence similarity networks. SimPlot++ supports multicore data processing and provides useful distance calculability diagnostics. AVAILABILITY AND IMPLEMENTATION: SimPlot++ is freely available on GitHub at: https://github.com/Stephane-S/Simplot_PlusPlus, as both an executable file (for Windows) and Python scripts (for Windows/Linux/MacOS).


Subjects
Biological Evolution , Software
7.
Sci Rep ; 12(1): 179, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34996997

ABSTRACT

Recent years have seen a steep rise in the number of skin cancer detection applications. While modern advances in deep learning have made it possible to reach new heights in terms of classification accuracy, no publicly available skin cancer detection software provides confidence estimates for these predictions. We present DUNEScan (Deep Uncertainty Estimation for Skin Cancer), a web server that performs an intuitive in-depth analysis of uncertainty in commonly used skin cancer classification models based on convolutional neural networks (CNNs). DUNEScan allows users to upload a skin lesion image, and quickly compares the mean and the variance estimates provided by a number of new and traditional CNN models. Moreover, our web server uses the Grad-CAM and UMAP algorithms to visualize the classification manifold for the user's input, hence providing crucial information about its closeness to skin lesion images from the popular ISIC database. DUNEScan is freely available at: https://www.dunescan.org.


Subjects
Deep Learning , Diagnosis, Computer-Assisted , Image Interpretation, Computer-Assisted , Internet , Photography , Skin Neoplasms/pathology , Decision Support Techniques , Humans , Predictive Value of Tests , Reproducibility of Results , Skin Neoplasms/classification , Uncertainty
8.
Front Robot AI ; 9: 1076897, 2022.
Article in English | MEDLINE | ID: mdl-36817004

ABSTRACT

This paper introduces an optimal algorithm for solving the discrete grid-based coverage path planning (CPP) problem. This problem consists in finding a path that covers a given region completely. First, we propose a CPP-solving baseline algorithm based on the iterative deepening depth-first search (ID-DFS) approach. Then, we introduce two branch-and-bound strategies (Loop detection and an Admissible heuristic function) to improve the results of our baseline algorithm. We evaluate the performance of our planner using six types of benchmark grids considered in this study: Coast-like, Random links, Random walk, Simple-shapes, Labyrinth and Wide-Labyrinth grids. We are first to consider these types of grids in the context of CPP. All of them find their practical applications in real-world CPP problems from a variety of fields. The obtained results suggest that the proposed branch-and-bound algorithm solves the problem optimally (i.e., the exact solution is found in each case) orders of magnitude faster than an exhaustive search CPP planner. To the best of our knowledge, no general CPP-solving exact algorithms, apart from an exhaustive search planner, have been proposed in the literature.
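
The combination of iterative deepening depth-first search with an admissible bound, as outlined in the abstract above, can be sketched on a toy grid. This is a simplified illustration and not the authors' planner: the grid encoding (`.` for free cells, `#` for obstacles), the function name `coverage_path`, and the choice of heuristic (the number of cells still uncovered, which never overestimates the moves still needed) are assumptions, and the loop detection strategy from the paper is not reproduced.

```python
def coverage_path(grid, start):
    """Return a shortest walk from `start` that visits every free cell ('.')."""
    free = {(r, c) for r, row in enumerate(grid)
            for c, ch in enumerate(row) if ch == "."}
    if start not in free:
        raise ValueError("start must be a free cell")
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def dfs(path, visited, limit):
        remaining = len(free) - len(visited)
        if remaining == 0:
            return list(path)
        # Bound: each uncovered cell costs at least one more move.
        if (len(path) - 1) + remaining > limit:
            return None
        r, c = path[-1]
        for dr, dc in moves:
            nxt = (r + dr, c + dc)
            if nxt not in free:
                continue
            newly_covered = nxt not in visited
            path.append(nxt)
            if newly_covered:
                visited.add(nxt)
            found = dfs(path, visited, limit)
            if found is not None:
                return found
            if newly_covered:
                visited.remove(nxt)
            path.pop()
        return None

    # Iterative deepening on the number of moves. A connected region of n
    # cells can always be covered in at most 2(n - 1) moves.
    for limit in range(len(free) - 1, 2 * len(free)):
        found = dfs([start], {start}, limit)
        if found is not None:
            return found
    raise ValueError("some free cells are unreachable from start")
```

Because the depth limit grows one move at a time and the bound never overestimates, the first walk found uses the minimum number of moves. The search remains exponential in the worst case, so this sketch is only practical for small grids.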

9.
Comput Biol Med ; 135: 104418, 2021 08.
Article in English | MEDLINE | ID: mdl-34052016

ABSTRACT

Accurate automated medical image recognition, including classification and segmentation, is one of the most challenging tasks in medical image analysis. Recently, deep learning methods have achieved remarkable success in medical image classification and segmentation, clearly becoming the state-of-the-art methods. However, most of these methods are unable to provide uncertainty quantification (UQ) for their output, often being overconfident, which can lead to disastrous consequences. Bayesian Deep Learning (BDL) methods can be used to quantify uncertainty of traditional deep learning methods, and thus address this issue. We apply three uncertainty quantification methods to deal with uncertainty during skin cancer image classification. They are as follows: Monte Carlo (MC) dropout, Ensemble MC (EMC) dropout and Deep Ensemble (DE). To further resolve the remaining uncertainty after applying the MC, EMC and DE methods, we describe a novel hybrid dynamic BDL model, taking into account uncertainty, based on the Three-Way Decision (TWD) theory. The proposed dynamic model enables us to use different UQ methods and different deep neural networks in distinct classification phases. So, the elements of each phase can be adjusted according to the dataset under consideration. In this study, two best UQ methods (i.e., DE and EMC) are applied in two classification phases (the first and second phases) to analyze two well-known skin cancer datasets, preventing one from making overconfident decisions when it comes to diagnosing the disease. The accuracy and the F1-score of our final solution are, respectively, 88.95% and 89.00% for the first dataset, and 90.96% and 91.00% for the second dataset. Our results suggest that the proposed TWDBDL model can be used effectively at different stages of medical image analysis.


Subjects
Deep Learning , Skin Neoplasms , Bayes Theorem , Humans , Neural Networks, Computer , Skin Neoplasms/diagnostic imaging , Uncertainty
10.
Bioinformatics ; 37(16): 2299-2307, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33599251

ABSTRACT

MOTIVATION: Off-target predictions are crucial in gene editing research. Recently, significant progress has been made in the field of prediction of off-target mutations, particularly with CRISPR-Cas9 data, thanks to the use of deep learning. CRISPR-Cas9 is a gene editing technique which allows manipulation of DNA fragments. The sgRNA-DNA (single guide RNA-DNA) sequence encoding for deep neural networks, however, has a strong impact on the prediction accuracy. We propose a novel encoding of sgRNA-DNA sequences that aggregates sequence data with no loss of information. RESULTS: In our experiments, we compare the proposed sgRNA-DNA sequence encoding applied in a deep learning prediction framework with state-of-the-art encoding and prediction methods. We demonstrate the superior accuracy of our approach in a simulation study involving Feedforward Neural Networks (FNNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) as well as the traditional Random Forest (RF), Naive Bayes (NB) and Logistic Regression (LR) classifiers. We highlight the quality of our results by building several FNNs, CNNs and RNNs with various layer depths and performing predictions on two popular gene editing datasets (CRISPOR and GUIDE-seq). In all our experiments, the new encoding led to more accurate off-target prediction results, providing an improvement of the area under the Receiver Operating Characteristic (ROC) curve up to 35%. AVAILABILITY AND IMPLEMENTATION: The code and data used in this study are available at: https://github.com/dagrate/dl-offtarget. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
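
To make the notion of a lossless sgRNA-DNA pair encoding more concrete, here is one simple possibility. It is an assumption for illustration and should not be read as the exact encoding proposed in the paper: each position of the aligned pair is represented by the concatenation of two 4-bit one-hot vectors (one for the guide base, one for the target base), giving an 8-channel matrix from which both sequences can be recovered. Guide sequences are assumed to be written with DNA letters (`T` rather than `U`), and the function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    # One row per position, one column per base in BASES.
    idx = [BASES.index(base) for base in seq.upper()]
    out = np.zeros((len(idx), 4), dtype=np.uint8)
    out[np.arange(len(idx)), idx] = 1
    return out

def encode_pair(sgrna, dna):
    """Encode an aligned sgRNA-DNA pair as a (length, 8) binary matrix."""
    if len(sgrna) != len(dna):
        raise ValueError("sequences must have the same length")
    return np.concatenate([one_hot(sgrna), one_hot(dna)], axis=1)

def decode_pair(matrix):
    """Invert encode_pair, showing that no information is lost."""
    sgrna = "".join(BASES[i] for i in matrix[:, :4].argmax(axis=1))
    dna = "".join(BASES[i] for i in matrix[:, 4:].argmax(axis=1))
    return sgrna, dna
```

By contrast, combining the two one-hot vectors with a bitwise OR into 4 channels would not record which base belongs to the guide and which to the target, so mismatches such as A/G and G/A would become indistinguishable.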

11.
BMC Ecol Evol ; 21(1): 5, 2021 01 21.
Article in English | MEDLINE | ID: mdl-33514319

ABSTRACT

BACKGROUND: The SARS-CoV-2 pandemic is one of the greatest global medical and social challenges that have emerged in recent history. Human coronavirus strains discovered during previous SARS outbreaks have been hypothesized to pass from bats to humans using intermediate hosts, e.g. civets for SARS-CoV and camels for MERS-CoV. The discovery of an intermediate host of SARS-CoV-2 and the identification of specific mechanism of its emergence in humans are topics of primary evolutionary importance. In this study we investigate the evolutionary patterns of 11 main genes of SARS-CoV-2. Previous studies suggested that the genome of SARS-CoV-2 is highly similar to the horseshoe bat coronavirus RaTG13 for most of the genes and to some Malayan pangolin coronavirus (CoV) strains for the receptor binding (RB) domain of the spike protein. RESULTS: We provide a detailed list of statistically significant horizontal gene transfer and recombination events (both intergenic and intragenic) inferred for each of 11 main genes of the SARS-CoV-2 genome. Our analysis reveals that two continuous regions of genes S and N of SARS-CoV-2 may result from intragenic recombination between RaTG13 and Guangdong (GD) Pangolin CoVs. Statistically significant gene transfer-recombination events between RaTG13 and GD Pangolin CoV have been identified in region [1215-1425] of gene S and region [534-727] of gene N. Moreover, some statistically significant recombination events between the ancestors of SARS-CoV-2, RaTG13, GD Pangolin CoV and bat CoV ZC45-ZXC21 coronaviruses have been identified in genes ORF1ab, S, ORF3a, ORF7a, ORF8 and N. Furthermore, topology-based clustering of gene trees inferred for 25 CoV organisms revealed a three-way evolution of coronavirus genes, with gene phylogenies of ORF1ab, S and N forming the first cluster, gene phylogenies of ORF3a, E, M, ORF6, ORF7a, ORF7b and ORF8 forming the second cluster, and phylogeny of gene ORF10 forming the third cluster.
CONCLUSIONS: The results of our horizontal gene transfer and recombination analysis suggest that SARS-CoV-2 could not only be a chimera virus resulting from recombination of the bat RaTG13 and Guangdong pangolin coronaviruses but also a close relative of the bat CoV ZC45 and ZXC21 strains. They also indicate that a GD pangolin may be an intermediate host of this dangerous virus.


Subjects
COVID-19 , SARS-CoV-2 , Animals , Evolution, Molecular , Gene Transfer, Horizontal , Genome, Viral/genetics , Humans
12.
Article in English | MEDLINE | ID: mdl-31180868

ABSTRACT

Considerable efforts have been made over the last decades to improve the robustness of clustering algorithms against noise features and outliers, known to be important sources of error in clustering. Outliers dominate the sum-of-the-squares calculations and generate cluster overlap, thus leading to unreliable clustering results. They can be particularly detrimental in computational biology, e.g., when determining the number of clusters in gene expression data related to cancer or when inferring phylogenetic trees and networks. While the issue of feature weighting has been studied in detail, no clustering methods using object weighting have been proposed yet. Here we describe a new general data partitioning method that includes an object-weighting step to assign higher weights to outliers and objects that cause cluster overlap. Different object weighting schemes, based on the Silhouette cluster validity index, the median and two intercluster distances, are defined. We compare our novel technique to a number of popular and efficient clustering algorithms, such as K-means, X-means, DAPC and Prediction Strength. In the presence of outliers and cluster overlap, our method largely outperforms X-means, DAPC and Prediction Strength as well as the K-means algorithm based on feature weighting.


Subjects
Algorithms , Cluster Analysis , Computational Biology/methods , Humans , Neoplasms/genetics , Phylogeny , Transcriptome
13.
Bioinformatics ; 36(9): 2740-2749, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31971565

ABSTRACT

MOTIVATION: Phylogenetic trees and the methods for their analysis have played a key role in many evolutionary, ecological and bioinformatics studies. Alternatively, phylogenetic networks have been widely used to analyze and represent complex reticulate evolutionary processes which cannot be adequately studied using traditional phylogenetic methods. These processes include, among others, hybridization, horizontal gene transfer, and genetic recombination. Nowadays, sequence similarity and genome similarity networks have become an efficient tool for community analysis of large molecular datasets in comparative studies. These networks can be used for tackling a variety of complex evolutionary problems such as the identification of horizontal gene transfer events, the recovery of mosaic genes and genomes, and the study of holobionts. RESULTS: The shortest path in a phylogenetic tree is used to estimate evolutionary distances between species. We show how the shortest path concept can be extended to sequence similarity networks by defining five new distances, NetUniFrac, Spp, Spep, Spelp and Spinp, and the Transfer index, between species communities present in the network. These new distances can be seen as network analogs of the traditional UniFrac distance used to assess dissimilarity between species communities in a phylogenetic tree, whereas the Transfer index is intended for estimating the rate and direction of gene transfers, or species dispersal, between different phylogenetic, or ecological, species communities. Moreover, NetUniFrac and the Transfer index can be computed in linear time with respect to the number of edges in the network. We show how these new measures can be used to analyze microbiota and antibiotic resistance gene similarity networks. AVAILABILITY AND IMPLEMENTATION: Our NetFrac program, implemented in R and C, along with its source code, is freely available on Github at the following URL address: https://github.com/XPHenry/Netfrac. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Evolution, Molecular , Software , Gene Transfer, Horizontal , Genome , Phylogeny
14.
Genome Biol Evol ; 11(9): 2653-2665, 2019 09 01.
Article in English | MEDLINE | ID: mdl-31504500

ABSTRACT

Explaining the evolution of animals requires ecological, developmental, paleontological, and phylogenetic considerations because organismal traits are affected by complex evolutionary processes. Modeling a plurality of processes, operating at distinct time-scales on potentially interdependent traits, can benefit from approaches that are complementary treatments to phylogenetics. Here, we developed an inclusive network approach, implemented in the command line software ComponentGrapher, and analyzed trait co-occurrence of rhinocerotoid mammals. We identified stable, unstable, and pivotal traits, as well as traits contributing to complexes that may follow a common developmental regulation and that point to an early implementation of the postcranial Bauplan among rhinocerotoids. Strikingly, most identified traits are highly dissociable, used repeatedly in distinct combinations and in different taxa, which usually do not form clades. Therefore, the genes encoding these traits are likely recruited into novel gene regulation networks during the course of evolution. Our evo-systemic framework, generalizable to other evolved organizations, supports a pluralistic modeling of organismal evolution, including trees and networks.


Subjects
Biological Evolution , Mammals/anatomy & histology , Mammals/genetics , Animals , Bone and Bones/anatomy & histology , Mammals/classification , Phylogeny , Software , Tooth/anatomy & histology
15.
Comput Methods Programs Biomed ; 179: 104992, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31443858

ABSTRACT

BACKGROUND AND OBJECTIVE: Coronary artery disease (CAD) is one of the commonest diseases around the world. An early and accurate diagnosis of CAD allows a timely administration of appropriate treatment and helps to reduce the mortality. Herein, we describe an innovative machine learning methodology that enables an accurate detection of CAD and apply it to data collected from Iranian patients. METHODS: We first tested ten traditional machine learning algorithms, and then the three-best performing algorithms (three types of SVM) were used in the rest of the study. To improve the performance of these algorithms, a data preprocessing with normalization was carried out. Moreover, a genetic algorithm and particle swarm optimization, coupled with stratified 10-fold cross-validation, were used twice: for optimization of classifier parameters and for parallel selection of features. RESULTS: The presented approach enhanced the performance of all traditional machine learning algorithms used in this study. We also introduced a new optimization technique called N2Genetic optimizer (a new genetic training). Our experiments demonstrated that N2Genetic-nuSVM provided the accuracy of 93.08% and F1-score of 91.51% when predicting CAD outcomes among the patients included in a well-known Z-Alizadeh Sani dataset. These results are competitive and comparable to the best results in the field. CONCLUSIONS: We showed that machine-learning techniques optimized by the proposed approach, can lead to highly accurate models intended for both clinical and research use.


Subjects
Coronary Artery Disease/diagnosis , Machine Learning , Algorithms , Data Mining/statistics & numerical data , Databases, Factual/statistics & numerical data , Diagnosis, Computer-Assisted/statistics & numerical data , Female , Humans , Machine Learning/statistics & numerical data , Male , Models, Cardiovascular , Support Vector Machine/statistics & numerical data
16.
J Med Syst ; 43(7): 220, 2019 Jun 07.
Article in English | MEDLINE | ID: mdl-31175462

ABSTRACT

Wart disease (WD) is a skin illness on the human body which is caused by the human papillomavirus (HPV). This study mainly concentrates on common and plantar warts. There are various treatment methods for this disease, including the popular immunotherapy and cryotherapy methods. Manual evaluation of the WD treatment response is challenging. Furthermore, traditional machine learning methods are not robust enough in WD classification as they cannot deal effectively with small number of attributes. This study proposes a new evolutionary-based computer-aided diagnosis (CAD) system using machine learning to classify the WD treatment response. The main architecture of our CAD system is based on the combination of improved adaptive particle swarm optimization (IAPSO) algorithm and artificial immune recognition system (AIRS). The cross-validation protocol was applied to test our machine learning-based classification system, including five different partition protocols (K2, K3, K4, K5 and K10). Our database consisted of 180 records taken from immunotherapy and cryotherapy databases. The best results were obtained using the K10 protocol that provided the precision, recall, F-measure and accuracy values of 0.8908, 0.8943, 0.8916 and 90%, respectively. Our IAPSO system showed the reliability of 98.68%. It was implemented in Java, while integrated development environment (IDE) was implemented using NetBeans. Our encouraging results suggest that the proposed IAPSO-AIRS system can be employed for the WD management in clinical environment.


Subjects
Diagnosis, Computer-Assisted , Machine Learning , Warts/therapy , Adolescent , Adult , Aged , Cryotherapy , Data Mining , Female , Humans , Immunotherapy , Male , Middle Aged , Reproducibility of Results , Treatment Outcome , Young Adult
17.
PLoS One ; 13(8): e0201446, 2018.
Article in English | MEDLINE | ID: mdl-30089142

ABSTRACT

The emergence of functional specialization is a core problem in biology. In this work we focus on the emergence of reproductive (germ) and vegetative viability-enhancing (soma) cell functions (or germ-soma specialization). We consider a group of cells and assume that they contribute to two different evolutionary tasks, fecundity and viability. The potential of cells to contribute to fitness components is traded off. As embodied in current models, the curvature of the trade-off between fecundity and viability is concave in small-sized organisms and convex in large-sized multicellular organisms. We present a general mathematical model that explores how the division of labor in a cell colony depends on the trade-off curvatures, a resource constraint and different fecundity and viability rates. Moreover, we consider the case of different trade-off functions for different cells. We describe the set of all possible solutions of the formulated mathematical programming problem and show some interesting examples of optimal specialization strategies found for our objective fitness function. Our results suggest that the transition to specialized organisms can be achieved in several ways. The evolution of Volvocalean green algae is considered to illustrate the application of our model. The proposed model can be generalized to address a number of important biological issues, including the evolution of specialized enzymes and the emergence of complex organs.


Subjects
Cell Communication/physiology , Cell Differentiation/physiology , Chlorophyta/physiology , Fertility/physiology , Models, Biological , Biological Evolution , Cell Survival/physiology , Chlorophyta/cytology , Germ Cells, Plant/physiology
18.
BMC Evol Biol ; 18(1): 48, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29621975

ABSTRACT

BACKGROUND: Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. RESULTS: We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Calinski-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. CONCLUSIONS: The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. 
The main advantage of our method is that it is much faster than the existing tree clustering approaches, while providing similar or better clustering results in most cases. This makes it particularly well suited for the analysis of large genomic and phylogenetic datasets.
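As a rough illustration of the clustering idea in this abstract, the sketch below partitions a set of trees with k-medoids and scores each partition with a silhouette index. It rests on several assumptions that are not taken from the abstract: trees are encoded as sets of non-trivial bipartitions, the Robinson-Foulds distance is used as the tree distance, the silhouette is the standard one (not the adapted indices the authors propose), and the four toy trees and all function names are hypothetical.

```python
# Minimal sketch: k-medoids clustering of trees under an assumed tree distance.
# Each tree is a frozenset of non-trivial bipartitions; each bipartition is
# written as the sorted taxa on the side that does NOT contain taxon "A".

def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: bipartitions present in exactly one tree."""
    return len(splits_a ^ splits_b)

def distance_matrix(trees):
    return [[rf_distance(a, b) for b in trees] for a in trees]

def assign(dist, medoids):
    """Label every tree with the index of its closest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids with a deterministic farthest-first start."""
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            updated.append(min(members,
                               key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)

def mean_silhouette(dist, labels):
    """Standard mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1 or len(clusters) < 2:
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for label, other in clusters.items() if label != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

# Hypothetical toy data: four gene trees on taxa A-F, two topological groups.
TOY_TREES = [
    frozenset({"CDEF", "CD", "EF"}),   # ((A,B),(C,D),(E,F))
    frozenset({"CDEF", "DEF", "EF"}),  # (((A,B),C),D,(E,F))
    frozenset({"BCDE", "BD", "CE"}),   # ((A,F),(B,D),(C,E))
    frozenset({"BCDE", "CDE", "CE"}),  # (((A,F),B),D,(C,E))
]

dist = distance_matrix(TOY_TREES)
scores = {k: mean_silhouette(dist, k_medoids(dist, k)[1]) for k in (2, 3)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

Each resulting cluster could then be summarized by its own consensus tree, which is the step that yields multiple consensus trees instead of one. Working from a precomputed distance matrix keeps the sketch independent of the distance chosen.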


Subjects
Algorithms; Genomics/methods; Phylogeny; Archaea/metabolism; Cluster Analysis; Computer Simulation; Gene Transfer, Horizontal/genetics; Ribosomal Proteins/metabolism; Species Specificity
19.
SLAS Discov ; 23(5): 448-458, 2018 06.
Article in English | MEDLINE | ID: mdl-29346010

ABSTRACT

Data generated by high-throughput screening (HTS) technologies are prone to spatial bias. Traditionally, bias correction methods used in HTS assume either a simple additive or, more recently, a simple multiplicative spatial bias model. These models do not, however, always provide an accurate correction of measurements in wells located at the intersection of rows and columns affected by spatial bias. The measurements in these wells depend on the nature of interaction between the involved biases. Here, we propose two novel additive and two novel multiplicative spatial bias models accounting for different types of bias interactions. We describe a statistical procedure that allows for detecting and removing different types of additive and multiplicative spatial biases from multiwell plates. We show how this procedure can be applied by analyzing data generated by the four HTS technologies (homogeneous, microorganism, cell-based, and gene expression HTS), the three high-content screening (HCS) technologies (area, intensity, and cell-count HCS), and the only small-molecule microarray technology available in the ChemBank small-molecule screening database. The proposed methods are included in the AssayCorrector program, implemented in R, and available on CRAN.
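To make the distinction between the simple additive and simple multiplicative models concrete, here is a minimal sketch. It assumes the textbook forms x_ij = mu + r_i + c_j (additive) and x_ij = mu * r_i * c_j (multiplicative), estimates row and column effects from plain means, and handles the multiplicative case by working in log space. It does not reproduce the novel interaction models or the statistical procedure of the paper, and it does not use the AssayCorrector API; all function names are hypothetical.

```python
import math

def remove_additive_bias(plate):
    """Subtract mean-based row and column effects from a plate (list of rows)."""
    n_rows, n_cols = len(plate), len(plate[0])
    grand = sum(sum(row) for row in plate) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in plate]
    col_means = [sum(plate[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    # Row effect = row mean - grand mean; column effect likewise.
    return [[plate[i][j] - (row_means[i] - grand) - (col_means[j] - grand)
             for j in range(n_cols)]
            for i in range(n_rows)]

def remove_multiplicative_bias(plate):
    """Multiplicative bias becomes additive after a log transform.

    Assumes every measurement is strictly positive.
    """
    logged = [[math.log(v) for v in row] for row in plate]
    return [[math.exp(v) for v in row] for row in remove_additive_bias(logged)]

def spread(plate):
    """Range of the plate values; 0 means the plate is perfectly flat."""
    values = [v for row in plate for v in row]
    return max(values) - min(values)

# Hypothetical noise-free 3 x 4 plates with a true signal of 100 in every well.
additive_plate = [[100 + r + c for c in (5, -5, 0, 0)] for r in (-10, 0, 10)]
multiplicative_plate = [[100 * r * c for c in (2, 0.5, 1, 1)]
                        for r in (0.5, 1, 2)]

print(spread(remove_additive_bias(additive_plate)))
print(spread(remove_multiplicative_bias(multiplicative_plate)))
# Applying the wrong model leaves residual structure in the plate:
print(spread(remove_additive_bias(multiplicative_plate)))
```

The last call shows why the choice of model matters: in a well where a biased row and a biased column intersect, the combined effect under the multiplicative model is not the sum of the two individual effects, so an additive correction cannot flatten the plate. Real screening data would also call for robust estimators such as medians, since hits and noise distort plain means.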


Subjects
High-Throughput Screening Assays/methods; Bias; Databases, Chemical; Drug Discovery/methods; Small Molecule Libraries/chemistry
20.
Sci Rep ; 7(1): 11921, 2017 09 20.
Article in English | MEDLINE | ID: mdl-28931934

ABSTRACT

Spatial bias continues to be a major challenge in high-throughput screening technologies. Its successful detection and elimination are critical for identifying the most promising drug candidates. Here, we examine experimental small molecule assays from the popular ChemBank database and show that screening data are widely affected by both assay-specific and plate-specific spatial biases. Importantly, the bias affecting screening data can fit an additive or multiplicative model. We show that the use of appropriate statistical methods is essential for improving the quality of experimental screening data. The presented methodology can be recommended for the analysis of current and next-generation screening data.

...