1.
Brief Bioinform ; 24(3), 2023 05 19.
Article in English | MEDLINE | ID: mdl-37080758

ABSTRACT

CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) is a popular and effective two-component technology used for targeted genetic manipulation. It is currently the most versatile and accurate method of gene and genome editing, which benefits from a large variety of practical applications. For example, in biomedicine, it has been used in research related to cancer, virus infections, pathogen detection, and genetic diseases. Current CRISPR/Cas9 research is based on data-driven models for on- and off-target prediction as a cleavage may occur at non-target sequence locations. Nowadays, conventional machine learning and deep learning methods are applied on a regular basis to accurately predict on-target knockout efficacy and off-target profile of given single-guide RNAs (sgRNAs). In this paper, we present an overview and a comparative analysis of traditional machine learning and deep learning models used in CRISPR/Cas9. We highlight the key research challenges and directions associated with target activity prediction. We discuss recent advances in the sgRNA-DNA sequence encoding used in state-of-the-art on- and off-target prediction models. Furthermore, we present the most popular deep learning neural network architectures used in CRISPR/Cas9 prediction models. Finally, we summarize the existing challenges and discuss possible future investigations in the field of on- and off-target prediction. Our paper provides valuable support for academic and industrial researchers interested in the application of machine learning methods in the field of CRISPR/Cas9 genome editing.


Subject(s)
CRISPR-Cas Systems, Deep Learning, Gene Editing/methods, Machine Learning
2.
Bioinformatics ; 38(13): 3367-3376, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35579343

ABSTRACT

MOTIVATION: Each gene has its own evolutionary history which can substantially differ from evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. However, the output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree. RESULTS: We present a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some remarkable properties of the Robinson and Foulds distance, can be used to partition a given set of trees into one (for homogeneous data) or multiple (for heterogeneous data) cluster(s) of trees. Moreover, we adapt the popular Calinski-Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is given to the relevant but very challenging problem of inferring alternative supertrees. The use of the Euclidean property of the objective function of the method makes it faster than the existing tree clustering techniques, and thus better suited for analyzing large evolutionary datasets. AVAILABILITY AND IMPLEMENTATION: Our KMeansSuperTreeClustering program along with its C++ source code is available at: https://github.com/TahiriNadia/KMeansSuperTreeClustering. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
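
As a rough illustration of the Robinson and Foulds distance that the clustering described above relies on, the following Python sketch represents each unrooted tree by its set of non-trivial bipartitions and counts the bipartitions found in only one of the two trees. The helper names (`canonical_split`, `rf_distance`) and the three five-taxon toy trees are assumptions made for this example; it is not code from KMeansSuperTreeClustering.

```python
from itertools import combinations

TAXA = frozenset("ABCDE")  # hypothetical taxon set shared by all toy trees


def canonical_split(side, taxa=TAXA):
    """Store a bipartition as the side that does not contain the smallest taxon."""
    side = frozenset(side)
    return taxa - side if min(taxa) in side else side


def rf_distance(splits_a, splits_b):
    """Robinson and Foulds distance: bipartitions present in exactly one tree."""
    return len(splits_a ^ splits_b)


# Each toy tree is described by its non-trivial bipartitions (one side listed).
trees = {
    "T1": {canonical_split("AB"), canonical_split("DE")},  # ((A,B),C,(D,E))
    "T2": {canonical_split("AC"), canonical_split("DE")},  # ((A,C),B,(D,E))
    "T3": {canonical_split("AB"), canonical_split("CD")},  # ((A,B),E,(C,D))
}

# Pairwise distances; a partitioning algorithm would take these as input.
pairwise = {
    (a, b): rf_distance(trees[a], trees[b])
    for a, b in combinations(sorted(trees), 2)
}
for pair, dist in pairwise.items():
    print(pair, dist)
```

A matrix of such pairwise distances is the kind of input a tree-partitioning step would work on; how the published method exploits the distance inside k-means is not reproduced here.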


Subject(s)
Algorithms, Software, Phylogeny, Consensus, Cluster Analysis
3.
Bioinformatics ; 38(11): 3118-3120, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35451456

ABSTRACT

MOTIVATION: Accurate detection of sequence similarity and homologous recombination are essential parts of many evolutionary analyses. RESULTS: We have developed SimPlot++, an open-source multiplatform application implemented in Python, which can be used to produce publication-quality sequence similarity plots using 63 nucleotide and 20 amino acid distance models, to detect intergenic and intragenic recombination events using Φ, Max-χ², NSS or proportion tests, and to generate and analyze interactive sequence similarity networks. SimPlot++ supports multicore data processing and provides useful distance calculability diagnostics. AVAILABILITY AND IMPLEMENTATION: SimPlot++ is freely available on GitHub at: https://github.com/Stephane-S/Simplot_PlusPlus, as both an executable file (for Windows) and Python scripts (for Windows/Linux/MacOS).
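
To make the idea of a sequence similarity plot concrete, here is a minimal Python sketch of a sliding-window identity profile between two aligned sequences. The function name `window_identity`, the window and step sizes, and the toy sequences are all assumptions for illustration; the sketch uses plain percent identity rather than any of the distance models mentioned above, and it is not taken from the SimPlot++ code base.

```python
def window_identity(ref, query, window=4, step=2):
    """Return (window centre, fraction of identical sites) along two aligned sequences."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(ref) - window + 1, step):
        # Compare only alignment columns where neither sequence has a gap.
        pairs = [
            (a, b)
            for a, b in zip(ref[start:start + window], query[start:start + window])
            if a != "-" and b != "-"
        ]
        identity = sum(a == b for a, b in pairs) / len(pairs) if pairs else float("nan")
        points.append((start + window // 2, identity))
    return points


# Toy aligned sequences (hypothetical data).
profile = window_identity("ACGTACGTAC", "ACGTTCGTAA")
for centre, identity in profile:
    print(centre, identity)
```

Plotting the second coordinate against the first gives the familiar similarity curve; a drop in identity along part of the alignment is the kind of signal that recombination tests then examine more formally.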


Subject(s)
Biological Evolution, Software
4.
Inf Fusion ; 90: 364-381, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36217534

ABSTRACT

The COVID-19 (Coronavirus disease 2019) pandemic has become a major global threat to human health and well-being. Thus, the development of computer-aided detection (CAD) systems that are capable of accurately distinguishing COVID-19 from other diseases using chest computed tomography (CT) and X-ray data is of immediate priority. Such automatic systems are usually based on traditional machine learning or deep learning methods. Unlike most of the existing studies, which used either CT scan or X-ray images in COVID-19-case classification, we present a new, simple but efficient deep learning feature fusion model, called UncertaintyFuseNet, which is able to accurately classify large datasets of both of these types of images. We argue that the uncertainty of the model's predictions should be taken into account in the learning process, even though most of the existing studies have overlooked it. We quantify the prediction uncertainty in our feature fusion model using the effective Ensemble Monte Carlo Dropout (EMCD) technique. A comprehensive simulation study has been conducted to compare the results of our new model to the existing approaches, evaluating the performance of competing models in terms of Precision, Recall, F-Measure, Accuracy and ROC curves. The obtained results prove the efficiency of our model, which provided a prediction accuracy of 99.08% and 96.35% for the considered CT scan and X-ray datasets, respectively. Moreover, our UncertaintyFuseNet model was generally robust to noise and performed well with previously unseen data. The source code of our implementation is freely available at: https://github.com/moloud1987/UncertaintyFuseNet-for-COVID-19-Classification.
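
The Monte Carlo dropout idea mentioned above can be sketched in a few lines of Python: dropout stays active at prediction time, the class probabilities of many stochastic forward passes are averaged, and the entropy of the averaged distribution serves as an uncertainty score. Everything here is an assumption for illustration (a toy single-layer classifier, the names `stochastic_forward` and `mc_dropout_predict`, the weights and the feature vector); the ensemble part of EMCD and the actual UncertaintyFuseNet architecture are not reproduced.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_rate, rng):
    """One forward pass of a toy linear classifier with dropout kept active."""
    scale = 1.0 / (1.0 - drop_rate)  # inverted dropout keeps the expected input unchanged
    kept = [x * scale if rng.random() >= drop_rate else 0.0 for x in features]
    logits = [sum(w * x for w, x in zip(row, kept)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, passes=200, drop_rate=0.5, seed=0):
    """Average class probabilities over stochastic passes; return (mean, entropy)."""
    rng = random.Random(seed)
    runs = [stochastic_forward(features, weights, drop_rate, rng) for _ in range(passes)]
    mean = [sum(run[c] for run in runs) / passes for c in range(len(weights))]
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


# Hypothetical fused feature vector and two-class weight matrix.
features = [0.9, 0.1, 0.4]
weights = [[2.0, -1.0, 0.5], [-1.5, 1.0, 0.3]]
mean_probs, predictive_entropy = mc_dropout_predict(features, weights)
print(mean_probs, predictive_entropy)
```

A predictive entropy close to zero indicates that the stochastic passes agree, while values near log(2) for two classes indicate that the model is unsure.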

5.
Bioinformatics ; 37(16): 2299-2307, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33599251

ABSTRACT

MOTIVATION: Off-target predictions are crucial in gene editing research. Recently, significant progress has been made in the field of prediction of off-target mutations, particularly with CRISPR-Cas9 data, thanks to the use of deep learning. CRISPR-Cas9 is a gene editing technique which allows manipulation of DNA fragments. The sgRNA-DNA (single guide RNA-DNA) sequence encoding for deep neural networks, however, has a strong impact on the prediction accuracy. We propose a novel encoding of sgRNA-DNA sequences that aggregates sequence data with no loss of information. RESULTS: In our experiments, we compare the proposed sgRNA-DNA sequence encoding applied in a deep learning prediction framework with state-of-the-art encoding and prediction methods. We demonstrate the superior accuracy of our approach in a simulation study involving Feedforward Neural Networks (FNNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) as well as the traditional Random Forest (RF), Naive Bayes (NB) and Logistic Regression (LR) classifiers. We highlight the quality of our results by building several FNNs, CNNs and RNNs with various layer depths and performing predictions on two popular gene editing datasets (CRISPOR and GUIDE-seq). In all our experiments, the new encoding led to more accurate off-target prediction results, providing an improvement of the area under the Receiver Operating Characteristic (ROC) curve up to 35%. AVAILABILITY AND IMPLEMENTATION: The code and data used in this study are available at: https://github.com/dagrate/dl-offtarget. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
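
To show why the choice of sgRNA-DNA encoding matters, the Python sketch below contrasts two simple schemes. The first merges the one-hot codes of the guide base and the target base with a bitwise OR, which cannot tell which strand carried which base at a mismatch. The second keeps the two one-hot codes side by side, so no information is lost. Both functions, the assumption that the guide is written in the DNA alphabet, and the toy sequences are illustrative; the stacked scheme is only one lossless option and is not claimed to be the exact encoding proposed in the paper.

```python
BASES = "ACGT"  # assumption: the guide is written with DNA letters


def one_hot(base):
    return [1 if base == b else 0 for b in BASES]


def encode_or(guide, target):
    """4 channels per position: OR of the two one-hot codes (lossy at mismatches)."""
    return [[g | t for g, t in zip(one_hot(a), one_hot(b))] for a, b in zip(guide, target)]


def encode_stacked(guide, target):
    """8 channels per position: guide code followed by target code (lossless)."""
    return [one_hot(a) + one_hot(b) for a, b in zip(guide, target)]


# Hypothetical guide and candidate off-target site with one mismatch.
guide = "GACGT"
site = "GATGT"
print(encode_or(guide, site))
print(encode_stacked(guide, site))
```

With the OR scheme, a guide base A facing a target base G produces the same code as G facing A; the stacked scheme keeps the two cases distinct at the cost of twice as many channels.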

6.
Bioinformatics ; 36(9): 2740-2749, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31971565

ABSTRACT

MOTIVATION: Phylogenetic trees and the methods for their analysis have played a key role in many evolutionary, ecological and bioinformatics studies. Alternatively, phylogenetic networks have been widely used to analyze and represent complex reticulate evolutionary processes which cannot be adequately studied using traditional phylogenetic methods. These processes include, among others, hybridization, horizontal gene transfer, and genetic recombination. Nowadays, sequence similarity and genome similarity networks have become an efficient tool for community analysis of large molecular datasets in comparative studies. These networks can be used for tackling a variety of complex evolutionary problems such as the identification of horizontal gene transfer events, the recovery of mosaic genes and genomes, and the study of holobionts. RESULTS: The shortest path in a phylogenetic tree is used to estimate evolutionary distances between species. We show how the shortest path concept can be extended to sequence similarity networks by defining five new distances, NetUniFrac, Spp, Spep, Spelp and Spinp, and the Transfer index, between species communities present in the network. These new distances can be seen as network analogs of the traditional UniFrac distance used to assess dissimilarity between species communities in a phylogenetic tree, whereas the Transfer index is intended for estimating the rate and direction of gene transfers, or species dispersal, between different phylogenetic, or ecological, species communities. Moreover, NetUniFrac and the Transfer index can be computed in linear time with respect to the number of edges in the network. We show how these new measures can be used to analyze microbiota and antibiotic resistance gene similarity networks. AVAILABILITY AND IMPLEMENTATION: Our NetFrac program, implemented in R and C, along with its source code, is freely available on Github at the following URL address: https://github.com/XPHenry/Netfrac. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Molecular Evolution, Software, Horizontal Gene Transfer, Genome, Phylogeny
7.
J Med Syst ; 43(7): 220, 2019 Jun 07.
Article in English | MEDLINE | ID: mdl-31175462

ABSTRACT

Wart disease (WD) is a skin illness on the human body which is caused by the human papillomavirus (HPV). This study mainly concentrates on common and plantar warts. There are various treatment methods for this disease, including the popular immunotherapy and cryotherapy methods. Manual evaluation of the WD treatment response is challenging. Furthermore, traditional machine learning methods are not robust enough in WD classification as they cannot deal effectively with a small number of attributes. This study proposes a new evolutionary-based computer-aided diagnosis (CAD) system using machine learning to classify the WD treatment response. The main architecture of our CAD system is based on the combination of an improved adaptive particle swarm optimization (IAPSO) algorithm and an artificial immune recognition system (AIRS). The cross-validation protocol was applied to test our machine learning-based classification system, including five different partition protocols (K2, K3, K4, K5 and K10). Our database consisted of 180 records taken from immunotherapy and cryotherapy databases. The best results were obtained using the K10 protocol that provided the precision, recall, F-measure and accuracy values of 0.8908, 0.8943, 0.8916 and 90%, respectively. Our IAPSO system showed a reliability of 98.68%. It was implemented in Java using the NetBeans integrated development environment (IDE). Our encouraging results suggest that the proposed IAPSO-AIRS system can be employed for WD management in a clinical environment.
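
The evaluation protocol described above (K-fold partitioning followed by precision, recall, F-measure and accuracy) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names `kfold_indices` and `binary_metrics` and the toy label vectors are hypothetical, records are split in their given order (in practice they would normally be shuffled first), and nothing here reproduces the IAPSO-AIRS classifier itself.

```python
def kfold_indices(n_records, k):
    """Split record indices 0..n_records-1 into k contiguous test folds."""
    base, extra = divmod(n_records, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_metrics(y_true, y_pred):
    """Precision, recall, F-measure and accuracy for labels in {0, 1}."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, f_measure, accuracy


# K10 protocol on 180 records: ten test folds of 18 records each.
folds = kfold_indices(180, 10)
print([len(fold) for fold in folds])

# Toy treatment-response labels (1 = responded) and hypothetical predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a full cross-validation run, each fold would serve once as the test set while the remaining records train the classifier, and the four metrics would be averaged over the folds.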


Subject(s)
Computer-Assisted Diagnosis, Machine Learning, Warts/therapy, Adolescent, Adult, Aged, Cryotherapy, Data Mining, Female, Humans, Immunotherapy, Male, Middle Aged, Reproducibility of Results, Treatment Outcome, Young Adult
8.
BMC Evol Biol ; 18(1): 48, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29621975

ABSTRACT

BACKGROUND: Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. RESULTS: We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Calinski-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. CONCLUSIONS: The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. 
The main advantage of our method is that it is much faster than the existing tree clustering approaches, while providing similar or better clustering results in most cases. This makes it particularly well suited for the analysis of large genomic and phylogenetic datasets.
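
Cluster validity indices such as Silhouette only need pairwise distances, which is why they adapt naturally to sets of trees. The Python sketch below computes the standard silhouette width from a precomputed distance matrix and a cluster assignment. The function name `silhouette_widths` and the toy 4 x 4 matrix (standing in for tree-to-tree distances) are assumptions for illustration; this is the textbook index, not the adapted versions proposed in the paper.

```python
def silhouette_widths(dist, labels):
    """Standard silhouette width of each item from a precomputed distance matrix."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the item's own cluster; b: nearest other cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


# Hypothetical distances among four gene trees forming two tight groups.
dist = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 2],
    [6, 6, 2, 0],
]
widths = silhouette_widths(dist, [0, 0, 1, 1])
print(sum(widths) / len(widths))
```

Running a partitioning algorithm for several numbers of clusters and keeping the partition with the highest mean width is the usual way such an index selects how many consensus trees to report.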


Subject(s)
Algorithms, Genomics/methods, Phylogeny, Archaea/metabolism, Cluster Analysis, Computer Simulation, Horizontal Gene Transfer/genetics, Ribosomal Proteins/metabolism, Species Specificity
9.
Bioinformatics ; 33(20): 3258-3267, 2017 Oct 15.
Article in English | MEDLINE | ID: mdl-28633418

ABSTRACT

MOTIVATION: Considerable attention has been paid recently to improving data quality in high-throughput screening (HTS) and high-content screening (HCS) technologies widely used in drug development and chemical toxicity research. However, several environmentally- and procedurally-induced spatial biases in experimental HTS and HCS screens decrease measurement accuracy, leading to increased numbers of false positives and false negatives in hit selection. Although effective bias correction methods and software have been developed over the past decades, almost all of these tools have been designed to reduce the effect of additive bias only. Here, we address the case of multiplicative spatial bias. RESULTS: We introduce three new statistical methods meant to reduce multiplicative spatial bias in screening technologies. We assess the performance of the methods with synthetic and real data affected by multiplicative spatial bias, including comparisons with current bias correction methods. We also describe a wider data correction protocol that integrates methods for removing both assay and plate-specific spatial biases, which can be either additive or multiplicative. CONCLUSIONS: The methods for removing multiplicative spatial bias and the data correction protocol are effective in detecting and cleaning experimental data generated by screening technologies. As our protocol is of a general nature, it can be used by researchers analyzing current or next-generation high-throughput screens. AVAILABILITY AND IMPLEMENTATION: The AssayCorrector program, implemented in R, is available on CRAN. CONTACT: makarenkov.vladimir@uqam.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
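
The difference between additive and multiplicative spatial bias can be illustrated with a small Python sketch. It assumes a simple model in which every well equals a common signal multiplied by a row factor and a column factor; taking logarithms turns this into an additive model, so subtracting row and column means on the log scale removes the bias. The function name `remove_multiplicative_bias`, the model, and the simulated plate are assumptions for illustration; this is not one of the three methods introduced in the paper, and it is not the AssayCorrector implementation.

```python
import math


def remove_multiplicative_bias(plate):
    """Two-way centering on the log scale; all measurements must be positive."""
    logs = [[math.log(v) for v in row] for row in plate]
    n_rows, n_cols = len(logs), len(logs[0])
    row_mean = [sum(row) / n_cols for row in logs]
    col_mean = [sum(logs[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    grand = sum(row_mean) / n_rows
    return [
        [math.exp(logs[i][j] - row_mean[i] - col_mean[j] + grand) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Simulated 3 x 4 plate: constant signal of 100 distorted by row and column factors.
row_factors = [1.0, 1.2, 0.8]
col_factors = [0.9, 1.0, 1.1, 1.3]
plate = [[100.0 * r * c for c in col_factors] for r in row_factors]

corrected = remove_multiplicative_bias(plate)
flat = [v for row in corrected for v in row]
print(min(flat), max(flat))
```

On this hit-free simulated plate the corrected wells all collapse to the same value. Real plates contain hits and noise, so a practical procedure would use robust estimates (for example medians) and would first test whether bias is present at all.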


Subject(s)
Biological Assay/methods, Computational Biology/methods, High-Throughput Screening Assays/methods, Software, Bias, Drug Discovery/methods, HIV Infections/drug therapy, Humans, Toxicology/methods
10.
Brief Bioinform ; 16(6): 974-86, 2015 Nov.
Article in English | MEDLINE | ID: mdl-25750417

ABSTRACT

Significant efforts have been made recently to improve data throughput and data quality in screening technologies related to drug design. The modern pharmaceutical industry relies heavily on high-throughput screening (HTS) and high-content screening (HCS) technologies, which include small molecule, complementary DNA (cDNA) and RNA interference (RNAi) types of screening. Data generated by these screening technologies are subject to several environmental and procedural systematic biases, which introduce errors into the hit identification process. We first review systematic biases typical of HTS and HCS screens. We highlight that study design issues and the way in which data are generated are crucial for providing unbiased screening results. Considering various data sets, including the publicly available ChemBank data, we assess the rates of systematic bias in experimental HTS by using plate-specific and assay-specific error detection tests. We describe main data normalization and correction techniques and introduce a general data preprocessing protocol. This protocol can be recommended for academic and industrial researchers involved in the analysis of current or next-generation HTS data.


Subject(s)
High-Throughput Nucleotide Sequencing/standards, Complementary DNA/genetics, RNA Interference, Reproducibility of Results
11.
BMC Evol Biol ; 16: 180, 2016 09 06.
Article in English | MEDLINE | ID: mdl-27600442

ABSTRACT

BACKGROUND: Curious parallels between the processes of species and language evolution have been observed by many researchers. Retracing the evolution of Indo-European (IE) languages remains one of the most intriguing intellectual challenges in historical linguistics. Most of the IE language studies use the traditional phylogenetic tree model to represent the evolution of natural languages, thus not taking into account reticulate evolutionary events, such as language hybridization and word borrowing which can be associated with species hybridization and horizontal gene transfer, respectively. More recently, implicit evolutionary networks, such as split graphs and minimal lateral networks, have been used to account for reticulate evolution in linguistics. RESULTS: Striking parallels existing between the evolution of species and natural languages allowed us to apply three computational biology methods for reconstruction of phylogenetic networks to model the evolution of IE languages. We show how the transfer of methods between the two disciplines can be achieved, making necessary methodological adaptations. Considering basic vocabulary data from the well-known Dyen's lexical database, which contains word forms in 84 IE languages for the meanings of a 200-meaning Swadesh list, we adapt a recently developed computational biology algorithm for building explicit hybridization networks to study the evolution of IE languages and compare our findings to the results provided by the split graph and galled network methods. CONCLUSION: We conclude that explicit phylogenetic networks can be successfully used to identify donors and recipients of lexical material as well as the degree of influence of each donor language on the corresponding recipient languages. We show that our algorithm is well suited to detect reticulate relationships among languages, and present some historical and linguistic justification for the results obtained. 
Our findings could be further refined if relevant syntactic, phonological and morphological data could be analyzed along with the available lexical data.


Subject(s)
Language, Theoretical Models, Algorithms, Computational Biology, Factual Databases, Europe, India, Linguistics, Phylogeny
12.
BMC Bioinformatics ; 16: 68, 2015 Mar 03.
Article in English | MEDLINE | ID: mdl-25887434

ABSTRACT

BACKGROUND: Workflows, or computational pipelines, consisting of collections of multiple linked tasks are becoming more and more popular in many scientific fields, including computational biology. For example, simulation studies, which are now a must for statistical validation of new bioinformatics methods and software, are frequently carried out using the available workflow platforms. Workflows are typically organized to minimize the total execution time and to maximize the efficiency of the included operations. Clustering algorithms can be applied either for regrouping similar workflows for their simultaneous execution on a server, or for dispatching some lengthy workflows to different servers, or for classifying the available workflows with a view to performing a specific keyword search. RESULTS: In this study, we consider four different workflow encoding and clustering schemes which are representative for bioinformatics projects. Some of them allow for clustering workflows with similar topological features, while the others regroup workflows according to their specific attributes (e.g. associated keywords) or execution time. The four types of workflow encoding examined in this study were compared using the weighted versions of k-means and k-medoids partitioning algorithms. The Calinski-Harabasz, Silhouette and logSS clustering indices were considered. Hierarchical classification methods, including the UPGMA, Neighbor Joining, Fitch and Kitsch algorithms, were also applied to classify bioinformatics workflows. Moreover, a novel pairwise measure of clustering solution stability, which can be computed in situations when a series of independent program runs is carried out, was introduced. CONCLUSIONS: Our findings based on the analysis of 220 real-life bioinformatics workflows suggest that the weighted clustering models based on keywords information or tasks execution times provide the most appropriate clustering solutions. 
Using datasets generated by the Armadillo and Taverna scientific workflow management system, we found that the weighted cosine distance in association with the k-medoids partitioning algorithm and the presence-absence workflow encoding provided the highest values of the Rand index among all compared clustering strategies. The introduced clustering stability indices, PS and PSG, can be effectively used to identify elements with a low clustering support.
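
The Rand index used above to compare clustering strategies measures how often two partitions agree on whether a pair of items belongs together. A minimal Python sketch follows; the function name `rand_index` and the two toy partitions of four workflows are assumptions for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (same vs. different cluster)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical reference partition and clustering result for four workflows.
reference = [0, 0, 1, 1]
clustering = [0, 0, 1, 2]
score = rand_index(reference, clustering)
print(score)
```

The index equals 1 when the two partitions are identical up to a relabeling of the clusters; here the partitions disagree on one of the six pairs.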


Subject(s)
Algorithms, Computational Biology/methods, Software, Workflow, Cluster Analysis, Datasets as Topic, Phylogeny
13.
Nucleic Acids Res ; 40(Web Server issue): W573-9, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22675075

ABSTRACT

T-REX (Tree and reticulogram REConstruction) is a web server dedicated to the reconstruction of phylogenetic trees, reticulation networks and to the inference of horizontal gene transfer (HGT) events. T-REX includes several popular bioinformatics applications such as MUSCLE, MAFFT, Neighbor Joining, NINJA, BioNJ, PhyML, RAxML, random phylogenetic tree generator and some well-known sequence-to-distance transformation models. It also comprises fast and effective methods for inferring phylogenetic trees from complete and incomplete distance matrices as well as for reconstructing reticulograms and HGT networks, including the detection and validation of complete and partial gene transfers, inference of consensus HGT scenarios and interactive HGT identification, developed by the authors. The included methods allow for validating and visualizing phylogenetic trees and networks which can be built from distance or sequence data. The web server is available at: www.trex.uqam.ca.


Subject(s)
Horizontal Gene Transfer, Phylogeny, Software, Computer Graphics, Internet
14.
PLoS One ; 19(4): e0301195, 2024.
Article in English | MEDLINE | ID: mdl-38574109

ABSTRACT

Understanding the evolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and its relationship to other coronaviruses in the wild is crucial for preventing future virus outbreaks. While the origin of the SARS-CoV-2 pandemic remains uncertain, mounting evidence suggests the direct involvement of the bat and pangolin coronaviruses in the evolution of the SARS-CoV-2 genome. To unravel the early days of a probable zoonotic spillover event, we analyzed genomic data from various coronavirus strains from both human and wild hosts. Bayesian phylogenetic analysis was performed using multiple datasets, using strict and relaxed clock evolutionary models to estimate the occurrence times of key speciation, gene transfer, and recombination events affecting the evolution of SARS-CoV-2 and its closest relatives. We found strong evidence supporting the presence of temporal structure in datasets containing SARS-CoV-2 variants, enabling us to estimate the time of SARS-CoV-2 zoonotic spillover between August and early October 2019. In contrast, datasets without SARS-CoV-2 variants provided mixed results in terms of temporal structure. However, they allowed us to establish that the presence of a statistically robust clade in the phylogenies of gene S and its receptor-binding domain (RBD), including two bat (BANAL) and two Guangdong pangolin coronaviruses (CoVs), is due to the horizontal gene transfer of this gene from the bat CoV to the pangolin CoV that occurred in the middle of 2018. Importantly, this clade is closely located to SARS-CoV-2 in both phylogenies. This phylogenetic proximity had been explained by an RBD gene transfer from the Guangdong pangolin CoV to a very recent ancestor of SARS-CoV-2 in some earlier works in the field before the BANAL coronaviruses were discovered. Overall, our study provides valuable insights into the timeline and evolutionary dynamics of the SARS-CoV-2 pandemic.


Subject(s)
COVID-19, Chiroptera, Animals, Humans, SARS-CoV-2/genetics, Phylogeny, Pangolins/genetics, COVID-19/epidemiology, Bayes Theorem, Zoonoses/epidemiology
15.
BMC Evol Biol ; 13: 274, 2013 Dec 23.
Article in English | MEDLINE | ID: mdl-24359207

ABSTRACT

BACKGROUND: The advent of molecular biology techniques and constant increase in availability of genetic material have triggered the development of many phylogenetic tree inference methods. However, several reticulate evolution processes, such as horizontal gene transfer and hybridization, have been shown to blur the species evolutionary history by causing discordance among phylogenies inferred from different genes. METHODS: To tackle this problem, we hereby describe a new method for inferring and representing alternative (reticulate) evolutionary histories of species as an explicit weighted consensus network which can be constructed from a collection of gene trees with or without prior knowledge of the species phylogeny. RESULTS: We provide a way of building a weighted phylogenetic network for each of the following reticulation mechanisms: diploid hybridization, intragenic recombination and complete or partial horizontal gene transfer. We successfully tested our method on some synthetic and real datasets to infer the above-mentioned evolutionary events which may have influenced the evolution of many species. CONCLUSIONS: Our weighted consensus network inference method allows one to infer, visualize and validate statistically major conflicting signals induced by the mechanisms of reticulate evolution. The results provided by the new method can be used to represent the inferred conflicting signals by means of explicit and easy-to-interpret phylogenetic networks.


Subject(s)
Algorithms, Biological Evolution, Phylogeny, Molecular Evolution, Horizontal Gene Transfer, Genetic Hybridization
16.
Bioinformatics ; 28(13): 1775-82, 2012 Jul 01.
Article in English | MEDLINE | ID: mdl-22563067

ABSTRACT

MOTIVATION: Rapid advances in biomedical sciences and genetics have increased the pressure on drug development companies to promptly translate new knowledge into treatments for disease. Impelled by the demand and facilitated by technological progress, the number of compounds evaluated during the initial high-throughput screening (HTS) step of the drug discovery process has steadily increased. As a highly automated large-scale process, HTS is prone to systematic error caused by various technological and environmental factors. A number of error correction methods have been designed to reduce the effect of systematic error in experimental HTS (Brideau et al., 2003; Carralot et al., 2012; Kevorkov and Makarenkov, 2005; Makarenkov et al., 2007; Malo et al., 2010). Despite their power to correct systematic error when it is present, the applicability of those methods in practice is limited by the fact that they can potentially introduce a bias when applied to unbiased data. We describe two new methods for eliminating systematic error from HTS data based on prior knowledge of the error location. This information can be obtained using a specific version of the t-test or of the χ² goodness-of-fit test as discussed in Dragiev et al. (2011). We will show that both new methods constitute an important improvement over the standard practice of not correcting for systematic error at all as well as over the B-score correction procedure (Brideau et al., 2003) which is widely used in modern HTS. We will also suggest a more general data preprocessing framework where the new methods can be applied in combination with the Well Correction procedure (Makarenkov et al., 2007). Such a framework will allow for removing systematic biases affecting all plates of a given screen as well as those relative to some of its individual plates.


Subject(s)
High-Throughput Screening Assays/methods, Computer Simulation, Drug Discovery
17.
Nucleic Acids Res ; 39(21): e144, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21917854

ABSTRACT

Many bacteria and viruses adapt to varying environmental conditions through the acquisition of mosaic genes. A mosaic gene is composed of alternating sequence polymorphisms either belonging to the host original allele or derived from the integrated donor DNA. Often, the integrated sequence contains a selectable genetic marker (e.g. marker allowing for antibiotic resistance). An effective identification of mosaic genes and detection of corresponding partial horizontal gene transfers (HGTs) are among the most important challenges posed by evolutionary biology. We developed a method for detecting partial HGT events and related intragenic recombination giving rise to the formation of mosaic genes. A bootstrap procedure incorporated in our method is used to assess the support of each predicted partial gene transfer. The proposed method can also be applied to confirm or discard complete (i.e. traditional) horizontal gene transfers detected by any HGT inferring method. While working on a full-genome scale, the new method can be used to assess the level of mosaicism in the considered genomes as well as the rates of complete and partial HGT underlying their evolution.


Subject(s)
Horizontal Gene Transfer , Mosaicism , Escherichia coli Proteins/genetics , Monte Carlo Method , Phylogeny , Genetic Recombination , Ribulose-Bisphosphate Carboxylase/genetics , Sequence Alignment
18.
Mol Phylogenet Evol ; 64(1): 190-7, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22491069

ABSTRACT

Methods designed for inferring phylogenetic trees have been widely applied to reconstruct biogeographic history. Because traditional phylogenetic methods used in biogeographic reconstruction are based on trees rather than networks, they rely on the strict assumption that dispersal among geographical units has occurred along single dispersal routes across regions and are, therefore, incapable of modelling multiple alternative dispersal scenarios. The goal of this study is to describe a new method that allows for retracing species dispersal by means of directed phylogenetic networks obtained using a horizontal gene transfer (HGT) detection method, as well as to draw parallels between the processes of HGT and biogeographic reconstruction. In our case study, we reconstructed the biogeographic history of the postglacial dispersal of freshwater fishes in the Ontario province of Canada. This case study demonstrated the utility and robustness of the new method, indicating that the most important events were south-to-north dispersal patterns, as one would expect, with secondary faunal interchange among regions. Finally, we showed how our method can be used to explore additional questions regarding the commonalities in dispersal history patterns and phylogenetic similarities among species.


Subject(s)
Demography , Fishes/genetics , Horizontal Gene Transfer/genetics , Genetic Models , Phylogeny , Animals , Cluster Analysis , Computational Biology , Electron Transport Complex IV/genetics , Fresh Water , Phylogeography , Quebec , Species Specificity
19.
Sci Rep ; 12(1): 179, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34996997

ABSTRACT

Recent years have seen a steep rise in the number of skin cancer detection applications. While modern advances in deep learning have made it possible to reach new heights in terms of classification accuracy, no publicly available skin cancer detection software provides confidence estimates for these predictions. We present DUNEScan (Deep Uncertainty Estimation for Skin Cancer), a web server that performs an intuitive in-depth analysis of uncertainty in commonly used skin cancer classification models based on convolutional neural networks (CNNs). DUNEScan allows users to upload a skin lesion image, and quickly compares the mean and the variance estimates provided by a number of new and traditional CNN models. Moreover, our web server uses the Grad-CAM and UMAP algorithms to visualize the classification manifold for the user's input, hence providing crucial information about its closeness to skin lesion images from the popular ISIC database. DUNEScan is freely available at https://www.dunescan.org.
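
The mean and variance estimates mentioned in the abstract can be illustrated with a small sketch. It assumes that one model has been run several times on the same image with some source of randomness (for example, dropout kept active at inference time) and that each run returns a vector of class probabilities. The abstract does not say how DUNEScan obtains its estimates, so the function name and the input layout below are hypothetical.

```python
from statistics import fmean, pvariance


def summarize_passes(prob_passes):
    """Summarize repeated stochastic predictions for one image.

    `prob_passes` is a list of T probability vectors, one per pass
    (assumed input layout). Returns the index of the class with the
    highest mean probability, the per-class means, and the per-class
    population variances across passes.
    """
    if not prob_passes:
        raise ValueError("at least one pass is required")
    n_classes = len(prob_passes[0])
    if any(len(p) != n_classes for p in prob_passes):
        raise ValueError("all passes must have the same number of classes")

    per_class = list(zip(*prob_passes))  # one tuple of T values per class
    means = [fmean(col) for col in per_class]
    variances = [pvariance(col) for col in per_class]
    top = max(range(n_classes), key=means.__getitem__)
    return top, means, variances
```

A large variance for the top class suggests that the passes disagree, so the prediction deserves less trust than the mean probability alone would indicate.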


Subject(s)
Deep Learning , Computer-Assisted Diagnosis , Computer-Assisted Image Interpretation , Internet , Photography , Skin Neoplasms/pathology , Decision Support Techniques , Humans , Predictive Value of Tests , Reproducibility of Results , Skin Neoplasms/classification , Uncertainty
20.
Front Robot AI ; 9: 1076897, 2022.
Article in English | MEDLINE | ID: mdl-36817004

ABSTRACT

This paper introduces an optimal algorithm for solving the discrete grid-based coverage path planning (CPP) problem. This problem consists of finding a path that covers a given region completely. First, we propose a CPP-solving baseline algorithm based on the iterative deepening depth-first search (ID-DFS) approach. Then, we introduce two branch-and-bound strategies (Loop detection and an Admissible heuristic function) to improve the results of our baseline algorithm. We evaluate the performance of our planner using six types of benchmark grids considered in this study: Coast-like, Random links, Random walk, Simple-shapes, Labyrinth and Wide-Labyrinth grids. We are the first to consider these types of grids in the context of CPP. All of them have practical applications in real-world CPP problems from a variety of fields. The obtained results suggest that the proposed branch-and-bound algorithm solves the problem optimally (i.e., the exact solution is found in each case) orders of magnitude faster than an exhaustive search CPP planner. To the best of our knowledge, no general CPP-solving exact algorithms, apart from an exhaustive search planner, have been proposed in the literature.
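
The combination of iterative deepening with pruning described in the abstract can be sketched as follows. This is an illustrative reading, not the authors' planner: the grid is assumed to be a set of free (row, column) cells with 4-neighbour moves, the bound used is the simple observation that one move covers at most one new cell, and "loop detection" is interpreted here as refusing to return to a cell when nothing new has been covered since the last visit on the current path. All names are hypothetical.

```python
def coverage_path(free_cells, start):
    """Return a shortest list of cells that starts at `start` and visits
    every free cell, or None if the free cells are not connected."""
    free = frozenset(free_cells)
    if start not in free:
        raise ValueError("start must be a free cell")
    total = len(free)
    covered = {start}
    path = [start]
    # Coverage count recorded at the latest visit of each cell on the path.
    last_count = {start: 1}

    def dfs(budget):
        if len(covered) == total:
            return list(path)
        # Bound: each remaining move can cover at most one new cell.
        if total - len(covered) > budget:
            return None
        r, c = path[-1]
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nxt not in free:
                continue
            is_new = nxt not in covered
            count = len(covered) + is_new
            # Loop pruning: coming back with no new coverage is wasted.
            if last_count.get(nxt) == count:
                continue
            previous = last_count.get(nxt)
            if is_new:
                covered.add(nxt)
            last_count[nxt] = count
            path.append(nxt)
            found = dfs(budget - 1)
            # Undo the move before trying the next neighbour.
            path.pop()
            if previous is None:
                del last_count[nxt]
            else:
                last_count[nxt] = previous
            if is_new:
                covered.remove(nxt)
            if found is not None:
                return found
        return None

    # A connected region can always be covered in at most 2 * (total - 1)
    # moves by walking a spanning tree, so the depth limit stops there.
    for limit in range(total - 1, 2 * (total - 1) + 1):
        found = dfs(limit)
        if found is not None:
            return found
    return None
```

Because the depth limit grows one move at a time, the first path found uses the fewest moves. The pruning rules only discard branches that cannot be part of a shortest covering path, so optimality is preserved under the assumptions above.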
