RESUMEN
Researchers are becoming more interested in novel barium-nitride-chloride (Ba3NCl3) hybrid perovskite solar cells (HPSCs) due to their remarkable semiconductor properties. An electron transport layer (ETL) built from TiO2 and a hole transport layer (HTL) made of CuI have been studied in Ba3NCl3-based single junction photovoltaic cells in a variety of variations. Through extensive numerical analysis using SCAPS-1D simulation software, we investigated elements such as layer thickness, defect density, doping concentration, interface defect density, carrier concentration, generation, recombination, temperature, series and shunt resistance, open circuit voltage (V OC), short circuit current (J SC), fill factor (FF), and power conversion efficiency (PCE). The study found that the HTL CuI design reached the highest PCE at 30.47% with a V OC of 1.0649 V, a J SC of 38.2609 mA cm-2, and an FF of 74.78%. These findings offer useful data and a practical plan for producing inexpensive, Ba3NCl3-based thin-film solar cells.
RESUMEN
Large tree structures are ubiquitous and real-world relational datasets often have information associated with nodes (e.g., labels or other attributes) and edges (e.g., weights or distances) that need to be communicated to the viewers. Yet, scalable, easy to read tree layouts are difficult to achieve. We consider tree layouts to be readable if they meet some basic requirements: node labels should not overlap, edges should not cross, edge lengths should be preserved, and the output should be compact. There are many algorithms for drawing trees, although very few take node labels or edge lengths into account, and none optimizes all requirements above. With this in mind, we propose a new scalable method for readable tree layouts. The algorithm guarantees that the layout has no edge crossings and no label overlaps, and optimizes one of the remaining aspects: desired edge lengths and compactness. We evaluate the performance of the new algorithm by comparison with related earlier approaches using several real-world datasets, ranging from a few thousand nodes to hundreds of thousands of nodes. Tree layout algorithms can be used to visualize large general graphs, by extracting a hierarchy of progressively larger trees. We illustrate this functionality by presenting several map-like visualizations generated by the new tree layout algorithm.
RESUMEN
Smartphone cameras and digital devices are increasingly used in the capture of tick images by the public as citizen scientists, and rapid advances in deep learning and computer vision has enabled brand new image recognition models to be trained. However, there is currently no web-based or mobile application that supports automated classification of tick images. The purpose of this study was to compare the accuracy of a deep learning model pre-trained with millions of annotated images in Imagenet, against a shallow custom-build convolutional neural network (CNN) model for the classification of common hard ticks present in anthropic areas from northeastern USA. We created a dataset of approximately 2000 images of four tick species (Ixodes scapularis, Dermacentor variabilis, Amblyomma americanum and Haemaphysalis sp.), two sexes (male, female) and two life stages (adult, nymph). We used these tick images to train two separate CNN models - ResNet-50 and a simple shallow custom-built. We evaluated our models' performance on an independent subset of tick images not seen during training. Compared to the ResNet-50 model, the small shallow custom-built model had higher training (99.7%) and validation (99.1%) accuracies. When tested with new tick image data, the shallow custom-built model yielded higher mean prediction accuracy (80%), greater confidence of true detection (88.7%) and lower mean response time (3.64 s). These results demonstrate that, with limited data size for model training, a simple shallow custom-built CNN model has great prospects for use in the classification of common hard ticks present in anthropic areas from northeastern USA.
Asunto(s)
Ixodes , Ixodidae , Amblyomma , Animales , Femenino , Masculino , Redes Neurales de la Computación , NinfaRESUMEN
Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, the limitations of SCS technologies including frequent allele dropout and variable sequence coverage may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions, and convergent evolution. In order to address such limitations, we introduce the optimal subperfect phylogeny problem which asks to integrate SCS data with matching bulk sequencing data by minimizing a linear combination of potential false negatives (due to allele dropout or variance in sequence coverage), false positives (due to read errors) among mutation calls, and the number of mutations that violate ISA (real or because of incorrect copy number estimation). We then describe a combinatorial formulation to solve this problem which ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and-as a first in tumor phylogeny reconstruction-a Boolean constraint satisfaction problem (CSP) and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA violating mutations. In contrast to the alternative methods, typically based on probabilistic approaches, PhISCS provides a guarantee of optimality in reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
Asunto(s)
Biología Computacional/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Neoplasias/genética , Filogenia , Análisis de la Célula Individual/métodos , Humanos , Neoplasias/patologíaRESUMEN
We introduce a new dissimilarity measure between a pair of "clonal trees", each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree dissimilarity (MLTD) measure is defined as the minimum number of mutation/label deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, to convert each of the two trees to the maximum common tree. We show that the MLTD measure can be computed efficiently in polynomial time and it captures the similarity between trees of different clonal granularity well.
RESUMEN
An antigen is a protein capable of triggering an effective immune system response. Protective antigens are the ones that can invoke specific and enhanced adaptive immune response to subsequent exposure to the specific pathogen or related organisms. Such proteins are therefore of immense importance in vaccine preparation and drug design. However, the laboratory experiments to isolate and identify antigens from a microbial pathogen are expensive, time consuming and often unsuccessful. This is why Reverse Vaccinology has become the modern trend of vaccine search, where computational methods are first applied to predict protective antigens or their determinants, known as epitopes. In this paper, we propose a novel, accurate computational model to identify protective antigens efficiently. Our model extracts features directly from the protein sequences, without any dependence on functional domain or structural information. After relevant features are extracted, we have used Random Forest algorithm to rank the features. Then Recursive Feature Elimination (RFE) and minimum redundancy maximum relevance (mRMR) criterion were applied to extract an optimal set of features. The learning model was trained using Random Forest algorithm. Named as Antigenic, our proposed model demonstrates superior performance compared to the state-of-the-art predictors on a benchmark dataset. Antigenic achieves accuracy, sensitivity and specificity values of 78.04%, 78.99% and 77.08% in 10-fold cross-validation testing respectively. In jackknife cross-validation, the corresponding scores are 80.03%, 80.90% and 79.16% respectively. The source code of Antigenic, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/AntigenPredictor. A publicly accessible web interface has also been established at: http://antigenic.research.buet.ac.bd.
Asunto(s)
Antígenos/análisis , Modelos Biológicos , Algoritmos , Aminoácidos/análisis , Antígenos/química , Biología Computacional/métodosRESUMEN
The Golgi Apparatus (GA) is a key organelle for protein synthesis within the eukaryotic cell. The main task of GA is to modify and sort proteins for transport throughout the cell. Proteins permeate through the GA on the ER (Endoplasmic Reticulum) facing side (cis side) and depart on the other side (trans side). Based on this phenomenon, we get two types of GA proteins, namely, cis-Golgi protein and trans-Golgi protein. Any dysfunction of GA proteins can result in congenital glycosylation disorders and some other forms of difficulties that may lead to neurodegenerative and inherited diseases like diabetes, cancer and cystic fibrosis. So, the exact classification of GA proteins may contribute to drug development which will further help in medication. In this paper, we focus on building a new computational model that not only introduces easy ways to extract features from protein sequences but also optimizes classification of trans-Golgi and cis-Golgi proteins. After feature extraction, we have employed Random Forest (RF) model to rank the features based on the importance score obtained from it. After selecting the top ranked features, we have applied Support Vector Machine (SVM) to classify the sub-Golgi proteins. We have trained regression model as well as classification model and found the former to be superior. The model shows improved performance over all previous methods. As the benchmark dataset is significantly imbalanced, we have applied Synthetic Minority Over-sampling Technique (SMOTE) to the dataset to make it balanced and have conducted experiments on both versions. Our method, namely, identification of sub-Golgi Protein Types (isGPT), achieves accuracy values of 95.4%, 95.9% and 95.3% for 10-fold cross-validation test, jackknife test and independent test respectively. According to different performance metrics, isGPT performs better than state-of-the-art techniques. The source code of isGPT, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/isGPT.
Asunto(s)
Biología Computacional/métodos , Aparato de Golgi/química , Oligopéptidos/análisis , Proteínas/análisis , Máquina de Vectores de Soporte , Secuencia de Aminoácidos , Animales , Bases de Datos de Proteínas , Humanos , Oligopéptidos/clasificación , Proteínas/clasificación , Reproducibilidad de los ResultadosRESUMEN
The CRISPR/Cas9-sgRNA system has recently become a popular tool for genome editing and a very hot topic in the field of medical research. In this system, Cas9 protein is directed to a desired location for gene engineering and cleaves target DNA sequence which is complementary to a 20-nucleotide guide sequence found within the sgRNA. A lot of experimental efforts, ranging from in vivo selection to in silico modeling, have been made for efficient designing of sgRNAs in CRISPR/Cas9 system. In this article, we present a novel tool, called CRISPRpred, for efficient in silico prediction of sgRNAs on-target activity which is based on the applications of Support Vector Machine (SVM) model. To conduct experiments, we have used a benchmark dataset of 17 genes and 5310 guide sequences where there are only 20% true values. CRISPRpred achieves Area Under Receiver Operating Characteristics Curve (AUROC-Curve), Area Under Precision Recall Curve (AUPR-Curve) and maximum Matthews Correlation Coefficient (MCC) as 0.85, 0.56 and 0.48, respectively. Our tool shows approximately 5% improvement in AUPR-Curve and after analyzing all evaluation metrics, we find that CRISPRpred is better than the current state-of-the-art. CRISPRpred is enough flexible to extract relevant features and use them in a learning algorithm. The source code of our entire software with relevant dataset can be found in the following link: https://github.com/khaled-buet/CRISPRpred.