ABSTRACT
Single-cell genome sequencing provides a highly granular view of biological systems but is affected by high error rates, allelic amplification bias, and uneven genome coverage. This creates a need for data-specific computational methods, for purposes such as cell lineage tree inference. The objective of cell lineage tree reconstruction is to infer the evolutionary process that generated a set of observed cell genomes. Lineage trees may enable a better understanding of tumor formation and growth, as well as of organ development for healthy body cells. We describe a method, Scelestial, for lineage tree reconstruction from single-cell data, which is based on an approximation algorithm for the Steiner tree problem and is a generalization of the neighbor-joining method. We adapt the algorithm to efficiently select a limited subset of potential sequences as internal nodes, in the presence of missing values, and to minimize cost by lineage tree-based missing value imputation. In a comparison against seven state-of-the-art single-cell lineage tree reconstruction algorithms (BitPhylogeny, OncoNEM, SCITE, SiFit, SASC, SCIPhI, and SiCloneFit) on simulated and real single-cell tumor samples, Scelestial performed best at reconstructing trees in terms of accuracy and run time. Scelestial has been implemented in C++. It is also available as an R package named RScelestial.
Subjects
Algorithms , Neoplasms , Biological Evolution , Cell Lineage/genetics , Humans , Genetic Models , Phylogeny
ABSTRACT
BACKGROUND: The genetic heterogeneity that a tumor develops during clonal evolution is one of the reasons for cancer treatment failure, because it increases the chance of drug resistance. Clones are cell populations with different genotypes, resulting from differences in the somatic mutations that occur and accumulate during cancer development. An appropriate approach for identifying clones is to determine the variant allele frequency of the mutations that occurred in the tumor. Although bulk sequencing data can provide that information, the frequencies are not informative enough to identify different clones with the same prevalence or their evolutionary relationships. Single-cell sequencing data, on the other hand, provides valuable information about branching events in the evolution of a tumor. However, the temporal order of mutations can be determined only with ambiguities from single-cell data alone, while variant allele frequencies from bulk sequencing data can help infer the temporal order of mutations with fewer ambiguities. RESULTS: In this study, a new method called Conifer (ClONal tree Inference For hEterogeneity of tumoR) is proposed, which combines aggregated variant allele frequencies from bulk sequencing data with branching event information from single-cell sequencing data to identify clones and their evolutionary relationships more accurately. On various sets of simulated data, Conifer is shown to increase the accuracy of clone identification and clonal tree inference compared to other existing methods. In addition, the evolutionary tree inferred by Conifer on real cancer data sets is highly consistent with the information in both the bulk and the single-cell data. CONCLUSIONS: In this study, we have provided an accurate and robust method for identifying the clones underlying tumor heterogeneity and their evolutionary history by combining single-cell and bulk sequencing data.
Subjects
Neoplasms , Tracheophyta , Clonal Evolution , Genotype , Humans , Mutation , Neoplasms/genetics , Single-Cell Analysis
ABSTRACT
In recent studies, microRNAs, a class of non-protein-coding RNAs, have been identified as biomarkers for the early diagnosis and treatment of cancer, which can decrease cancer mortality. A microRNA may target hundreds or thousands of genes and a gene may regulate several microRNAs, so determining which microRNA is associated with which cancer is a major challenge. Many computational methods have been applied to detect associations between microRNAs and cancer, but more accurate methods are still needed. A growing body of research has shown that the relationship between microRNAs and transcription factors (TFs) plays a significant role in the diagnosis of cancer. Therefore, we developed a new computational framework (CAMIRADA) to identify cancer-related microRNAs based on the relationship between microRNAs and disease genes (DGs) in the protein network, the functional relationships between microRNAs and TFs in the co-expression network, and the relationship between microRNAs and differentially expressed genes (DEGs) in the co-expression network. CAMIRADA was applied to breast cancer data from two databases, HMDD and miR2Disease. In this study, the AUC for the top 65 microRNAs in the ranked list was 0.95, which was more accurate than similar methods used to detect cancer-associated microRNAs.
Subjects
Algorithms , Tumor Biomarkers/metabolism , Breast Neoplasms/genetics , MicroRNAs/genetics , Female , Gene Regulatory Networks , Humans
ABSTRACT
In the past few years, many studies have been conducted on identifying and prioritizing disease-related genes with the goal of achieving significant improvements in treatment and drug discovery. Both experimental and computational approaches have been exploited in recent studies to explore disease-susceptible genes. The experimental methods for identifying these genes are usually time-consuming and expensive. As a result, a substantial number of these studies have turned to computational techniques, commonly known as gene prioritization methods. Conceptually, these methods combine various sources of information about a particular disease of interest and then use it to discover and prioritize candidate disease genes. In this paper, we propose a gene prioritization method (HybridRanker), which exploits network topological features as well as several biomedical data sources to identify candidate disease genes. In this approach, the genes are characterized using both local and global features of a protein-protein interaction (PPI) network. Furthermore, to obtain improved results for a particular disease of interest, HybridRanker incorporates data from diseases with similar symptoms and also from its comorbid diseases. We applied this new approach to identify and prioritize candidate disease genes of colorectal cancer (CRC), and the efficiency of HybridRanker was confirmed by a leave-one-out cross-validation test. Moreover, in comparison with several well-known prioritization methods, HybridRanker shows higher performance in terms of different criteria.
Subjects
Computational Biology , Genetic Databases , Neoplasms/genetics , Protein Interaction Maps , Algorithms , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Statistics as Topic
ABSTRACT
During the early stages of the SARS-CoV-2 pandemic, before vaccines were available, nonpharmaceutical interventions (NPIs) such as reducing contacts or antigenic testing were used to control viral spread. Quantifying their success is therefore key for future pandemic preparedness. Using 1.8 million SARS-CoV-2 genomes from systematic surveillance, we study viral lineage importations into Germany for the third pandemic wave from late 2020 to early 2021, using large-scale Bayesian phylogenetic and phylogeographic analysis with a longitudinal assessment of lineage importation dynamics over multiple sampling strategies. All major nationwide NPIs were followed by fewer importations, with the strongest decreases seen for free rapid tests, the strengthening of regulations on mask-wearing in public transport and stores, as well as on internal movements and gatherings. Most SARS-CoV-2 lineages first appeared in the three most populous states with most cases, and spread from there within the country. Importations rose before and peaked shortly after the Christmas holidays. The substantial effects of free rapid tests and obligatory medical/surgical mask-wearing suggest these as key for pandemic preparedness, given their relatively few negative socioeconomic effects. The approach relates environmental factors at the host population level to viral lineage dissemination, facilitating similar analyses of rapidly evolving pathogens in the future.
Subjects
COVID-19 , Phylogeny , Phylogeography , SARS-CoV-2 , Humans , COVID-19/epidemiology , COVID-19/virology , COVID-19/prevention & control , COVID-19/transmission , SARS-CoV-2/genetics , SARS-CoV-2/classification , Germany/epidemiology , Bayes Theorem , Viral Genome/genetics , Pandemics/prevention & control
ABSTRACT
In this paper, an optical solution for the dominating set problem is provided. The solution is based on long ribbon-shaped optical filters, on which some operations can be applied optically and efficiently. The provided solution requires polynomial time, filters of exponential length, and an exponential number of photons to solve the dominating set problem. The solution is implemented experimentally using lithographic sheets, on a graph with six vertices, to find all dominating sets with two vertices.
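To make the problem instance concrete, the following is a minimal conventional (non-optical) sketch that enumerates all dominating sets of a given size by brute force. The function name `dominating_sets` and the six-vertex cycle graph are assumptions chosen for illustration; the graph used in the experiment is not given in the abstract, and this code does not model the optical filters.

```python
from itertools import combinations


def dominating_sets(adjacency, size):
    """Return all vertex subsets of the given size that dominate the graph.

    `adjacency` maps each vertex to the set of its neighbours.
    """
    vertices = set(adjacency)
    result = []
    for subset in combinations(sorted(vertices), size):
        # A vertex is dominated if it is in the subset or adjacent to a member.
        covered = set(subset)
        for v in subset:
            covered |= adjacency[v]
        if covered == vertices:
            result.append(subset)
    return result


# Hypothetical six-vertex graph (a 6-cycle); NOT the graph used in the paper.
cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(dominating_sets(cycle6, 2))  # → [(0, 3), (1, 4), (2, 5)]
```

The loop inspects every subset of the requested size, so its running time grows exponentially with the set size in general; the optical approach described above moves that exponential cost into filter length and photon count instead.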
ABSTRACT
Cytotoxic T-lymphocyte-associated antigen 4 (CTLA-4) and programmed cell death protein 1 (PD-1), two clinically relevant targets for the immunotherapy of cancer, are negative regulators of T-cell activation and migration. Optimizing the therapeutic response to CTLA-4 and PD-1 blockade calls for a more comprehensive insight into the coordinated function of these immune regulators. Mathematical modeling can be used to elucidate nonlinear tumor-immune interactions and highlight the underlying mechanisms to tackle the problem. Here, we investigated and statistically characterized the dynamics of T-cell migration as a measure of the functional response to these pathways. We used a previously developed three-dimensional organotypic culture of patient-derived tumor spheroids treated with anti-CTLA-4 and anti-PD-1 antibodies for this purpose. Experiment-based dynamical modeling revealed the delayed kinetics of PD-1 activation, which originates from the distinct characteristics of PD-1 and CTLA-4 regulation, and followed through with the modification of their contributions to immune modulation. The simulation results show good agreement with the tumor cell reduction and active immune cell count in each experiment. Our findings demonstrate that while PD-1 activation provokes a more exhaustive intracellular cascade within a mature tumor environment, the time-delayed kinetics of PD-1 activation outweighs its preeminence at the individual cell level and consequently confers a functional dominance to the CTLA-4 checkpoint. The proposed model explains the distinct immunostimulatory pattern of PD-1 and CTLA-4 blockade based on mechanisms involved in the regulation of their expression and may be useful for planning effective treatment schemes targeting PD-1 and CTLA-4 functions.
Subjects
Immune Checkpoint Inhibitors , Neoplasms , Humans , CTLA-4 Antigen/metabolism , T-Lymphocytes/metabolism , Immunotherapy/methods , Abatacept , Neoplasms/pathology
ABSTRACT
Background: Patterns in protein and genomic sequences are widely analyzed, extracted, and collected in databases. Although protein patterns originate from genomic coding regions, very few works have directly or indirectly dealt with coding region patterns induced from protein patterns. Results: In this paper, we have defined a new genomic pattern structure suitable for representing patterns induced from proteins. The provided pattern structure, which is called "Consecutive Positions Scoring Matrix (CPSSM)", is a replacement for protein patterns and profiles in the genomic context. CPSSMs can be identified, discovered, and searched for in genomes. We have then presented a novel dynamic-programming-based algorithm for matching the defined genomic pattern against genomic sequences. In addition, we have modified the provided algorithm to support intronic gaps and huge sequences. We have implemented and tested the provided algorithm on real data. The results on the Saccharomyces cerevisiae genome show 132% more true positives and no false negatives, and the results on the human genome show no false negatives and 10 times as many true positives as those in previous works. Conclusion: CPSSM and the provided methods could be used for open reading frame detection and gene finding. The application is available with source code to run and download at http://app.foroughmand.ir/cpssm/.
Subjects
Algorithms , Amino Acid Motifs/genetics , Computational Biology/methods , Genome , Human Genome , Humans , Open Reading Frames
ABSTRACT
Studying and understanding the structures and functions of the human brain has become one of the most challenging issues in neuroscience today. However, the mammalian nervous system is made up of hundreds of millions of neurons and billions of synapses. This complexity makes it impossible to reconstruct such a huge nervous system in the laboratory. Most researchers therefore focus on the C. elegans neural network, which is the only biological neural network that is fully mapped. This nervous system is the simplest neural network that exists. However, many fundamental behaviors, such as movement, emerge from this basic network. These features make C. elegans a convenient case for studying nervous systems. Many studies have tried to propose a network formation model for the C. elegans neural network. However, these studies could not capture all characteristics of the C. elegans neural network, such as the significant factors that play a role in its formation. Thus, new models are needed to explain all aspects of the C. elegans neural network. In this paper, a new model based on game theory is proposed in order to understand the factors affecting the formation of nervous systems, which meets the characteristics of the C. elegans frontal neural network. In this model, neurons are considered to be agents. The strategy for each neuron includes either making or removing links to other neurons. After choosing the basic network, the utility function is built using structural and functional factors. Linear programming is used to find the coefficients for each of these factors. Finally, the output network is compared with the C. elegans frontal neural network and with previous models. The results indicate that the game-theoretical model proposed in this paper can better predict the influencing factors in the formation of the C. elegans neural network compared to previous models.
ABSTRACT
The control problem in a biological system is the problem of finding an interventional policy for changing the state of the biological system from an undesirable state, e.g. disease, into a desirable healthy state. Boolean networks are utilized as a mathematical model for gene regulatory networks. This paper provides an algorithm to solve the control problem in Boolean networks. The proposed algorithm is implemented and applied to two biological systems: the T-cell receptor network and the Drosophila melanogaster network. Results show that the proposed algorithm solves the control problem on these networks faster than previous exact methods, while having similar accuracy. Source code and a simple web service for the proposed algorithm are available at http://goliaei.ir/net-control/www/.
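The control problem itself can be illustrated with a small exhaustive search. The sketch below is not the algorithm proposed in the paper; it is a plain breadth-first search over a hypothetical three-gene network, and the update rules, the single-gene-flip intervention model, and the names `update` and `find_control_policy` are all assumptions made for this example.

```python
from collections import deque


def update(state):
    """Synchronous update of a hypothetical three-gene Boolean network."""
    a, b, c = state
    return (b, a & c, 1 - a)


def find_control_policy(start, target, max_steps=10):
    """Breadth-first search for a shortest intervention sequence.

    At each time step the controller may flip one gene (index 0..2) or do
    nothing (None); the network then performs one synchronous update.
    Returns the list of interventions, or None if the target is not reached.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, policy = queue.popleft()
        if state == target:
            return policy
        if len(policy) >= max_steps:
            continue
        for action in (None, 0, 1, 2):
            controlled = list(state)
            if action is not None:
                controlled[action] ^= 1  # apply the intervention
            nxt = update(tuple(controlled))
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, policy + [action]))
    return None


disease = (0, 0, 1)   # a fixed point of the uncontrolled network
healthy = (0, 1, 0)   # assumed desirable state
print(find_control_policy(disease, healthy))  # → [0]
```

Here the disease state is a fixed point of the uncontrolled dynamics, and flipping gene 0 once is enough to move the network to the assumed healthy state. Exhaustive search of this kind visits up to 2^n states for n genes, which is why faster methods matter for realistic networks.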
Subjects
Algorithms , Animals , Drosophila melanogaster/genetics , Gene Regulatory Networks , Theoretical Models , T-Cell Antigen Receptors/genetics , T-Cell Antigen Receptors/metabolism
ABSTRACT
Previous studies have shown that the empirical distribution of the bacterial burst size varies even within a population of isogenic bacteria. Since the number of bacteriophage progeny increases linearly with time, it is the variation in lysis time that produces the variation in burst size. Here, the burst size variation is computationally modeled by treating the lysis time decisions as a game. Each player in the game is a bacteriophage that has initially infected and lysed its host bacterium. The payoff of each burst size strategy is the average number of bacteria that are infected solely by that bacteriophage's progeny after lysis. To calculate the payoffs, a new version of the balls-and-bins model with time-dependent occupation probabilities (TDOP) is proposed. We show that a Nash equilibrium occurs for a range of mixed burst size strategies that are chosen and played stochastically by the bacteriophages. Moreover, we conclude that the burst size variations arise from each player choosing a mixed lysis strategy. By choosing the lysis time and also the burst size stochastically, the released bacteriophage progeny infect a portion of the host bacteria in the environment and avoid extinction. The probability distribution of the mixed burst size strategies is also identified.
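As a rough illustration of the payoff described above, the sketch below computes the expected number of bacteria (bins) hit only by one phage's progeny (balls) in the classic balls-and-bins setting. It assumes uniform, time-independent occupation probabilities, which is a simplification and not the TDOP model of the paper; the function names and parameters are hypothetical.

```python
import random


def expected_solely_infected(num_bacteria, own_progeny, rival_progeny):
    """Expected number of bins hit by own balls and by no rival ball.

    Assumes every ball lands in a uniformly random bin, independently.
    """
    p_miss = 1.0 - 1.0 / num_bacteria           # one ball misses a given bin
    p_own_hit = 1.0 - p_miss ** own_progeny     # at least one own ball hits it
    p_rival_miss = p_miss ** rival_progeny      # no rival ball hits it
    return num_bacteria * p_own_hit * p_rival_miss


def simulate_solely_infected(num_bacteria, own_progeny, rival_progeny,
                             trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        own = {rng.randrange(num_bacteria) for _ in range(own_progeny)}
        rival = {rng.randrange(num_bacteria) for _ in range(rival_progeny)}
        total += len(own - rival)
    return total / trials


print(expected_solely_infected(10, 5, 5))
print(simulate_solely_infected(10, 5, 5))
```

In this simplified setting a larger burst raises the chance of reaching a bacterium, while rival progeny lower the chance of reaching it alone; a game-theoretic analysis compares such payoffs across strategies.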
Subjects
Bacteria/virology , Bacteriolysis/physiology , Biological Models , Statistical Models , Bacteria/cytology , Bacterial Physiological Phenomena , Bacteriophages , Game Theory
ABSTRACT
Although it is known that synonymous codons are not chosen randomly, the role of codon usage in gene regulation is not yet clearly understood. Researchers have investigated the relation between codon usage and various properties, such as gene regulation, translation rate, translation efficiency, mRNA stability, splicing, and protein domains. Recently, a universal codon-usage-based mechanism for gene regulation was proposed. We studied the role of protein sequence patterns in the codon usage of related genes. Considering a subsequence of a protein that matches a pattern or motif, we showed that the parts of the genes that are translated into this subsequence use specific ratios of synonymous codons. We also built a multinomial logistic regression model for codon usage that considers the effect of patterns on codon usage. This model explains the observed codon usage preference better than the classic organism-dependent codon usage. Our results showed that codon usage plays a role in controlling protein levels for genes that participate in a specific biological function. This is the first time that this phenomenon has been reported.
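The "ratios of synonymous codons" mentioned above can be computed directly from a coding sequence. The following is a minimal sketch under stated assumptions: it uses the standard genetic code, expects an in-frame DNA sequence containing only A, C, G, and T, and the names `CODON_TABLE` and `synonymous_codon_ratios` are hypothetical. It does not reproduce the multinomial logistic regression model of the paper.

```python
from collections import Counter, defaultdict

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_codon_ratios(cds):
    """Fraction of each codon among all codons encoding the same amino acid.

    `cds` is an in-frame coding DNA sequence; trailing bases that do not
    form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    per_amino = defaultdict(int)
    for codon, n in counts.items():
        per_amino[CODON_TABLE[codon]] += n
    return {codon: n / per_amino[CODON_TABLE[codon]]
            for codon, n in counts.items()}


# Toy fragment: leucine is encoded twice by CTG and once by CTA.
ratios = synonymous_codon_ratios("CTGCTGCTAAAA")
print(ratios)
```

Comparing such ratios between the gene segments that encode a motif and the rest of the genome is one simple way to look for pattern-specific codon preferences.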