ABSTRACT
The prevalence of malignant cells in clinical specimens, or tumour purity, is affected by both intrinsic biological factors and extrinsic sampling bias. Molecular characterization of large clinical cohorts is typically performed on bulk samples, so data analysis and interpretation can be biased by variability in tumour purity. Transcription-based strategies to estimate tumour purity have been proposed, but no breast cancer-specific method is available yet. We interrogated over 6000 expression profiles from 10 breast cancer datasets to develop and validate a 9-gene Breast Cancer Purity Score (BCPS). BCPS outperformed existing methods for estimating tumour content. Adjusting transcriptomic profiles using the BCPS reduces sampling bias and aids data interpretation. BCPS-estimated tumour purity improved prognostication in luminal breast cancer, correlated with pathologic complete response in on-treatment biopsies from triple-negative breast cancer patients undergoing neoadjuvant treatment, and effectively stratified the risk of relapse in HER2+ residual disease post-neoadjuvant treatment.
ABSTRACT
The diagnostic assessment of thyroid nodules is hampered by the persistence of uncertainty in borderline cases and further complicated by the inclusion of noninvasive follicular tumor with papillary-like nuclear features (NIFTP) as a less aggressive alternative to papillary thyroid carcinoma (PTC). In this setting, computational methods might facilitate the diagnostic process by unmasking key nuclear characteristics of NIFTP. The main aims of this work were to (1) identify morphometric features of NIFTP and PTC that are interpretable for the human eye and (2) develop a deep learning model for multiclass segmentation as a support tool to reduce diagnostic variability. Our findings confirmed that nuclei in NIFTP and PTC share multiple characteristics, setting them apart from hyperplastic nodules (HP). The morphometric analysis identified 15 features that can be translated into nuclear alterations readily understandable by pathologists, such as a remarkable internuclear homogeneity for HP in contrast to a major complexity in the chromatin texture of NIFTP and to the peculiar pattern of nuclear texture variability of PTC. A few NIFTP cases with available next-generation sequencing data were also analyzed to initially explore the impact of RAS-related mutations on nuclear morphometry. Finally, a pixel-based deep learning model was trained and tested on whole-slide images of NIFTP, PTC, and HP cases. The model, named NUTSHELL (NUclei from Thyroid tumors Segmentation to Highlight Encapsulated Low-malignant Lesions), successfully detected and classified the majority of nuclei in all whole-slide image tiles, showing comparable results with already well-established pathology nuclear scores. NUTSHELL provides an immediate overview of NIFTP areas and can be used to detect microfoci of PTC within extensive glandular samples or identify lymph node metastases. 
NUTSHELL can be run inside WSInfer with an easy rendering in QuPath, thus facilitating the democratization of digital pathology.
ABSTRACT
Tumor recognition by T cells is essential for antitumor immunity. A comprehensive characterization of T cell diversity may be key to understanding the success of immunomodulatory drugs and failure of PD-1 blockade in tumors such as multiple myeloma (MM). Here, we use single-cell RNA and T cell receptor sequencing to characterize bone marrow T cells from healthy adults (n = 4) and patients with precursor (n = 8) and full-blown MM (n = 10). Large T cell clones from patients with MM expressed multiple immune checkpoints, suggesting a potentially dysfunctional phenotype. Dual targeting of PD-1 + LAG3 or PD-1 + TIGIT partially restored their function in mice with MM. We identify phenotypic hallmarks of large intratumoral T cell clones, and demonstrate that the CD27- and CD27+ T cell ratio, measured by flow cytometry, may serve as a surrogate of clonal T cell expansions and an independent prognostic factor in 543 patients with MM treated with lenalidomide-based treatment combinations.
Subject(s)
Multiple Myeloma , Adult , Humans , Animals , Mice , Multiple Myeloma/drug therapy , Multiple Myeloma/genetics , T-Lymphocytes , Programmed Cell Death 1 Receptor/genetics , Lenalidomide , Clone Cells
ABSTRACT
Introduction: Oxford Nanopore Technologies (ONT) is a third-generation sequencing approach that allows the analysis of individual, full-length nucleic acids. ONT records the alterations of an ionic current flowing across a nano-scaled pore while a DNA or RNA strand is threading through the pore. Basecalling methods are then leveraged to translate the recorded signal back to the nucleic acid sequence. However, basecalling generally introduces errors that hinder barcode demultiplexing, a pivotal task in single-cell RNA sequencing that separates the sequenced transcripts on the basis of their cell of origin. Methods: To solve this issue, we present a novel framework, called UNPLEX, designed to tackle the barcode demultiplexing problem by operating directly on the recorded signals. UNPLEX combines two unsupervised machine learning methods: autoencoders and self-organizing maps (SOM). The autoencoders extract compact, latent representations of the recorded signals that are then clustered by the SOM. Results and Discussion: Our results, obtained on two datasets composed of in silico generated ONT-like signals, show that UNPLEX represents a promising starting point for the development of effective tools to cluster the signals corresponding to the same cell.
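The clustering stage described above can be illustrated with a toy self-organizing map. This is a minimal sketch in plain Python, not the UNPLEX implementation: it assumes the autoencoder has already produced low-dimensional latent vectors (the six 2-D points are made up), and the function names, the two-unit 1-D grid, and the decay schedule are all illustrative choices.

```python
import math

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda i: sum((wi - xi) ** 2 for wi, xi in zip(weights[i], x)))

def train_som(data, n_units, epochs=30, lr0=0.5, sigma0=1.0):
    """Train a 1-D SOM on a list of equal-length vectors (hypothetical helper)."""
    # Deterministic initialisation: copy evenly spaced samples from the data.
    weights = [list(data[(i * len(data)) // n_units]) for i in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.1)   # neighbourhood radius shrinks
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid of units.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Made-up latent codes standing in for autoencoder output: two "barcodes".
latent = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
          [1.0, 0.9], [0.9, 1.0], [0.95, 0.95]]
som = train_som(latent, n_units=2)
labels = [best_matching_unit(som, x) for x in latent]
```

Each signal is assigned to the unit that best matches its latent code, so signals from the same barcode should share a label without any basecalling step.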
ABSTRACT
BACKGROUND AND OBJECTIVES: Myocardial infarction scar (MIS) assessment by cardiac magnetic resonance provides prognostic information and guides patients' clinical management. However, MIS segmentation is time-consuming and not performed routinely. This study presents a deep-learning-based computational workflow for the segmentation of left ventricular (LV) MIS, for the first time performed on state-of-the-art dark-blood late gadolinium enhancement (DB-LGE) images, and the computation of MIS transmurality and extent. METHODS: DB-LGE short-axis images of consecutive patients with myocardial infarction were acquired at 1.5T in two centres between Jan 1, 2019, and June 1, 2021. Two convolutional neural network (CNN) models based on the U-Net architecture were trained to sequentially segment the LV and MIS, by processing an incoming series of DB-LGE images. A 5-fold cross-validation was performed to assess the performance of the models. Model outputs were compared respectively with manual (LV endo- and epicardial border) and semi-automated (MIS, 4-Standard Deviation technique) ground truth to assess the accuracy of the segmentation. An automated post-processing and reporting tool was developed, computing MIS extent (expressed as relative infarcted mass) and transmurality. RESULTS: The dataset included 1355 DB-LGE short-axis images from 144 patients (MIS in 942 images). High performance (> 0.85) as measured by the Intersection over Union metric was obtained for both the LV and MIS segmentations on the training sets. The performance for both LV and MIS segmentations was 0.83 on the test sets. Compared to the 4-Standard Deviation segmentation technique, our system was five times quicker (<1 min versus 7 ± 3 min), and required minimal user interaction. CONCLUSIONS: Our solution successfully addresses different issues related to automatic MIS segmentation, including accuracy, time-effectiveness, and the automatic generation of a clinical report.
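The Intersection over Union metric reported above is the ratio between the overlap of two masks and their combined area. The following is a small sketch of that calculation, assuming binary masks stored as nested lists; the function name and the toy masks are illustrative and are not taken from the study.

```python
def intersection_over_union(pred, truth):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; report 1.0 by convention.
    return intersection / union if union else 1.0

predicted = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
reference = [[0, 1, 1, 1],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
iou = intersection_over_union(predicted, reference)  # 4 shared pixels / 5 in the union
```

A value of 0.83, as reported for the test sets, means the overlap covers 83% of the area marked by either the model or the ground truth.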
Subject(s)
Deep Learning , Myocardial Infarction , Humans , Contrast Media , Cicatrix/diagnostic imaging , Cicatrix/pathology , Gadolinium , Magnetic Resonance Imaging/methods , Myocardial Infarction/diagnostic imaging , Magnetic Resonance Spectroscopy
ABSTRACT
Calcium homeostasis and signaling processes in Saccharomyces cerevisiae, as in any eukaryotic organism, depend on various transporters and channels located on both the plasma and intracellular membranes. The activity of these proteins is regulated by a number of feedback mechanisms that act through the calmodulin-calcineurin pathway. When exposed to hypotonic shock (HTS), yeast cells respond with an increased cytosolic calcium transient, which seems to be conditioned by the opening of stretch-activated channels. To better understand the role of each channel and transporter involved in the generation and recovery of the calcium transient, and of their feedback regulation, we defined and analyzed a mathematical model of the calcium signaling response to HTS in yeast cells. The model was validated by comparing the simulation outcomes with calcium concentration variations before and during the HTS response, which were observed experimentally in both wild-type and mutant strains. Our results show that calcium normally enters the cell through the High Affinity Calcium influx System and mechanosensitive channels. The increase of the plasma membrane tension, caused by HTS, boosts the opening probability of mechanosensitive channels. This event causes a sudden calcium pulse that is rapidly dissipated by the activity of the vacuolar transporter Pmc1. According to model simulations, the role of another vacuolar transporter, Vcx1, is instead marginal, unless calcineurin is inhibited or removed. Our results also suggest that the mechanosensitive channels are subject to a calcium-dependent feedback inhibition, possibly involving calmodulin.
Notably, the model predictions are in accordance with literature results concerning some aspects of calcium homeostasis and signaling that were not specifically addressed within the model itself, suggesting that it depicts all the main cellular components and interactions that constitute the HTS calcium pathway, and thus can correctly reproduce the shaping of the calcium signature by complex calmodulin- and calcineurin-dependent regulation. The model predictions also allowed us to interpret different regulatory schemes involved in calcium handling in both wild-type and mutant yeast strains. The model could be easily extended to represent different calcium signals in other eukaryotic cells.
ABSTRACT
Mathematical models of biochemical networks can greatly facilitate the comprehension of the mechanisms at the basis of cellular processes, as well as the formulation of hypotheses that can be tested by means of targeted laboratory experiments. However, two issues might hamper the achievement of fruitful outcomes. On the one hand, detailed mechanistic models can involve hundreds or thousands of molecular species and their intermediate complexes, as well as hundreds or thousands of chemical reactions, a situation generally occurring in rule-based modeling. On the other hand, the computational analysis of a model typically requires the execution of a large number of simulations for its calibration, or to test the effect of perturbations. As a consequence, the computational capabilities of modern Central Processing Units can be easily overtaken, possibly making the modeling of biochemical networks a worthless or ineffective effort. To overcome the limitations of the current state-of-the-art simulation approaches, we present in this paper FiCoS, a novel "black-box" deterministic simulator that effectively realizes both a fine-grained and a coarse-grained parallelization on Graphics Processing Units. In particular, FiCoS exploits two different integration methods, namely Dormand-Prince and Radau IIA, to efficiently solve both non-stiff and stiff systems of coupled Ordinary Differential Equations. We tested the performance of FiCoS against different deterministic simulators, by considering models of increasing size and by running analyses with increasing computational demands. FiCoS was able to speed up the computations by up to 855×, proving to be a promising solution for the simulation and analysis of large-scale models of complex biological processes.
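The reason a simulator needs both an explicit method (Dormand-Prince) and an implicit one (Radau IIA) can be shown on a single stiff equation. The sketch below deliberately swaps in forward and backward Euler, the simplest explicit/implicit pair, rather than the two methods named above; the test equation y' = -k·y, the rate constant, and the step size are assumptions chosen only to make the contrast visible.

```python
def explicit_euler(k, y0, h, steps):
    """Forward Euler for y' = -k*y: y_{n+1} = y_n + h * (-k * y_n)."""
    y = y0
    for _ in range(steps):
        y = y + h * (-k * y)
    return y

def implicit_euler(k, y0, h, steps):
    """Backward Euler for y' = -k*y: y_{n+1} = y_n + h * (-k * y_{n+1})."""
    y = y0
    for _ in range(steps):
        # The implicit update is linear here, so it can be solved in closed form.
        y = y / (1.0 + h * k)
    return y

# h * k = 10 is far outside the stability region of the explicit scheme.
k, h, steps = 1000.0, 0.01, 10
y_explicit = explicit_euler(k, 1.0, h, steps)
y_implicit = implicit_euler(k, 1.0, h, steps)
```

The true solution decays to almost zero, which the implicit update reproduces qualitatively, while the explicit update grows without bound unless the step is made much smaller. Higher-order methods behave analogously, which is why stiff models are usually routed to an implicit solver.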
Subject(s)
Biochemical Phenomena , Software , Systems Biology , Algorithms , Autophagy , Computational Biology , Computer Graphics , Computer Simulation , Humans , Mathematical Concepts , Metabolic Networks and Pathways , Models, Biological , Protein Biosynthesis , Synthetic Biology
ABSTRACT
BACKGROUND: Single-cell RNA sequencing (scRNA-Seq) experiments are gaining ground as a way to study the molecular processes that drive normal development as well as the onset of different pathologies. Finding an effective and efficient low-dimensional representation of the data is one of the most important steps in the downstream analysis of scRNA-Seq data, as it could provide a better identification of known or putatively novel cell-types. Another step that still poses a challenge is the integration of different scRNA-Seq datasets. Though standard computational pipelines to gain knowledge from scRNA-Seq data exist, a further improvement could be achieved by means of machine learning approaches. RESULTS: Autoencoders (AEs) have been effectively used to capture the non-linearities among gene interactions of scRNA-Seq data, so that the deployment of AE-based tools might represent the way forward in this context. We introduce here scAEspy, a unifying tool that embodies: (1) four of the most advanced AEs, (2) two novel AEs that we developed for this purpose, (3) different loss functions. We show that scAEspy can be coupled with various batch-effect removal tools to integrate data from different scRNA-Seq platforms, in order to better identify the cell-types. We benchmarked scAEspy against the most used batch-effect removal tools, showing that our AE-based strategies outperform the existing solutions. CONCLUSIONS: scAEspy is a user-friendly tool that enables using the most recent and promising AEs to analyse scRNA-Seq data by only setting up two user-defined parameters. Thanks to its modularity, scAEspy can be easily extended to accommodate new AEs to further improve the downstream analysis of scRNA-Seq data. Considering the relevant results we achieved, scAEspy can be considered as a starting point to build a more comprehensive toolkit designed to integrate multi single-cell omics.
Subject(s)
RNA , Single-Cell Analysis , Machine Learning , RNA/genetics , Sequence Analysis, RNA , Exome Sequencing
ABSTRACT
Combination therapies have proved to be a valuable strategy in the fight against cancer, thanks to their increased efficacy in inducing tumor cell death and in reducing tumor growth, metastatic potential, and the risk of developing drug resistance. The identification of effective combinations of drug targets generally relies on costly and time-consuming processes based on in vitro experiments. Here, we present a novel computational approach that, by integrating dynamic fuzzy modeling with multi-objective optimization, allows the efficient identification of novel combination cancer therapies, with a relevant saving in working time and costs. We tested this approach on a model of oncogenic K-ras cancer cells characterized by a marked Warburg effect. The computational approach was validated by its ability to find therapies already known in the literature for this type of cancer cell. More importantly, our results show that this method can suggest potential therapies consisting of a small number of molecular targets. In the model of oncogenic K-ras cancer cells, for instance, we identified combinations of up to three targets, which affect different cellular pathways that are crucial for cancer proliferation and survival.
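Multi-objective optimization, as used above, returns a set of trade-off solutions rather than a single winner. The sketch below shows only the non-dominated (Pareto) filtering step on made-up data; the two objectives (number of targets and a residual proliferation score, both minimised), the target names, and the scores are hypothetical, since the abstract does not state which objectives were optimised.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the non-dominated entries of a {name: objectives} mapping."""
    return {name: obj for name, obj in candidates.items()
            if not any(dominates(other, obj) for other in candidates.values())}

# Hypothetical candidates: (number of targets, residual proliferation score).
candidates = {
    "A": (1, 0.80),
    "A+B": (2, 0.35),
    "A+C": (2, 0.50),      # dominated by A+B
    "A+B+C": (3, 0.30),
    "B+C+D": (3, 0.45),    # dominated by A+B
}
front = pareto_front(candidates)
```

The surviving entries are the combinations for which no alternative is both smaller and more effective, which matches the stated preference for therapies with few molecular targets.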
ABSTRACT
Self-assembling processes are ubiquitous phenomena that drive the organization and the hierarchical formation of complex molecular systems. The investigation of assembling dynamics, emerging from the interactions among biomolecules like amino acids and polypeptides, is fundamental to determine how a mixture of simple objects can yield a complex structure at the nano-scale level. In this paper we present HyperBeta, novel open-source software that exploits an innovative algorithm based on hyper-graphs to efficiently identify and graphically represent the dynamics of β-sheet formation. Unlike existing tools, HyperBeta directly manipulates data generated by means of coarse-grained molecular dynamics simulation tools (GROMACS), performed using the MARTINI force field. Coarse-grained molecular structures are visualized using HyperBeta's proprietary real-time high-quality 3D engine, which provides a plethora of analysis tools and statistical information, controlled by means of an intuitive event-based graphical user interface. The high-quality renderer relies on a variety of visual cues to improve the readability and interpretability of distance and depth relationships between peptides. We show that HyperBeta is able to track β-sheet formation in coarse-grained molecular dynamics simulations, and provides a completely new and efficient means for investigating the kinetics of these nano-structures. HyperBeta will therefore facilitate biotechnological and medical research where these structural elements play a crucial role, such as the development of novel high-performance biomaterials in tissue engineering, or a better comprehension of the molecular mechanisms at the basis of complex pathologies like Alzheimer's disease.
Subject(s)
Peptides/chemistry , Proteins/chemistry , Software , Molecular Structure
ABSTRACT
Ras oncoproteins play a crucial role in the onset, maintenance, and progression of the most common and deadly human cancers. Despite extensive research efforts, only a few mutant-specific Ras inhibitors have been reported. We show that cmp4, previously identified as a water-soluble Ras inhibitor, targets multiple steps in the activation and downstream signaling of different Ras mutants and isoforms. Binding of this pan-Ras inhibitor to an extended Switch II pocket on HRas and KRas proteins induces a conformational change that down-regulates intrinsic and GEF-mediated nucleotide dissociation and exchange and effector binding. A mathematical model of the Ras activation cycle predicts that the inhibitor severely reduces the proliferation of different Ras-driven cancer cells, effectively cooperating with Cetuximab to reduce proliferation even of Cetuximab-resistant cancer cell lines. Experimental data confirm the model prediction, indicating that the pan-Ras inhibitor is an appropriate candidate for medicinal chemistry efforts aimed at improving its currently unsatisfactory affinity.
ABSTRACT
Acute myeloid leukemia (AML) is a highly frequent hematological malignancy, characterized by clinical and biological diversity, along with high relapse and mortality rates. The inherent functional and genetic intra-tumor heterogeneity in AML is thought to play an important role in disease recurrence and resistance to chemotherapy. Patient-derived xenograft (PDX) models preserve important features of the original tumor, allowing, at the same time, experimental manipulation and in vivo amplification of the human cells. Here we present a detailed protocol for the generation of fluorescently labeled AML PDX models to monitor cell proliferation kinetics in vivo, at the single-cell level. Although experimental protocols for cell proliferation studies are well established and widespread, they are not easily applicable to in vivo contexts, and the analysis of related time-series data is often complex to achieve. To overcome these limitations, model-driven approaches can be exploited to investigate different aspects of cell population dynamics. Among the existing approaches, the ProCell framework is able to perform detailed and accurate stochastic simulations of cell proliferation, relying on flow cytometry data. In particular, by providing an initial and a target fluorescence histogram, ProCell automatically assesses the validity of any user-defined scenario of intra-tumor heterogeneity, that is, it is able to infer the proportion of various cell subpopulations (including quiescent cells) and the division interval of proliferating cells. Here we explain the protocol in detail, providing a description of our methodology for the conditional expression of H2B-GFP in human AML xenografts, data processing by flow cytometry, and the final elaboration in ProCell.
Subject(s)
Computer Simulation , Leukemia, Myeloid, Acute , Neoplasm Transplantation , Neoplasms, Experimental , Animals , Female , Heterografts , Humans , Leukemia, Myeloid, Acute/metabolism , Leukemia, Myeloid, Acute/pathology , Male , Mice , Mice, Inbred NOD , Neoplasms, Experimental/metabolism , Neoplasms, Experimental/pathology
ABSTRACT
Surfing in rough waters is not always as fun as wave riding the "big one". Similarly, in optimization problems, fitness landscapes with a huge number of local optima make the search for the global optimum a hard and generally annoying game. Computational Intelligence optimization metaheuristics use a set of individuals that "surf" across the fitness landscape, sharing and exploiting pieces of information about local fitness values in a joint effort to find the global optimum. In this context, we designed surF, a novel surrogate modeling technique that leverages the discrete Fourier transform to generate a smoother, and possibly easier to explore, fitness landscape. The rationale behind this idea is that filtering out the high frequencies of the fitness function and keeping only its partial information (i.e., the low frequencies) can actually be beneficial in the optimization process. We prove our theory by combining surF with a settings-free variant of Particle Swarm Optimization (PSO) based on Fuzzy Logic, called Fuzzy Self-Tuning PSO. Specifically, we introduce a new algorithm, named F3ST-PSO, which performs a preliminary exploration on the surrogate model followed by a second optimization using the actual fitness function. We show that F3ST-PSO can lead to improved performance, notably using the same budget of fitness evaluations.
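The idea of filtering out high frequencies to obtain a smoother landscape can be sketched in one dimension. This is not the surF implementation: it assumes a periodic toy fitness function sampled on a regular grid, uses a naive O(n²) discrete Fourier transform written out by hand, and the function names and the cut-off `keep` are illustrative.

```python
import cmath
import math

def dft(values):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(values)
    return [sum(values[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def inverse_dft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]

def low_pass_surrogate(samples, keep):
    """Zero every frequency above `keep` and transform back to a smoothed landscape."""
    n = len(samples)
    spectrum = dft(samples)
    # Bins f and n - f carry the same physical frequency, so filter on min(f, n - f).
    filtered = [c if min(f, n - f) <= keep else 0.0 for f, c in enumerate(spectrum)]
    return [v.real for v in inverse_dft(filtered)]

def count_local_minima(samples):
    """Count strict local minima, treating the grid as periodic."""
    n = len(samples)
    return sum(1 for k in range(n)
               if samples[k] < samples[k - 1] and samples[k] < samples[(k + 1) % n])

n = 64
xs = [2 * math.pi * k / n for k in range(n)]
# Toy fitness: one broad basin (-cos x) plus high-frequency ripples.
rugged = [-math.cos(x) + 0.3 * math.cos(10 * x) for x in xs]
smooth = low_pass_surrogate(rugged, keep=3)
```

On this toy function the surrogate keeps the broad basin and drops the ripples, so a swarm exploring `smooth` first faces a single minimum before refining on the true fitness, in the spirit of the two-phase scheme described above.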
ABSTRACT
The investigation of cell proliferation can provide useful insights for the comprehension of cancer progression, resistance to chemotherapy and relapse. To this aim, computational methods and experimental measurements based on in vivo label-retaining assays can be coupled to explore the dynamic behavior of tumoral cells. ProCell is a software tool that exploits flow cytometry data to model and simulate the kinetics of fluorescence loss that is due to stochastic events of cell division. Since the rate of cell division is not known, ProCell embeds a calibration process that might require thousands of stochastic simulations to properly infer the parameterization of cell proliferation models. To mitigate the high computational costs, in this paper we introduce a parallel implementation of ProCell's simulation algorithm, named cuProCell, which leverages Graphics Processing Units (GPUs). Dynamic Parallelism was used to efficiently manage the cell duplication events, in a radically different way with respect to common computing architectures. We present the advantages of cuProCell for the analysis of different models of cell proliferation in Acute Myeloid Leukemia (AML), using data collected from the spleen of human xenografts in mice. We show that, by exploiting GPUs, our method not only automatically infers the models' parameterization but is also 237× faster than the sequential implementation. This study highlights the presence of a relevant percentage of quiescent and potentially chemoresistant cells in AML in vivo, and suggests that maintaining a dynamic equilibrium among the different proliferating cell populations might play an important role in disease progression.
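The kind of simulation described above, fluorescence loss driven by stochastic cell divisions, can be sketched in a few lines. This is a simplified illustration and not ProCell or cuProCell code: it assumes each division splits the label evenly between two daughters, that division intervals are normally distributed, and that quiescent cells never divide; all names and parameter values are hypothetical.

```python
import random

def simulate_fluorescence(initial, t_max, mean_interval, sd_interval,
                          quiescent_fraction, seed=0):
    """Return the fluorescence of every cell alive at t_max (toy model)."""
    rng = random.Random(seed)
    # Each pending entry is (fluorescence, current time, is_quiescent).
    pending = [(f, 0.0, rng.random() < quiescent_fraction) for f in initial]
    final = []
    while pending:
        fluorescence, t, quiescent = pending.pop()
        if quiescent:
            final.append(fluorescence)      # label is retained in full
            continue
        # Draw the waiting time to the next division; clamp to stay positive.
        wait = max(rng.gauss(mean_interval, sd_interval), 1e-6)
        if t + wait > t_max:
            final.append(fluorescence)      # no further division before t_max
        else:
            child = (fluorescence / 2.0, t + wait, False)
            pending.extend([child, child])  # two daughters, half the label each
    return final

start = [1024.0] * 10
end_quiescent = simulate_fluorescence(start, 48.0, 12.0, 1.0, quiescent_fraction=1.0)
end_dividing = simulate_fluorescence(start, 48.0, 12.0, 1.0, quiescent_fraction=0.0)
```

Comparing a histogram of the simulated values with a measured target histogram, over many parameter settings, is the expensive calibration loop that motivates a GPU implementation.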
Subject(s)
Algorithms , Computer Graphics , Animals , Cell Proliferation , Computer Simulation , Flow Cytometry , Mice , Software
ABSTRACT
MOTIVATION: The elucidation of dysfunctional cellular processes that can induce the onset of a disease is a challenging issue from both the experimental and computational perspectives. Here we introduce a novel computational method based on the coupling between fuzzy logic modeling and a global optimization algorithm, whose aims are to (1) predict the emergent dynamical behaviors of highly heterogeneous systems in unperturbed and perturbed conditions, regardless of the availability of quantitative parameters, and (2) determine a minimal set of system components whose perturbation can lead to a desired system response, therefore facilitating the design of a more appropriate experimental strategy. RESULTS: We applied this method to investigate what drives K-ras-induced cancer cells, displaying the typical Warburg effect, to death or survival upon progressive glucose depletion. The optimization analysis allowed us to identify new combinations of stimuli that maximize pro-apoptotic processes. Namely, our results provide several lines of evidence for an important protective role of protein kinase A in cancer cells under several cellular stress conditions mimicking tumor behavior. The predictive power of this method could facilitate the assessment of the response of other complex heterogeneous systems to drugs or mutations in fields such as medicine and pharmacology, therefore paving the way for the development of novel therapeutic treatments. AVAILABILITY AND IMPLEMENTATION: The source code of FUMOSO is available under the GPL 2.0 license on GitHub at the following URL: https://github.com/aresio/FUMOSO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Neoplasms , Software , Algorithms , Humans , Mutation
ABSTRACT
Advances in microscopy imaging technologies have enabled the visualization of live-cell dynamic processes using time-lapse microscopy imaging. However, modern methods exhibit several limitations related to the training phases and to time constraints, hindering their application in laboratory practice. In this work, we present a novel method, named Automated Cell Detection and Counting (ACDC), designed for activity detection of fluorescently labeled cell nuclei in time-lapse microscopy. ACDC overcomes the limitations of the literature methods, by first applying bilateral filtering on the original image to smooth the input cell images while preserving edge sharpness, and then by exploiting the watershed transform and morphological filtering. Moreover, ACDC represents a feasible solution for laboratory practice, as it can leverage multi-core architectures in computer clusters to efficiently handle large-scale imaging datasets. Indeed, our Parent-Workers implementation of ACDC achieves up to a 3.7× speed-up compared to the sequential counterpart. ACDC was tested on two distinct cell imaging datasets to assess its accuracy and effectiveness on images with different characteristics. We achieved an accurate cell-count and nuclei segmentation without relying on large-scale annotated datasets, a result confirmed by the average Dice Similarity Coefficients of 76.84 and 88.64 and the Pearson coefficients of 0.99 and 0.96, calculated against the manual cell counting, on the two tested datasets.
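The two evaluation measures quoted above can be computed as follows. This is a generic sketch of the standard definitions, not code from ACDC; the pixel sets and the per-frame counts are invented for illustration, and the Dice value is returned as a fraction (the figures in the abstract appear to be percentages).

```python
import math

def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of foreground pixel coordinates."""
    if not pred and not truth:
        return 1.0  # two empty segmentations agree by convention
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented example: a segmented nucleus versus a manual annotation.
segmented = {(0, 0), (0, 1), (1, 0), (1, 1)}
manual = {(0, 1), (1, 1), (0, 2), (1, 2)}
dice = dice_coefficient(segmented, manual)   # 2 * 2 / (4 + 4)

# Invented per-frame cell counts: automatic versus manual.
automatic_counts = [10, 20, 30, 40]
manual_counts = [11, 21, 31, 41]
r = pearson(automatic_counts, manual_counts)
```

Dice measures how well the segmented shapes overlap, whereas the Pearson coefficient only checks that automatic and manual counts rise and fall together, so a constant offset in the counts still yields a correlation of 1.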
ABSTRACT
BACKGROUND AND OBJECTIVES: Image segmentation represents one of the most challenging issues in medical image analysis to distinguish among different adjacent tissues in a body part. In this context, appropriate image pre-processing tools can improve the result accuracy achieved by computer-assisted segmentation methods. Taking into consideration images with a bimodal intensity distribution, image binarization can be used to classify the input pictorial data into two classes, given a threshold intensity value. Unfortunately, adaptive thresholding techniques for two-class segmentation work properly only for images characterized by bimodal histograms. We aim at overcoming these limitations and automatically determining a suitable optimal threshold for bimodal Magnetic Resonance (MR) images, by designing an intelligent image analysis framework tailored to effectively assist the physicians during their decision-making tasks. METHODS: In this work, we present a novel evolutionary framework for image enhancement, automatic global thresholding, and segmentation, which is here applied to different clinical scenarios involving bimodal MR image analysis: (i) uterine fibroid segmentation in MR guided Focused Ultrasound Surgery, and (ii) brain metastatic cancer segmentation in neuro-radiosurgery therapy. Our framework exploits MedGA as a pre-processing stage. MedGA is an image enhancement method based on Genetic Algorithms that improves the threshold selection, obtained by the efficient Iterative Optimal Threshold Selection algorithm, between the underlying sub-distributions in a nearly bimodal histogram. 
RESULTS: The results achieved by the proposed evolutionary framework were quantitatively evaluated, showing that the use of MedGA as a pre-processing stage outperforms the conventional image enhancement methods (i.e., histogram equalization, bi-histogram equalization, Gamma transformation, and sigmoid transformation), in terms of both MR image enhancement and segmentation evaluation metrics. CONCLUSIONS: Thanks to this framework, MR image segmentation accuracy is considerably increased, allowing for measurement repeatability in clinical workflows. The proposed computational solution could be well-suited for other clinical contexts requiring MR image analysis and segmentation, aiming at providing useful insights for differential diagnosis and prognosis.
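The global thresholding step mentioned in the methods can be illustrated with the classic iterative scheme in which the threshold is repeatedly set to the midpoint of the two class means. The sketch assumes that the Iterative Optimal Threshold Selection algorithm follows this well-known formulation; the function name, tolerance, and pixel values are illustrative, and the MedGA enhancement stage is not reproduced.

```python
def iterative_threshold(intensities, tolerance=0.5, max_iter=100):
    """Iterate T = (mean of values <= T + mean of values > T) / 2 until T settles."""
    threshold = sum(intensities) / len(intensities)   # start from the global mean
    for _ in range(max_iter):
        low = [v for v in intensities if v <= threshold]
        high = [v for v in intensities if v > threshold]
        if not low or not high:
            break                                     # degenerate split, keep T
        new_threshold = 0.5 * (sum(low) / len(low) + sum(high) / len(high))
        converged = abs(new_threshold - threshold) < tolerance
        threshold = new_threshold
        if converged:
            break
    return threshold

# Invented grey levels with two well-separated modes.
pixels = [10, 12, 11, 9, 200, 198, 202, 201]
t = iterative_threshold(pixels)
binary = [1 if v > t else 0 for v in pixels]
```

When the two modes overlap or are unbalanced, the midpoint of the class means is a poorer separator, which is the situation a histogram-enhancing pre-processing stage is meant to improve.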
Subject(s)
Brain Neoplasms/diagnostic imaging , Image Processing, Computer-Assisted/methods , Leiomyoma/diagnostic imaging , Magnetic Resonance Imaging , Algorithms , Computer Simulation , Decision Making , Female , Humans , Neurosurgery , Radiosurgery , Software
ABSTRACT
BACKGROUND: In order to fully characterize the genome of an individual, the reconstruction of the two distinct copies of each chromosome, called haplotypes, is essential. The computational problem of inferring the full haplotype of a cell starting from read sequencing data is known as haplotype assembly, and consists of assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. Indeed, the knowledge of complete haplotypes is generally more informative than analyzing single SNPs and plays a fundamental role in many medical applications. RESULTS: To reconstruct the two haplotypes, we addressed the weighted Minimum Error Correction (wMEC) problem, which is a successful approach for haplotype assembly. This NP-hard problem consists of computing the two haplotypes that partition the sequencing reads into two disjoint sub-sets, with the least number of corrections to the SNP values. To this aim, we propose here GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms, yielding optimal solutions by means of a global search process. In order to evaluate the effectiveness of our approach, we ran GenHap on two synthetic (yet realistic) datasets, based on the Roche/454 and PacBio RS II sequencing technologies. We compared the performance of GenHap against HapCol, an efficient state-of-the-art algorithm for haplotype phasing. Our results show that GenHap always obtains high-accuracy solutions (in terms of haplotype error rate), and is up to 4× faster than HapCol in the case of Roche/454 instances and up to 20× faster when compared on the PacBio RS II dataset. Finally, we assessed the performance of GenHap on two different real datasets. CONCLUSIONS: Future-generation sequencing technologies, producing longer reads with higher coverage, can highly benefit from GenHap, thanks to its capability of efficiently solving large instances of the haplotype assembly problem.
Moreover, the optimization approach proposed in GenHap can be extended to the study of allele-specific genomic features, such as expression, methylation and chromatin conformation, by exploiting multi-objective optimization techniques. The source code and the full documentation are available at the following GitHub repository: https://github.com/andrea-tango/GenHap.
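The wMEC objective described in this abstract can be written down compactly. The sketch below is not GenHap code: it assumes every SNP is heterozygous (so the second haplotype is the complement of the first), represents each read as a hypothetical `{SNP index: (allele, weight)}` mapping, and replaces the Genetic Algorithm with an exhaustive search that is only feasible for a handful of SNPs.

```python
from itertools import product

def fragment_cost(fragment, haplotype):
    """Total weight of the alleles in one read that disagree with a haplotype."""
    return sum(weight for pos, (allele, weight) in fragment.items()
               if haplotype[pos] != allele)

def wmec_cost(fragments, haplotype):
    """wMEC cost: each read is charged against the closer of the two haplotypes."""
    complement = [1 - a for a in haplotype]
    return sum(min(fragment_cost(f, haplotype), fragment_cost(f, complement))
               for f in fragments)

def exhaustive_wmec(fragments, n_snps):
    """Brute-force optimum over all 2**n_snps haplotypes (toy sizes only)."""
    return min((list(h) for h in product((0, 1), repeat=n_snps)),
               key=lambda h: wmec_cost(fragments, h))

# Invented reads covering four SNPs; weights stand for base-call confidence.
fragments = [
    {0: (0, 5), 1: (0, 5), 2: (1, 5)},
    {1: (1, 4), 2: (0, 4), 3: (1, 4)},
    {0: (0, 3), 2: (0, 2), 3: (0, 3)},
]
candidate = [0, 0, 1, 0]
cost = wmec_cost(fragments, candidate)        # only the third read needs a correction
best = exhaustive_wmec(fragments, n_snps=4)
```

A heuristic search such as a Genetic Algorithm would use `wmec_cost` as its fitness function and evolve candidate haplotypes instead of enumerating all of them, which is what makes large instances tractable.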