ABSTRACT
There is limited knowledge about the metabolic reprogramming induced by cancer therapies and how this contributes to therapeutic resistance. Here we show that although inhibition of PI3K-AKT-mTOR signaling markedly decreased glycolysis and restrained tumor growth, these signaling and metabolic restrictions triggered autophagy, which supplied the metabolites required for the maintenance of mitochondrial respiration and redox homeostasis. Specifically, we found that survival of cancer cells was critically dependent on phospholipase A2 (PLA2) to mobilize lysophospholipids and free fatty acids to sustain fatty acid oxidation and oxidative phosphorylation. Consistent with this, we observed significantly increased lipid droplets, with subsequent mobilization to mitochondria. These changes were abrogated in cells deficient for the essential autophagy gene ATG5. Accordingly, inhibition of PLA2 significantly decreased lipid droplets, decreased oxidative phosphorylation, and increased apoptosis. Together, these results describe how treatment-induced autophagy provides nutrients for cancer cell survival and identify novel cotreatment strategies to override this survival advantage.
Subject(s)
Antineoplastic Agents/pharmacology , Neoplasms/metabolism , Signal Transduction/drug effects , Animals , Apoptosis , Autophagy , Benzamides/pharmacology , Cell Line, Tumor , Cell Respiration/drug effects , Cell Survival , Heterocyclic Compounds, 3-Ring/pharmacology , Humans , Lipid Droplets/metabolism , Mice , Mitochondria/drug effects , Mitochondria/metabolism , Neoplasms/enzymology , Neoplasms/pathology , Phosphatidylinositol 3-Kinases/metabolism , Phosphoinositide-3 Kinase Inhibitors , Phospholipase A2 Inhibitors/pharmacology , Phospholipids/metabolism , Protein Kinase Inhibitors/pharmacology , Proto-Oncogene Proteins c-akt/antagonists & inhibitors , Proto-Oncogene Proteins c-akt/metabolism , Pyrimidines/pharmacology , Tumor Cells, Cultured
ABSTRACT
Networks have been an excellent framework for modeling complex biological information, but the methodological details of network-based tools are often described for a technical audience. We have developed Graphery, an interactive tutorial webserver that illustrates foundational graph concepts frequently used in network-based methods. Each tutorial describes a graph concept along with executable Python code that can be interactively run on a graph. Users navigate each tutorial using their choice of real-world biological networks that highlight the diverse applications of network algorithms. Graphery also allows users to modify the code within each tutorial or write new programs, which all can be executed without requiring an account. Graphery accepts ideas for new tutorials and datasets that will be shaped by both computational and biological researchers, growing into a community-contributed learning platform. Graphery is available at https://graphery.reedcompbio.org/.
Subject(s)
Algorithms , Models, Biological , Software , Computer Graphics , Protein Interaction Mapping , Signal Transduction
ABSTRACT
MOTIVATION: There has been recent increased interest in using algorithmic methods to infer the evolutionary tree underlying the developmental history of a tumor. Quantitative measures that compare such trees are vital to a number of different applications including benchmarking tree inference methods and evaluating common inheritance patterns across patients. However, few appropriate distance measures exist, and those that do have low resolution for differentiating trees or do not fully account for the complex relationship between tree topology and the inheritance of the mutations labeling that topology. RESULTS: Here, we present two novel distance measures, Common Ancestor Set distance (CASet) and Distinctly Inherited Set Comparison distance (DISC), that are specifically designed to account for the subclonal mutation inheritance patterns characteristic of tumor evolutionary trees. We apply CASet and DISC to multiple simulated datasets and two breast cancer datasets and show that our distance measures allow for more nuanced and accurate delineation between tumor evolutionary trees than existing distance measures. AVAILABILITY AND IMPLEMENTATION: Implementations of CASet and DISC are freely available at: https://bitbucket.org/oesperlab/stereodist. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
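The idea of comparing trees through the ancestors that pairs of mutations share can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published CASet definition: it assumes one mutation per node, ancestor sets that include the node itself, Jaccard distance between the two common ancestor sets, and an unweighted average over all mutation pairs. The function names and the toy trees are hypothetical.

```python
from itertools import combinations


def ancestors(parent, node):
    """Return the node together with all of its ancestors up to the root."""
    result = {node}
    while parent.get(node) is not None:
        node = parent[node]
        result.add(node)
    return result


def jaccard_distance(a, b):
    """Jaccard distance between two sets (0.0 when both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def common_ancestor_distance(parent_a, parent_b):
    """Average Jaccard distance between common ancestor sets over mutation pairs.

    Each tree is a dict mapping a mutation to its parent (None for the root).
    Only mutations present in both trees are compared (an assumption).
    """
    mutations = sorted(set(parent_a) & set(parent_b))
    pairs = list(combinations(mutations, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        common_a = ancestors(parent_a, i) & ancestors(parent_a, j)
        common_b = ancestors(parent_b, i) & ancestors(parent_b, j)
        total += jaccard_distance(common_a, common_b)
    return total / len(pairs)


# Toy trees: mutation "c" branches from the root in one tree
# and sits at the end of a linear chain in the other.
branched = {"r": None, "a": "r", "b": "a", "c": "r"}
linear = {"r": None, "a": "r", "b": "a", "c": "b"}
print(common_ancestor_distance(branched, linear))
```

Because the comparison is made per pair of mutations, moving a single mutation changes only the terms that involve it, which is one way a measure of this kind can separate trees that differ in small inheritance details.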
Subject(s)
Algorithms , Neoplasms , Biological Evolution , Humans , Phylogeny
ABSTRACT
SUMMARY: While next-generation sequencing (NGS) has dramatically increased the availability of genomic data, phased genome assembly and structural variant (SV) analyses are limited by NGS read lengths. Long-read sequencing from Pacific Biosciences and NGS barcoding from 10x Genomics hold the potential for far more comprehensive views of individual genomes. Here, we present MsPAC, a tool that combines both technologies to partition reads, assemble haplotypes (via existing software) and convert assemblies into high-quality, phased SV predictions. MsPAC represents a framework for haplotype-resolved SV calls that moves one step closer to fully resolved, diploid genomes. AVAILABILITY AND IMPLEMENTATION: https://github.com/oscarlr/MsPAC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genomics , High-Throughput Nucleotide Sequencing , Genome , Haplotypes , Sequence Analysis, DNA , Software
ABSTRACT
Characterizing cellular responses to different extrinsic signals is an active area of research, and curated pathway databases describe these complex signaling reactions. Here, we revisit a fundamental question in signaling pathway analysis: are two molecules "connected" in a network? This question is the first step towards understanding the potential influence of molecules in a pathway, and the answer depends on the choice of modeling framework. We examined the connectivity of Reactome signaling pathways using four different pathway representations. We find that Reactome is very well connected as a graph, moderately well connected as a compound graph or bipartite graph, and poorly connected as a hypergraph (which captures many-to-many relationships in reaction networks). We present a novel relaxation of hypergraph connectivity that iteratively increases connectivity from a node while preserving the hypergraph topology. This measure, B-relaxation distance, provides a parameterized transition between hypergraph connectivity and graph connectivity. B-relaxation distance is sensitive to the presence of small molecules that participate in many functionally unrelated reactions in the network. We also define a score that quantifies one pathway's downstream influence on another, which can be calculated as B-relaxation distance gradually relaxes the connectivity constraint in hypergraphs. Computing this score across all pairs of 34 Reactome pathways reveals pairs of pathways with statistically significant influence. We present two such case studies, and we describe the specific reactions that contribute to the large influence score. Finally, we investigate the ability for connectivity measures to capture functional relationships among proteins, and use the evidence channels in the STRING database as a benchmark dataset. 
STRING interactions whose proteins are B-connected in Reactome have statistically significantly higher scores than interactions connected in the bipartite graph representation. Our method lays the groundwork for other generalizations of graph-theoretic concepts to hypergraphs in order to facilitate signaling pathway analysis.
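The gap between graph connectivity and hypergraph connectivity can be illustrated with a small Python sketch. It assumes the standard notion of B-connectivity in directed hypergraphs, in which a hyperedge can be traversed only once every node in its tail has been reached; the abstract does not spell out its definitions, and the B-relaxation distance itself is not implemented here. The function names and the toy reactions are hypothetical.

```python
from collections import deque


def graph_reachable(hyperedges, source):
    """Nodes reachable when every hyperedge is flattened to tail -> head edges."""
    adjacency = {}
    for tail, head in hyperedges:
        for u in tail:
            adjacency.setdefault(u, set()).update(head)
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def b_reachable(hyperedges, source):
    """Nodes B-connected to the source: a hyperedge fires only when
    its entire tail has already been reached."""
    seen = {source}
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if set(tail) <= seen and not set(head) <= seen:
                seen |= set(head)
                changed = True
    return seen


# Toy reactions as (tail, head) pairs: A -> B, and B + C -> D.
reactions = [({"A"}, {"B"}), ({"B", "C"}, {"D"})]
print(sorted(graph_reachable(reactions, "A")))  # D is reached through B alone
print(sorted(b_reachable(reactions, "A")))      # D is blocked because C is never reached
```

In this toy case the flattened graph reports D as downstream of A, while the hypergraph view does not, which mirrors the abstract's observation that the same pathway data can look well connected or poorly connected depending on the representation.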
Subject(s)
Signal Transduction/physiology , Algorithms , Computer Simulation , Databases, Factual/statistics & numerical data , Models, Statistical , Proteins
ABSTRACT
BACKGROUND: Understanding cellular responses via signal transduction is a core focus in systems biology. Tools to automatically reconstruct signaling pathways from protein-protein interactions (PPIs) can help biologists generate testable hypotheses about signaling. However, automatic reconstruction of signaling pathways suffers from many interactions with the same confidence score leading to many equally good candidates. Further, some reconstructions are biologically misleading due to ignoring protein localization information. RESULTS: We propose LocPL, a method to improve the automatic reconstruction of signaling pathways from PPIs by incorporating information about protein localization in the reconstructions. The method relies on a dynamic program to ensure that the proteins in a reconstruction are localized in cellular compartments that are consistent with signal transduction from the membrane to the nucleus. LocPL and existing reconstruction algorithms are applied to two PPI networks and assessed using both global and local definitions of accuracy. LocPL produces more accurate and biologically meaningful reconstructions on a versatile set of signaling pathways. CONCLUSION: LocPL is a powerful tool to automatically reconstruct signaling pathways from PPIs that leverages cellular localization information about proteins. The underlying dynamic program and signaling model are flexible enough to study cellular signaling under different settings of signaling flow across the cellular compartments.
Subject(s)
Computational Biology/methods , Proteins/metabolism , Signal Transduction , Algorithms , Automation , Humans , Protein Binding , Protein Interaction Mapping , Protein Transport
ABSTRACT
BACKGROUND: Schizophrenia and autism are examples of polygenic diseases caused by a multitude of genetic variants, many of which are still poorly understood. Recently, both diseases have been associated with disrupted neuron motility and migration patterns, suggesting that aberrant cell motility is a phenotype for these neurological diseases. RESULTS: We formulate the POLYGENIC DISEASE PHENOTYPE Problem which seeks to identify candidate disease genes that may be associated with a phenotype such as cell motility. We present a machine learning approach to solve this problem for schizophrenia and autism genes within a brain-specific functional interaction network. Our method outperforms peer semi-supervised learning approaches, achieving better cross-validation accuracy across different sets of gold-standard positives. We identify top candidates for both schizophrenia and autism, and select six genes labeled as schizophrenia positives that are predicted to be associated with cell motility for follow-up experiments. CONCLUSIONS: Candidate genes predicted by our method suggest testable hypotheses about these genes’ role in cell motility regulation, offering a framework for generating predictions for experimental validation.
Subject(s)
Cell Movement/genetics , Disease/genetics , Gene Regulatory Networks , Multifactorial Inheritance/genetics , Algorithms , Autistic Disorder/genetics , Genetic Association Studies , Humans , Machine Learning , Phenotype , ROC Curve , Reproducibility of Results , Schizophrenia/genetics
ABSTRACT
SUMMARY: Networks have become ubiquitous in systems biology. Visualization is a crucial component in their analysis. However, collaborations within research teams in network biology are hampered by software systems that are either specific to a computational algorithm, create visualizations that are not biologically meaningful, or have limited features for sharing networks and visualizations. We present GraphSpace, a web-based platform that fosters team science by allowing collaborating research groups to easily store, interact with, layout and share networks. AVAILABILITY AND IMPLEMENTATION: Anyone can upload and share networks at http://graphspace.org. In addition, the GraphSpace code is available at http://github.com/Murali-group/graphspace if a user wants to run his or her own server. CONTACT: murali@cs.vt.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Software , Systems Biology/methods , Algorithms , Computational Biology , Interdisciplinary Communication
ABSTRACT
MOTIVATION: Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent 'third-generation' sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. RESULTS: We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single molecule sequencing data, paired read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly.
Subject(s)
Algorithms , Genomic Structural Variation , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Genomics/methods , Humans , Repetitive Sequences, Nucleic Acid , Sequence Deletion
ABSTRACT
This survey study aimed to contribute to the extensive debate on the dimensionality of the Posttraumatic Stress Disorder Checklist for the DSM-5 (PCL-5) questionnaire by examining the psychometric properties and construct validity of its Hungarian version and by inspecting a conceptual network of related variables, namely perceived stress, hostility, and resilience. Confirmatory factor analysis (CFA), exploratory structural equation modelling (ESEM) and path analysis were applied to data collected from 177 paramedics and 66 professionals from the social field (58.4% male; Mage = 43.5 ± 9.96 years). Despite the acceptable fit indices obtained with CFA when testing the original four-factor DSM-5 model of PCL-5, strong associations (r = 0.69-0.90) between subscales were found. Thus, ESEM was applied, and because of significant cross-loadings a new, theoretically supported three-factor version of the DSM-5 model of PCL-5 was proposed. The Reexperiencing and Avoidance subscales were merged and named Difficulty with Assimilation of Experience (DAE). In the path analysis only two of the tested associations were not significant using the new factor structure, in which stress fully mediated the relationship between resilience and DAE, and between resilience and Hyperarousal. Overall, the hypothesised pathways between variables fit the collected data well (weighted least squares mean- and variance-adjusted χ2 = 503.750 (df = 270), comparative fit index = 0.948, Tucker-Lewis index = 0.939, root mean square error of approximation (90% confidence interval) = 0.064 (0.055-0.073), weighted root mean square residual = 1.024). Our analysis of the Hungarian version of PCL-5 contributes to the testing of a DSM-5-based questionnaire measuring posttraumatic stress disorder symptomology.
Subject(s)
Allied Health Personnel , Hostility , Psychometrics , Resilience, Psychological , Stress Disorders, Post-Traumatic , Humans , Male , Female , Adult , Stress Disorders, Post-Traumatic/psychology , Stress Disorders, Post-Traumatic/diagnosis , Middle Aged , Allied Health Personnel/psychology , Surveys and Questionnaires , Hungary , Factor Analysis, Statistical , Stress, Psychological/psychology , Reproducibility of Results , Paramedics
ABSTRACT
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.
ABSTRACT
Motivation: Higher-order interaction patterns among proteins have the potential to reveal mechanisms behind molecular processes and diseases. While clustering methods are used to identify functional groups within molecular interaction networks, these methods largely focus on edge density and do not explicitly take into consideration higher-order interactions. Disease genes in these networks have been shown to exhibit rich higher-order structure in their vicinity, and considering these higher-order interaction patterns in network clustering has the potential to reveal new disease-associated modules. Results: We propose a higher-order community detection method which identifies community structure in networks with respect to specific higher-order connectivity patterns beyond edges. Higher-order community detection on four different protein-protein interaction networks identifies biologically significant modules and disease modules that conventional edge-based clustering methods fail to discover. Higher-order clusters also identify disease modules from genome-wide association study data, including new modules that were not discovered by top-performing approaches in a Disease Module DREAM Challenge. Our approach provides a more comprehensive view of community structure that enables us to predict new disease-gene associations. Availability and implementation: https://github.com/Reed-CompBio/graphlet-clustering.
ABSTRACT
A major challenge in molecular systems biology is to understand how proteins work to transmit external signals to changes in gene expression. Computationally reconstructing these signaling pathways from protein interaction networks can help understand what is missing from existing pathway databases. We formulate a new pathway reconstruction problem, one that iteratively grows directed acyclic graphs (DAGs) from a set of starting proteins in a protein interaction network. We present an algorithm that provably returns the optimal DAGs for two different cost functions and evaluate the pathway reconstructions when applied to six diverse signaling pathways from the NetPath database. The optimal DAGs outperform an existing k-shortest paths method for pathway reconstruction, and the new reconstructions are enriched for different biological processes. Growing DAGs is a promising step toward reconstructing pathways that provably optimize a specific cost function.
Subject(s)
Algorithms , Signal Transduction , Systems Biology , Proteins , Protein Interaction Maps
ABSTRACT
The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they have computational limitations or are designed to solve a specific task. We present a Transformer neural network that pre-trains task-agnostic sequence representations. This model is fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction. Our method is comparable to existing state-of-the-art approaches for protein family classification while being much more general than other architectures. Further, our method outperforms other approaches for protein interaction prediction for two out of three different scenarios that we generated. These results offer a promising framework for fine-tuning the pre-trained sequence representations for other protein prediction tasks.
Subject(s)
Neural Networks, Computer , Proteins , Amino Acid Sequence
ABSTRACT
The morphogenetic process of apical constriction, which relies on non-muscle myosin II (NMII) generated constriction of apical domains of epithelial cells, is key to the development of complex cellular patterns. Apical constriction occurs in almost all multicellular organisms, but one of the most well-characterized systems is the Folded-gastrulation (Fog)-induced apical constriction that occurs in Drosophila. The binding of Fog to its cognate receptors Mist/Smog results in a signaling cascade that leads to the activation of NMII-generated contractility. Despite our knowledge of key molecular players involved in Fog signaling, we sought to explore whether other proteins have an undiscovered role in its regulation. We developed a computational method to predict unidentified candidate NMII regulators using a network of pairwise protein-protein interactions called an interactome. We first constructed a Drosophila interactome of over 500,000 protein-protein interactions from several databases that curate high-throughput experiments. Next, we implemented several graph-based algorithms that predicted 14 proteins potentially involved in Fog signaling. To test these candidates, we used RNAi depletion in combination with a cellular contractility assay in Drosophila S2R+ cells, which respond to Fog by contracting in a stereotypical manner. Of the candidates we screened using this assay, two proteins, the serine/threonine phosphatase Flapwing and the putative guanylate kinase CG11811, were demonstrated to inhibit cellular contractility when depleted, suggestive of their roles as novel regulators of the Fog pathway.
Subject(s)
Drosophila Proteins , Gastrulation , Animals , Drosophila/metabolism , Drosophila Proteins/metabolism , Myosin Type II/metabolism , Signal Transduction/physiology
ABSTRACT
BACKGROUND: A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. RESULTS: By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. CONCLUSIONS: We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. 
An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/.
Subject(s)
Algorithms , Genome, Human , Mutation , Ovarian Neoplasms/genetics , Chromosome Aberrations , DNA Copy Number Variations , Female , Humans , Sequence Analysis, DNA/methods
ABSTRACT
A major goal of molecular systems biology is to understand the coordinated function of genes or proteins in response to cellular signals and to understand these dynamics in the context of disease. Signaling pathway databases such as KEGG, NetPath, NCI-PID, and Panther describe the molecular interactions involved in different cellular responses. While the same pathway may be present in different databases, prior work has shown that the particular proteins and interactions differ across database annotations. However, to our knowledge no one has attempted to quantify their structural differences. It is important to characterize artifacts or other biases within pathway databases, which can provide a more informed interpretation for downstream analyses. In this work we consider signaling pathways as graphs and we use topological measures to study their structure. We find that topological characterization using graphlets (small, connected subgraphs) distinguishes signaling pathways from appropriate null models of interaction networks. Next, we quantify topological similarity across pathway databases. Our analysis reveals that the pathways harbor database-specific characteristics implying that even though these databases describe the same pathways, they tend to be systematically different from one another. We show that pathway-specific topology can be uncovered after accounting for database-specific structure. This work presents the first step towards elucidating common pathway structure beyond their specific database annotations. Data Availability: https://github.com/Reed-CompBio/pathway-reconciliation.
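As a rough illustration of what a graphlet-based topological profile looks like, the Python sketch below counts the two connected 3-node graphlets (induced paths and triangles) in an undirected graph. It is a minimal example under stated assumptions: the abstract does not say which graphlet sizes or counting tools were used, and the function name and toy network are hypothetical.

```python
from itertools import combinations


def count_three_node_graphlets(edges):
    """Count induced 3-node paths and triangles in an undirected simple graph."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    paths = 0
    triangle_corners = 0
    for center, neighbors in adjacency.items():
        # Every pair of neighbors of `center` forms a connected triple.
        for a, b in combinations(sorted(neighbors), 2):
            if b in adjacency[a]:
                triangle_corners += 1  # each triangle is seen from all 3 corners
            else:
                paths += 1             # an induced path has exactly one center
    return {"path": paths, "triangle": triangle_corners // 3}


# Toy network: a triangle A-B-C with a pendant node D attached to C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(count_three_node_graphlets(toy_edges))
```

Profiles of this kind, computed for a pathway and for randomized networks, give vectors that can be compared directly, which is the general shape of the comparison the abstract describes.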
Subject(s)
Computational Biology , Signal Transduction , Databases, Factual , Humans , Proteins , Systems Biology
ABSTRACT
BACKGROUND: Copy number variants (CNVs), including deletions, amplifications, and other rearrangements, are common in human and cancer genomes. Copy number data from array comparative genome hybridization (aCGH) and next-generation DNA sequencing are widely used to measure copy number variants. Comparison of copy number data from multiple individuals reveals recurrent variants. Typically, the interior of a recurrent CNV is examined for genes or other loci associated with a phenotype. However, in some cases, such as gene truncations and fusion genes, the target of the variant lies at the boundary of the variant. RESULTS: We introduce Neighborhood Breakpoint Conservation (NBC), an algorithm for identifying rearrangement breakpoints that are highly conserved at the same locus in multiple individuals. NBC detects recurrent breakpoints at varying levels of resolution, including breakpoints whose location is exactly conserved and breakpoints whose location varies within a gene. NBC also identifies pairs of recurrent breakpoints such as those that result from fusion genes. We apply NBC to aCGH data from 36 primary prostate tumors and identify 12 novel rearrangements, one of which is the well-known TMPRSS2-ERG fusion gene. We also apply NBC to 227 glioblastoma tumors and predict 93 novel rearrangements which we further classify as gene truncations, germline structural variants, and fusion genes. A number of these variants involve the protein phosphatase PTPN12, suggesting that deregulation of PTPN12, via a variety of rearrangements, is common in glioblastoma. CONCLUSIONS: We demonstrate that NBC is useful for detection of recurrent breakpoints resulting from copy number variants or other structural variants, and in particular identifies recurrent breakpoints that result in gene truncations or fusion genes. Software is available at http://cs.brown.edu/people/braphael/software.html.
Subject(s)
Algorithms , Chromosome Breakpoints , Comparative Genomic Hybridization , DNA Copy Number Variations , Neoplasms/genetics , Gene Dosage , Glioblastoma/genetics , Humans , Male , Prostatic Neoplasms/genetics , Protein Tyrosine Phosphatase, Non-Receptor Type 12/genetics , Sequence Analysis, DNA , Software
ABSTRACT
MOTIVATION: Structural variation, including deletions, duplications and rearrangements of DNA sequence, is an important contributor to genome variation in many organisms. In humans, many structural variants are found in complex and highly repetitive regions of the genome, making their identification difficult. A new sequencing technology called strobe sequencing generates strobe reads containing multiple subreads from a single contiguous fragment of DNA. Strobe reads thus generalize the concept of paired reads, or mate pairs, that have been routinely used for structural variant detection. Strobe sequencing holds promise for unraveling complex variants that have been difficult to characterize with current sequencing technologies. RESULTS: We introduce an algorithm for identification of structural variants using strobe sequencing data. We consider strobe reads from a test genome that have multiple possible alignments to a reference genome due to sequencing errors and/or repetitive sequences in the reference. We formulate the combinatorial optimization problem of finding the minimum number of structural variants in the test genome that are consistent with these alignments. We solve this problem using an integer linear program. Using simulated strobe sequencing data, we show that our algorithm has better sensitivity and specificity than paired read approaches for structural variation identification. CONTACT: braphael@brown.edu