ABSTRACT
Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.
Subject(s)
Biomedical Research , Genome, Human , Human Genome Project , Europe , Humans
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Coarse-grained models have emerged as valuable tools to simulate long DNA molecules while maintaining computational efficiency. These models aim at preserving interactions among coarse-grained variables in a manner that mirrors the underlying atomistic description. We explore here a method for testing coarse-grained vs all-atom models using stiffness matrices in Fourier space (q-stiffnesses), which are particularly suited to probe DNA elasticity at different length scales. We focus on a class of coarse-grained rigid base DNA models known as cgDNA and its most recent version, cgDNA+. Our analysis shows that while cgDNA+ closely follows the q-stiffnesses of the all-atom model, the original cgDNA shows some deviations for twist and bending variables, which are rather strong in the q → 0 (long length scale) limit. The consequence is that while both cgDNA and cgDNA+ give a suitable description of local elastic behavior, the former misses some effects that manifest themselves at longer length scales. In particular, cgDNA performs poorly on twist stiffness, with a value much lower than expected for long DNA molecules. Conversely, the all-atom and cgDNA+ twist are strongly length scale dependent: DNA is torsionally soft at a few base pair distances but becomes more rigid at distances of a few dozen base pairs. Our analysis shows that the bending persistence length in all-atom and cgDNA+ is somewhat overestimated.
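As an illustration of the q-stiffness idea, the sketch below estimates a wave-number-dependent stiffness from per-mode fluctuations of a periodic chain variable (such as twist) via equipartition. This is a minimal sketch under our own assumptions about input layout and normalization; it is not the cgDNA/cgDNA+ code.

```python
import numpy as np

def q_stiffness(angles, kBT=1.0):
    """Per-mode stiffness from equilibrium fluctuations of a periodic chain.

    angles : (n_frames, N) array of a rotational step variable (e.g., twist)
             sampled along the molecule in each simulation snapshot.
    Returns (q, K(q)) with K(q) ~ kBT / <|delta_theta(q)|^2>; the absolute
    scale depends on the FFT normalization convention chosen here.
    """
    delta = angles - angles.mean(axis=0)         # remove the static average profile
    modes = np.fft.rfft(delta, axis=1)           # Fourier transform along the chain
    power = (np.abs(modes) ** 2).mean(axis=0)    # <|delta_theta(q)|^2> over frames
    N = angles.shape[1]
    q = 2.0 * np.pi * np.arange(power.size) / N  # wave numbers in rad per step
    K = kBT * N / power                          # equipartition: soft modes -> large power
    return q[1:], K[1:]                          # drop the q = 0 (mean) mode
```

The q → 0 end of K(q) probes the long-length-scale elasticity where, per the abstract, cgDNA and cgDNA+ differ most.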
Subject(s)
DNA , Elasticity , Base Pairing
ABSTRACT
Extracellular vesicles (EVs) are cell-derived structures surrounded by a lipid bilayer that carry RNA and DNA as potential templates for molecular diagnostics, e.g., in cancer genotyping. While it has been established that DNA templates appear on the outside of EVs, no consensus exists on which nucleic acid species inside small EVs (<200 nm, sEVs) are sufficiently abundant and accessible for developing genotyping protocols. We investigated this by extracting total intravesicular nucleic acid content from sEVs isolated from the conditioned cell medium of the human NCI-H1975 cell line containing the epidermal growth factor receptor (EGFR) gene mutation T790M as a model system for non-small cell lung cancer. We observed that mainly short genomic DNA (<35–100 bp) present in the sEVs served as a template. Using qEV size exclusion chromatography (SEC), significantly lower yield and higher purity of isolated sEV fractions were obtained as compared to exoEasy membrane affinity purification and ultracentrifugation. Nevertheless, we detected the EGFR T790M mutation in the sEVs' lumen with similar sensitivity using digital PCR. When applying SEC-based sEV separation prior to cell-free DNA extraction on spiked human plasma samples, we found significantly higher mutant allele frequencies as compared to standard cell-free DNA extraction, which in part was due to co-purification of circulating tumor DNA (ctDNA). We conclude that intravesicular genomic DNA can be exploited next to ctDNA to enhance EGFR T790M mutation detection sensitivity by adding a fast and easy-to-use sEV separation method, such as SEC, upstream of standard clinical cell-free DNA workflows.
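The mutant allele frequencies mentioned above come from digital PCR, where positive-partition counts are converted into concentrations with the standard Poisson correction. A minimal sketch with hypothetical counts; this is the generic dPCR calculation, not the exact pipeline used in the study:

```python
import math

def poisson_lambda(n_positive, n_total):
    """Mean target copies per partition, corrected for multiple occupancy."""
    return -math.log(1.0 - n_positive / n_total)

def mutant_allele_frequency(pos_mut, pos_wt, n_partitions):
    """MAF from mutant- and wild-type-positive partition counts of one run."""
    lam_mut = poisson_lambda(pos_mut, n_partitions)
    lam_wt = poisson_lambda(pos_wt, n_partitions)
    return lam_mut / (lam_mut + lam_wt)

# Hypothetical counts for illustration only:
print(mutant_allele_frequency(pos_mut=120, pos_wt=5400, n_partitions=20000))
```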
Subject(s)
Carcinoma, Non-Small-Cell Lung , Cell-Free Nucleic Acids , Circulating Tumor DNA , Lung Neoplasms , Humans , Lung Neoplasms/diagnosis , Carcinoma, Non-Small-Cell Lung/diagnosis , ErbB Receptors/genetics , Mutation , Protein Kinase Inhibitors , Oncogenes , Epidermal Growth Factor/genetics , Chromatography, Gel , Genomics
ABSTRACT
For differential expression studies in all omics disciplines, data normalization is a crucial step that is often subject to a balance between speed and effectiveness. To keep up with the data produced by high-throughput instruments, researchers require fast and easy-to-use yet effective methods that fit into automated analysis pipelines. The CONSTANd normalization method meets these criteria, so we have made its source code available for R/BioConductor and Python. We briefly review the method and demonstrate how it can be used in different omics contexts for experiments of any scale. Widespread adoption across omics disciplines would ease data integration in multiomics experiments.
Subject(s)
Boidae , Software , Animals , Proteomics
ABSTRACT
RATIONALE: The current methods for identifying peptides in mass spectral product ion data still struggle to do so for the majority of spectra. Based on the experimental setup and other assumptions, such methods restrict the search space to speed up computations, but at the cost of creating blind spots. The proteomics community would greatly benefit from a method that is capable of covering the entire search space without using any restrictions, thus establishing a baseline for identification. METHODS: We conceived the "mass pattern paradigm" (MPP) that enables the creation of such an identification method, and we implemented it into a prototype database search engine "PRiSM" (PRotein-Spectrum Matching). We then assessed its operational characteristics by applying it to publicly available high-precision mass spectra of low and high identification difficulty. We used those characteristics to gain theoretical insights into trade-offs between sensitivity and speed when trying to establish a baseline for identification. RESULTS: Of 100 low difficulty spectra, PRiSM and SEQUEST agree on 84 identifications (of which 75 are statistically significant). Of the 15 out of 100 spectra that were not identified in a previous study (using SEQUEST), 13 are considered reliable after visual inspection and represent 3 proteins (out of 9 in total) not detected previously. CONCLUSIONS: Despite leaving noise intact, the simple PRiSM prototype can make statistically reliable identifications, while controlling the false discovery rate by fitting a null distribution. It also identifies some spectra previously unidentifiable in an "extremely open" SEQUEST search, paving the way to establishing a baseline for identification in proteomics.
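The abstract's approach of controlling the false discovery rate by fitting a null distribution can be sketched generically: fit a parametric null to background (e.g., decoy) scores, convert target scores into p-values, and apply the Benjamini-Hochberg adjustment. The Gaussian null and the function names are our assumptions, not PRiSM internals.

```python
import numpy as np
from scipy import stats

def significant_by_fitted_null(scores, null_scores, alpha=0.01):
    """Flag identifications whose Benjamini-Hochberg q-value (from a Gaussian
    null fitted to decoy/background scores) falls below alpha."""
    mu, sigma = stats.norm.fit(null_scores)             # fitted null distribution
    pvals = stats.norm.sf(scores, loc=mu, scale=sigma)  # right-tail p-values
    order = np.argsort(pvals)
    m = len(pvals)
    ranked = pvals[order] * m / (np.arange(m) + 1)      # BH adjustment p * (m/i)
    qvals = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
    out = np.empty_like(qvals)
    out[order] = qvals
    return out <= alpha
```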
ABSTRACT
In the context of omics disciplines, and especially proteomics and biomarker discovery, the analysis of a clinical sample using label-based tandem mass spectrometry (MS) can be affected by sample preparation effects or by the measurement process itself, resulting in an incorrect outcome. Detection and correction of these mistakes using state-of-the-art methods based on mixed models can consume large amounts of computing time. MS-based proteomics laboratories are high-throughput and need to avoid a bottleneck in their quantitative pipeline by quickly discriminating between high- and low-quality data. To this end, we developed an easy-to-use web tool called QCQuan (available at qcquan.net), which is built around the CONSTANd normalization algorithm. It automatically provides the user with exploratory and quality control information as well as a differential expression analysis based on conservative, simple statistics. In this document we describe in detail the scientifically relevant steps that constitute the workflow and assess its qualitative and quantitative performance on three reference data sets. We find that QCQuan provides clear and accurate indications about the scientific value of both a high- and a low-quality data set. Moreover, it performed quantitatively better on a third data set than a comparable workflow assembled from established, reliable software.
Subject(s)
Algorithms , Bacterial Proteins/isolation & purification , Data Accuracy , Pectobacterium carotovorum/chemistry , Proteomics/statistics & numerical data , Software , Animals , Cattle , Chromatography, Liquid , Complex Mixtures/chemistry , Cytochromes c/isolation & purification , Datasets as Topic , Glycogen Phosphorylase/isolation & purification , Internet , Phosphopyruvate Hydratase/isolation & purification , Proteomics/methods , Quality Control , Rabbits , Serum Albumin, Bovine/isolation & purification , Staining and Labeling/methods , Tandem Mass Spectrometry
ABSTRACT
Top-down proteomics approaches are becoming ever more popular, due to the advantages offered by knowledge of the intact protein mass in correctly identifying the various proteoforms that potentially arise due to point mutation, alternative splicing, post-translational modifications, etc. Usually, the average mass is used in this context; however, it is known that this can fluctuate significantly due to both natural and technical causes. Ideally, one would prefer to use the monoisotopic precursor mass, but this falls below the detection limit for all but the smallest proteins. Methods that predict the monoisotopic mass based on the average mass are potentially affected by imprecisions associated with the average mass. To address this issue, we have developed a framework based on simple, linear models that allows prediction of the monoisotopic mass based on the exact mass of the most-abundant (aggregated) isotope peak, which is a robust measure of mass, insensitive to the aforementioned natural and technical causes. This linear model was tested experimentally, as well as in silico, and typically predicts monoisotopic masses with an accuracy of only a few parts per million. A confidence measure is associated with the predicted monoisotopic mass to handle the off-by-one-Da prediction error. Furthermore, we introduce a correction function to extract the "true" (i.e., theoretically) most-abundant isotope peak from a spectrum, even if the observed isotope distribution is distorted by noise or poor ion statistics. The method is available online as an R shiny app: https://valkenborg-lab.shinyapps.io/mind/.
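A minimal sketch of the prediction step: a linear map from the most-abundant-peak mass to a rough monoisotopic mass, which is then snapped onto the isotope grid to yield an integer number of isotope steps. The slope and intercept below are hypothetical placeholders (the published models are fitted and mass-range aware), and 1.00235 Da is only an approximate aggregated-isotope spacing.

```python
def predict_monoisotopic(mab_mass, slope=0.9994235, intercept=0.03,
                         spacing=1.00235):
    """Predict the monoisotopic mass from the most-abundant isotope peak mass.

    slope/intercept: hypothetical placeholder coefficients, not fitted values.
    """
    rough = slope * mab_mass + intercept      # coarse linear prediction
    k = round((mab_mass - rough) / spacing)   # integer number of isotope steps
    return mab_mass - k * spacing             # snap onto the isotope grid

# Hypothetical 20 kDa precursor:
print(predict_monoisotopic(20000.0))
```

The off-by-one-Da failure mode discussed in the abstract corresponds to k being misestimated by one unit, which is what the associated confidence measure quantifies.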
Subject(s)
Algorithms , Chromatography, Liquid/methods , Models, Statistical , Proteins/analysis , Proteome/analysis , Tandem Mass Spectrometry/methods , Humans , Protein Processing, Post-Translational , Proteins/metabolism
ABSTRACT
DNA surface-hybridization biosensors utilize the selective hybridization of target sequences in solution to surface-immobilized probes. In this process, the target is usually assumed to be in excess, so that its concentration does not significantly vary while hybridizing to the surface-bound probes. If the target is initially at low concentrations and/or if the number of probes is very large, and they have high affinity for the target, the DNA in solution may become depleted. In this paper we analyze the equilibrium and kinetics of hybridization of DNA biosensors in the case of strong target depletion, by extending the Langmuir adsorption model. We focus, in particular, on the detection of a small amount of a single-nucleotide "mutant" sequence (concentration c2) in a solution, which differs by one or more nucleotides from an abundant "wild-type" sequence (concentration c1 ≫ c2). We show that depletion can give rise to a strongly enhanced sensitivity of the biosensors. Using representative values of rate constants and hybridization free energies, we find that in the depletion regime one could detect relative concentrations c2/c1 that are up to 3 orders of magnitude smaller than in the conventional approach. The kinetics is surprisingly rich and exhibits a nonmonotonic adsorption with no counterpart in the no-depletion case. Finally, we show that, alongside enhanced detection sensitivity, this approach offers the possibility of sample enrichment, by substantially increasing the relative amount of the mutant over the wild-type sequence.
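The depletion regime can be explored numerically with a few lines: the Langmuir coverage must be solved self-consistently because captured targets are removed from the bulk. A minimal single-target sketch (the paper treats the competitive mutant/wild-type case; all parameter values below are hypothetical):

```python
from scipy.optimize import brentq

def coverage_with_depletion(c0, K, n_probes, volume, NA=6.022e23):
    """Equilibrium probe coverage theta when surface capture depletes the bulk.

    c0: initial target concentration (mol/L); K: equilibrium constant (L/mol);
    n_probes: number of surface probes; volume in liters.
    Solves theta = K*c_free/(1 + K*c_free) with
    c_free = c0 - theta * n_probes / (NA * volume).
    """
    gamma = n_probes / (NA * volume)  # concentration the surface can remove
    def residual(theta):
        c_free = max(c0 - gamma * theta, 0.0)
        return theta - K * c_free / (1.0 + K * c_free)
    return brentq(residual, 0.0, 1.0)  # residual changes sign on [0, 1]

# Strong depletion: the surface capacity gamma exceeds c0
print(coverage_with_depletion(c0=1e-12, K=1e12, n_probes=1e9, volume=1e-4))
```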
Subject(s)
Biosensing Techniques/methods , DNA/chemistry , Adsorption , Kinetics , Nucleic Acid Hybridization , Surface Properties
ABSTRACT
BACKGROUND: Human hematopoietic progenitor cells (HPCs) are important for cell therapy in cancer and tissue regeneration. In vitro studies have shown a transient association of 40 nm polystyrene nanoparticles (PS NPs) with these cells, which is of interest for intelligent design and application of NPs in HPC-based regenerative protocols. In this study, we aimed to investigate the involvement of nanoparticle size and membrane-attached glycan molecules in the interaction of HPCs with PS NPs, and compared it with that of monocytes. Human cord blood-derived HPCs and THP-1 cells were exposed to fluorescently labelled, carboxylated PS NPs of 40, 100 and 200 nm. Time-dependent nanoparticle membrane association and/or uptake was observed by measuring fluorescence intensity of exposed cells at short time intervals using flow cytometry. By pretreating the cells with neuraminidase, we studied the possible effect of membrane-associated sialic acids in the interaction with NPs. Confocal microscopy was used to visualize the cell-specific character of the NP association. RESULTS: Confocal images revealed that the majority of PS NPs was initially observed to be retained at the outer membrane of HPCs, while the same NPs showed immediate internalization by THP-1 monocytic cells. After prolonged exposure up to 4 h, PS NPs were also observed to enter the HPCs' intracellular compartment. Cell-specific time courses of NP association with HPCs and THP-1 cells remained persistent after cells were enzymatically treated with neuraminidase, but significantly increased levels of NP association could be observed, suggesting a role for membrane-associated sialic acids in this process. CONCLUSIONS: We conclude that the terminal membrane-associated sialic acids contribute to the NP retention at the outer cell membrane of HPCs. This retention behavior is a unique characteristic of the HPCs and is independent of NP size.
Subject(s)
Hematopoietic Stem Cells/metabolism , Monocytes/metabolism , Nanoparticles/chemistry , Sialic Acids/chemistry , Antigens, CD34/metabolism , Biological Transport , Cell Line , Cell Membrane/drug effects , Cell Survival/drug effects , Delayed-Action Preparations/metabolism , Endocytosis/drug effects , Humans , Particle Size , Polystyrenes , Surface Properties
ABSTRACT
In quantitative proteomics applications, the use of isobaric labels is a very popular concept as they allow for multiplexing, such that peptides from multiple biological samples are quantified simultaneously in one mass spectrometry experiment. Although this multiplexing ensures that peptide intensities are affected by the same instrument variability, systematic effects during sample preparation can still introduce a bias in the quantitation measurements. Therefore, normalization methods are required to remove this systematic error. At present, a few dedicated normalization methods for isobaric labeled data are at hand. Most of these normalization methods include a framework for statistical data analysis and rely on ANOVA or linear mixed models. However, for swift quality control of the samples or data visualization a simple normalization technique is sufficient. To this aim, we present a new and easy-to-use data-driven normalization method, named CONSTANd. The CONSTANd method employs constrained optimization and prior information about the labeling strategy to normalize the peptide intensities. Further, it allows maintaining the connection to any biological effect while reducing the systematic and technical errors. As a result, peptides can not only be compared directly within a multiplexed experiment, but are also comparable between other isobaric labeled datasets from multiple experimental designs that are normalized by the CONSTANd method, without the need to include a reference sample in every experimental setup. The latter property is especially useful when more biological samples are required than a single TMT/iTRAQ multiplex (six, eight, or ten channels) can accommodate, in order to detect differential peptides with sufficient statistical power and to make optimal use of the multiplexing capacity of isobaric labels.
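The core of the method can be sketched as iterative proportional fitting: alternately rescale rows and columns of the intensity matrix until both row and column means hit a common target. This is a minimal sketch of the idea only; the published CONSTANd implementation differs in details such as missing-value handling and convergence criteria.

```python
import numpy as np

def constand_like(X, target=1.0, max_iter=50, tol=1e-8):
    """Rescale a nonnegative intensity matrix until every row mean and every
    column mean equals `target` (iterative proportional fitting)."""
    X = np.asarray(X, dtype=float).copy()
    for _ in range(max_iter):
        X *= target / X.mean(axis=1, keepdims=True)  # fix row means
        X *= target / X.mean(axis=0, keepdims=True)  # fix column means
        if np.abs(X.mean(axis=1) - target).max() < tol:
            break
    return X
```

Because every normalized matrix ends up on the same fixed scale, intensities from different multiplexed runs become directly comparable without a shared reference channel.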
Subject(s)
Peptide Fragments/chemistry , Proteomics/standards , Staining and Labeling/methods , Algorithms , Data Interpretation, Statistical , Proteomics/methods , Tandem Mass Spectrometry/methods
ABSTRACT
A major conceptual breakthrough in cell signaling has been the finding of extracellular vesicles (EVs) as new biomarker shuttles in body fluids. Now, one of the major challenges in using these nanometer-sized biological entities as diagnostic markers is the development of translational methodologies to profile them. Surface plasmon resonance (SPR) offers a promising label-free and real-time platform with a high potential for biomarker detection. Therefore, we aimed to develop a uniform SPR methodology to detect specific surface markers on EVs derived from patients with coronary heart disease (CHD). EVs with an approximate size range between 30 and 100 nm (~48.5%) and 100-300 nm (~51.5%) were successfully isolated. The biomarker profile of the EVs was verified using immunogold labeling, ELISA and SPR. Using SPR, we demonstrated an increased binding of EVs derived from patients with CHD to anti-ICAM-1 antibodies as compared to EVs from healthy donors. Our current findings open up novel opportunities for in-depth and label-free investigation of EVs.
Subject(s)
Biomarkers , Endothelial Cells , Extracellular Vesicles , Surface Plasmon Resonance , Coronary Disease , Humans , Inflammation , Nanotechnology/methods
ABSTRACT
Novel insights in nanoparticle (NP) uptake routes of cells, their intracellular trafficking and subcellular targeting can be obtained through the investigation of their temporal and spatial behavior. In this work, we present the application of image (cross-) correlation spectroscopy (IC(C)S) and single particle tracking (SPT) to monitor the intracellular dynamics of polystyrene (PS) NPs in the human lung carcinoma A549 cell line. The ensemble kinetic behavior of NPs inside the cell was characterized by temporal and spatiotemporal image correlation spectroscopy (TICS and STICS). Moreover, a more direct interpretation of the diffusion and flow detected in the NP motion was obtained with SPT, which monitors individual NPs. Both techniques demonstrate that the PS NP transport in A549 cells is mainly dependent on microtubule-assisted transport. By applying spatiotemporal image cross-correlation spectroscopy (STICCS), the correlated motions of NPs with the early endosomes, late endosomes and lysosomes are identified. PS NPs were equally distributed among the endolysosomal compartment during the time interval of the experiments. The cotransport of the NPs with the lysosomes is significantly larger compared to that with the other cell organelles. In the present study we show that the complementarity of ICS-based techniques and SPT enables a consistent and elaborate model of the complex behavior of NPs inside biological systems.
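On the SPT side, diffusion and directed (e.g., microtubule-assisted) transport can be separated by fitting the mean squared displacement of individual tracks to MSD(τ) = 4Dτ + (vτ)². A minimal sketch for 2D tracks (function names and starting values are our own; this is the textbook analysis, not the exact pipeline of the study):

```python
import numpy as np
from scipy.optimize import curve_fit

def msd(track):
    """Time-averaged mean squared displacement of one 2D track (n_steps, 2)."""
    n = len(track)
    return np.array([np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
                     for lag in range(1, n // 2)])

def fit_diffusion_flow(track, dt):
    """Fit MSD(tau) = 4*D*tau + (v*tau)^2 to separate random from directed motion."""
    m = msd(track)
    tau = dt * np.arange(1, len(m) + 1)
    model = lambda t, D, v: 4.0 * D * t + (v * t) ** 2
    (D, v), _ = curve_fit(model, tau, m, p0=(0.1, 0.1))
    return D, v  # um^2/s and um/s if track is in um and dt in s
```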
Subject(s)
Epithelial Cells/metabolism , Lung/metabolism , Nanoparticles , Polystyrenes , Cell Line, Tumor , Epithelial Cells/cytology , Humans , Lung/cytology , Spectrum Analysis
ABSTRACT
Detection of point mutations and single nucleotide polymorphisms in DNA and RNA has a growing importance in biology, biotechnology, and medicine. For such applications, hybridization assays are often used. Traditionally, they differentiate point mutations only at elevated temperatures (>40 °C) and in narrow intervals (ΔT = 1-10 °C). The current study demonstrates that a specially designed multistranded DNA probe can differentiate point mutations in the range of 5-40 °C. This unprecedentedly broad ambient-temperature range is enabled by a controlled combination of (i) nonequilibrium hybridization conditions and (ii) a mismatch-induced increase of the equilibration time with respect to that of a fully matched complex, which we dub "kinetic inversion".
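A toy calculation makes the nonequilibrium logic concrete: if both complexes approach equilibrium with first-order kinetics but the mismatch equilibrates far more slowly, an early read-out shows a much larger match/mismatch contrast than the equilibrium ratio. All numbers below are hypothetical, chosen only to illustrate the effect:

```python
import math

def signal(theta_eq, tau, t):
    """First-order approach to equilibrium: theta(t) = theta_eq*(1 - exp(-t/tau))."""
    return theta_eq * (1.0 - math.exp(-t / tau))

# Mismatch binds slightly weaker but equilibrates much more slowly
pm = dict(theta_eq=0.8, tau=5.0)    # perfect match, tau in minutes
mm = dict(theta_eq=0.5, tau=200.0)  # mismatch with slowed equilibration

for t in (10.0, 1e6):               # early read-out vs effectively infinite time
    ratio = signal(t=t, **pm) / signal(t=t, **mm)
    print(f"t = {t:g} min -> PM/MM discrimination = {ratio:.1f}")
```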
ABSTRACT
With the current expanded technical capabilities to perform mass spectrometry-based biomedical proteomics experiments, an improved focus on the design of experiments is crucial. Since ignoring the importance of good design leads to an inflated rate of false discoveries that would poison our results, more and more tools are being developed to help researchers design proteomics experiments. In this review, we apply statistical thinking to go through the entire proteomics workflow for biomarker discovery and validation, and we discuss the considerations that should be made at the level of hypothesis building, technology selection, experimental design and the optimization of the experimental parameters.
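As one concrete instance of such statistical thinking at the design stage, a sample-size calculation fixes the number of replicates before any instrument time is spent. The scenario below (effect size, power, number of tested proteins, Bonferroni correction) is entirely hypothetical:

```python
import math
from statsmodels.stats.power import TTestIndPower

# Detect a standardized effect size of 0.8 with 80% power at alpha = 0.05,
# Bonferroni-corrected for 1000 tested proteins (hypothetical scenario).
alpha_corrected = 0.05 / 1000
n_per_group = TTestIndPower().solve_power(effect_size=0.8,
                                          alpha=alpha_corrected,
                                          power=0.8)
print(f"samples needed per group: {math.ceil(n_per_group)}")
```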
Subject(s)
Mass Spectrometry/methods , Proteomics/methods , Research Design , Humans , Proteomics/statistics & numerical data , Proteomics/trends
ABSTRACT
We present a general analytical model for the intensity fluctuation autocorrelation function for second- and third-harmonic-generating point scatterers. Expressions are derived for a stationary laser beam and for scanning beam configurations for specific correlation methodologies. We discuss free translational diffusion in both three and two dimensions. At low particle concentrations, the expressions for fluorescence are retrieved, while at high particle concentrations a rescaling of the function parameters is required for a stationary illumination beam, provided that the phase shift per unit length of the beam equals zero.
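For reference, the fluorescence limit recovered at low particle concentrations is the standard autocorrelation for free 3D diffusion through a 3D Gaussian focus (textbook FCS notation, not the harmonic-generation expressions derived in the paper):

```latex
G(\tau) = \frac{1}{\langle N \rangle}
          \left(1 + \frac{\tau}{\tau_D}\right)^{-1}
          \left(1 + \frac{\tau}{s^{2}\tau_D}\right)^{-1/2},
\qquad \tau_D = \frac{\omega_0^{2}}{4D}
```

with ⟨N⟩ the mean number of particles in the focal volume, s the axial-to-lateral beam-waist ratio, and D the diffusion coefficient.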
Subject(s)
Models, Chemical , Diffusion , Fluorescence , Spectrometry, Fluorescence
ABSTRACT
Within a single infected individual, a virus population can have a high genomic variability. In the case of HIV, several mutations can be present even in a small genomic window of 20-30 nucleotides. For diagnostic purposes, it is often necessary to resequence genomic subsets where crucial mutations are known to occur. In this article, we address this issue using DNA microarrays and inputs from hybridization thermodynamics. Hybridization signals from multiple probes are analyzed, including strong signals from perfectly matching (PM) probes and a large amount of weaker cross-hybridization signals from mismatching (MM) probes. The latter are crucial in the data analysis. Seven coded clinical samples (HIV-1) are analyzed, and the microarray results are in full concordance with Sanger sequencing data. Moreover, the thermodynamic analysis of microarray signals resolves inherent ambiguities in Sanger data of mixed samples and provides additional clinically relevant information. These results show the reliability and added value of DNA microarrays for point-of-care diagnostic purposes.
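The role of the thermodynamic inputs can be sketched as a linear inversion in the low-coverage regime of the Langmuir isotherm: each probe responds to each sequence variant with an affinity set by the hybridization free energy, and the variant concentrations in a mixed sample follow from nonnegative least squares. All free energies and concentrations below are hypothetical, and the analysis in the paper is more involved:

```python
import numpy as np
from scipy.optimize import nnls

RT = 0.593  # kcal/mol near room temperature (approximate)

def affinity_matrix(dG):
    """K[p, t] = exp(-dG[p, t]/RT): response of probe p to target variant t
    in the linear (low-coverage) regime of the Langmuir isotherm."""
    return np.exp(-np.asarray(dG) / RT)

# Hypothetical free energies (kcal/mol): rows = probes (wild-type PM, mutant PM),
# columns = target variants; off-diagonal entries are weaker mismatch duplexes.
dG = [[-16.0, -12.5],
      [-12.0, -15.5]]
K = affinity_matrix(dG)

true_conc = np.array([1.0, 0.05])  # 5% mutant spike-in (arbitrary units)
intensities = K @ true_conc        # idealized, noise-free probe signals

est, _ = nnls(K, intensities)      # recover variant concentrations
print(est / est.sum())             # estimated variant fractions
```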
Subject(s)
DNA Mutational Analysis/methods , HIV-1/genetics , Oligonucleotide Array Sequence Analysis/methods , Algorithms , HIV Reverse Transcriptase/genetics , Mutation , Thermodynamics
ABSTRACT
Hybridization of nucleic acids on solid surfaces is a key process involved in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and individual molecular concentrations from an ensemble of nucleic acids. This research has inputs from many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen, Germany, in 2011 to discuss present knowledge and limitations of our physico-chemical understanding of high-throughput nucleic acid technologies. This meeting inspired us to write this summary, which provides an overview of state-of-the-art approaches, grounded in physical chemistry, to modeling the hybridization of nucleic acids on solid surfaces. In addition, practical application of current knowledge is emphasized.
Subject(s)
High-Throughput Nucleotide Sequencing , Oligonucleotide Array Sequence Analysis , Algorithms , Artifacts , Base Pairing , Calibration , DNA/chemistry , DNA/genetics , DNA Probes/chemistry , DNA Probes/genetics , Humans , Image Processing, Computer-Assisted , Models, Biological , Nucleic Acid Hybridization/methods , Surface Properties , Thermodynamics
ABSTRACT
One of the challenges in multi-omics data analysis for precision medicine is the efficient exploration of undiscovered molecular interactions in disease processes. We present BioMOBS, a workflow consisting of two data visualization tools integrated with an open-source molecular information database to perform clinically relevant analyses (https://github.com/driesheylen123/BioMOBS). We performed exploratory pathway analysis with BioMOBS and demonstrated its ability to generate relevant molecular hypotheses by reproducing recent type 2 diabetes findings in UK Biobank data. The central visualization tool, where data-driven and literature-based findings can be integrated, is available at the same GitHub link. BioMOBS leverages information from multiple data-driven interactive analyses and visually integrates it with established pathway knowledge. The demonstrated use cases support the use of BioMOBS as a procedure for gaining clinically relevant insights from disease pathway analyses on various types of omics data.
Subject(s)
Diabetes Mellitus, Type 2 , Software , Humans , Multiomics , Workflow
ABSTRACT
In high-throughput omics disciplines like transcriptomics, researchers face a need to assess the quality of an experiment prior to an in-depth statistical analysis. To efficiently analyze such voluminous collections of data, researchers need triage methods that are both quick and easy to use. Such a normalization method for relative quantitation, CONSTANd, was recently introduced for isobarically-labeled mass spectra in proteomics. It transforms the data matrix of abundances through an iterative, convergent process enforcing three constraints: (I) identical column sums; (II) each row sum is fixed (across matrices) and (III) identical to all other row sums. In this study, we investigate whether CONSTANd is suitable for count data from massively parallel sequencing, by qualitatively comparing its results to those of DESeq2. Further, we propose an adjustment of the method so that it may be applied to identically balanced but differently sized experiments for joint analysis. We find that CONSTANd can process large data sets at well over 1 million count records per second whilst mitigating unwanted systematic bias and thus quickly uncovering the underlying biological structure when combined with a PCA plot or hierarchical clustering. Moreover, it allows joint analysis of data sets obtained from different batches, with different protocols and from different labs but without exploiting information from the experimental setup other than the delineation of samples into identically processed sets (IPSs). CONSTANd's simplicity and applicability to proteomics as well as transcriptomics data make it an interesting candidate for integration in multi-omics workflows.
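Under the adjustment described above, matrices from identically balanced but differently sized experiments can be normalized separately and then analyzed jointly, since fixing row and column means (rather than raw sums) puts matrices with different column counts on one scale. A hypothetical sketch reusing the constand_like() function from the earlier CONSTANd example:

```python
# Reuses constand_like() from the earlier sketch; because it fixes row and
# column *means*, matrices with different numbers of samples (columns) land
# on the same scale and can be concatenated for joint analysis, provided each
# matrix corresponds to one identically processed set (IPS).
import numpy as np

rng = np.random.default_rng(0)
batch_a = constand_like(rng.lognormal(size=(500, 6)))   # e.g., a 6-plex batch
batch_b = constand_like(rng.lognormal(size=(500, 10)))  # e.g., a 10-plex batch
joint = np.hstack([batch_a, batch_b])                   # columns now comparable
```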