Results 1 - 15 of 15
1.
Anal Chem ; 96(23): 9343-9352, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38804718

ABSTRACT

Oligonucleotide therapeutics have emerged as an important class of drugs offering targeted therapeutic strategies that complement traditional modalities, such as monoclonal antibodies and small molecules. Their unique ability to precisely modulate gene expression makes them vital for addressing previously undruggable targets. A critical aspect of developing these therapies is characterizing their molecular composition accurately. This includes determining the monoisotopic mass of oligonucleotides, which is essential for identifying impurities, degradants, and modifications that can affect the drug efficacy and safety. Mass spectrometry (MS) plays a pivotal role in this process, yet the accurate interpretation of complex mass spectra remains challenging, especially for large molecules, where the monoisotopic peak is often undetectable. To address this issue, we have adapted the MIND algorithm, originally developed for top-down proteomics, for use with oligonucleotide data. This adaptation allows for the prediction of monoisotopic mass from the more readily detectable, most-abundant peak mass, enhancing the ability to annotate complex spectra of oligonucleotides. Our comprehensive validation of this modified algorithm on both in silico and real-world oligonucleotide data sets has demonstrated its effectiveness and reliability. To facilitate wider adoption of this advanced analytical technique, we have encapsulated the enhanced MIND algorithm in a user-friendly Shiny application. This online platform simplifies the process of annotating complex oligonucleotide spectra, making advanced mass spectrometry analysis accessible to researchers and drug developers. The application is available at https://valkenborg-lab.shinyapps.io/mind4oligos/.


Subject(s)
Algorithms, Mass Spectrometry, Oligonucleotides, Oligonucleotides/analysis, Mass Spectrometry/methods, Molecular Weight
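
The abstract above notes that the monoisotopic peak is often undetectable for large molecules, while the most-abundant peak is easier to observe. The following is a minimal sketch of why that happens, not an implementation of MIND: it convolves per-atom isotope abundances to obtain the distribution of extra neutrons for a molecule. The elemental composition `OLIGO` is a hypothetical, oligonucleotide-sized formula chosen only for illustration, and the function names are assumptions. The abundances are standard natural isotope abundances.

```python
# Natural isotope abundances, indexed by number of extra neutrons
# relative to the lightest stable isotope.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "P": [1.0],
}

# Hypothetical composition, roughly the size of a short oligonucleotide.
OLIGO = {"C": 195, "H": 245, "N": 75, "O": 120, "P": 19}


def convolve(a, b, max_len):
    """Convolve two discrete distributions, truncated to max_len terms."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def neutron_offset_distribution(formula, max_len=60):
    """Probability of each number of extra neutrons (aggregated isotopic envelope)."""
    dist = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            dist = convolve(dist, ISOTOPES[element], max_len)
    return dist


def most_abundant_offset(dist):
    """Index (in extra neutrons) of the tallest aggregated peak."""
    return max(range(len(dist)), key=dist.__getitem__)


if __name__ == "__main__":
    dist = neutron_offset_distribution(OLIGO)
    print("P(monoisotopic peak):", dist[0])
    print("most abundant peak offset (neutrons):", most_abundant_offset(dist))
```

The offset between the tallest peak and the monoisotopic peak grows with molecular size, which is the quantity a predictor working from the most-abundant peak mass has to account for.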
3.
J Am Soc Mass Spectrom ; 33(11): 2063-2069, 2022 Nov 02.
Article in English | MEDLINE | ID: mdl-36223196

ABSTRACT

Nowadays, monoisotopic mass is used as an important feature in top-down proteomics. Knowing the exact monoisotopic mass is helpful for precise and quick protein identification in large protein databases. However, the monoisotopic peak is visible only in spectra of small molecules. For bigger molecules like proteins, it is hidden in noise or not detected at all, and therefore its position has to be predicted. By improving the prediction of the peak, we contribute to a more accurate identification of molecules, which is crucial in fields such as chemistry and medicine. In this work, we present the envemind algorithm, which is a two-step procedure to predict monoisotopic masses of proteins. The prediction is based on an isotopic envelope. Therefore, envemind is dedicated to spectra where we are able to resolve isotopic variants separated by one dalton. Furthermore, only single-molecule spectra are allowed, that is, spectra that do not require prior deconvolution. The algorithm deals with the problem of off-by-one-dalton errors, which are common in monoisotopic mass prediction. A novel aspect of this work is a mathematical exploration of the space of molecules, where we equate chemical formulas and their theoretical spectra. Since the space of molecules consists of all possible chemical formulas, this approach is not limited to known substances only. This makes optimization processes faster and makes it possible to approximate a theoretical spectrum for a given experimental one. The algorithm is available as a Python package envemind on our GitHub page https://github.com/PiotrRadzinski/envemind.


Subject(s)
Proteins, Proteomics, Protein Databases, Proteins/chemistry, Proteomics/methods, Algorithms
4.
J Proteome Res ; 20(4): 2122-2129, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33724840

ABSTRACT

The Bruker timsTOF Pro is an instrument that couples trapped ion mobility spectrometry (TIMS) to high-resolution time-of-flight (TOF) mass spectrometry (MS). For proteomics, lipidomics, and metabolomics applications, the instrument is typically interfaced with a liquid chromatography (LC) system. The resulting LC-TIMS-MS data sets are, in general, several gigabytes in size and are stored in the proprietary Bruker Tims data format (TDF). The raw data can be accessed using proprietary binaries in C, C++, and Python on Windows and Linux operating systems. Here we introduce a suite of computer programs for data accession, including OpenTIMS, TimsR, and TimsPy. OpenTIMS is a C++ library capable of reading Bruker TDF files. It opens up Bruker's proprietary codebase. TimsPy and TimsR build on top of OpenTIMS, enabling swift and user-friendly data access to the raw data with Python and R. Both programs are available under a GPL3 license on all major platforms, extending the possibility to interact with timsTOF data to macOS. Additionally, OpenTIMS is capable of translating Bruker data into HDF5 files that can be easily analyzed from Python with the vaex module. OpenTIMS and TimsPy therefore provide easy and quick access to Bruker timsTOF raw data.


Subject(s)
Ion Mobility Spectrometry, Proteomics, Liquid Chromatography, Mass Spectrometry, Software
5.
Rapid Commun Mass Spectrom ; : e8956, 2020 Sep 30.
Article in English | MEDLINE | ID: mdl-32996651

ABSTRACT

RATIONALE: The linear regression of mass spectra is a computational problem defined as fitting a linear combination of reference spectra to an experimental one. It is typically used to estimate the relative quantities of selected ions. In this work, we study this problem in an abstract setting to develop new approaches applicable to a diverse range of experiments. METHODS: To overcome the sensitivity of the ordinary least-squares regression to measurement inaccuracies, we base our methods on a non-conventional spectral dissimilarity measure, known as the Wasserstein or the Earth Mover's distance. This distance is based on the notion of the cost of transporting signal between mass spectra, which renders it naturally robust to measurement inaccuracies in the mass domain. RESULTS: Using a data set of 200 mass spectra, we show that our approach is capable of estimating ion proportions accurately without extensive preprocessing of spectra required by other methods. The conclusions are further substantiated using data sets simulated in a way that mimics most of the measurement inaccuracies occurring in real experiments. CONCLUSIONS: We have developed a linear regression algorithm based on the notion of the cost of transporting signal between spectra. Our implementation is available in a Python 3 package called masserstein, which is freely available at https://github.com/mciach/masserstein.
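
To illustrate the transport-cost idea described in the abstract above, here is a minimal sketch of the one-dimensional Wasserstein (Earth Mover's) distance between two centroided spectra. It is not the masserstein API: the function name `wasserstein_1d`, the peak-list representation, and the toy spectra are assumptions made for illustration. Each spectrum is normalized to unit total intensity, and the distance is the integral of the absolute difference between the two cumulative intensity curves.

```python
def wasserstein_1d(spectrum_a, spectrum_b):
    """1-D Wasserstein distance between two peak lists of (mz, intensity)."""
    total_a = sum(intensity for _, intensity in spectrum_a)
    total_b = sum(intensity for _, intensity in spectrum_b)
    # Signed events: +signal from spectrum A, -signal from spectrum B.
    events = [(mz, intensity / total_a) for mz, intensity in spectrum_a]
    events += [(mz, -intensity / total_b) for mz, intensity in spectrum_b]
    events.sort()

    distance = 0.0
    cdf_difference = 0.0  # running difference between the two cumulative curves
    previous_mz = events[0][0]
    for mz, delta in events:
        # Signal that is still "unmatched" must be moved across this m/z gap.
        distance += abs(cdf_difference) * (mz - previous_mz)
        cdf_difference += delta
        previous_mz = mz
    return distance


if __name__ == "__main__":
    reference = [(100.0, 1.0), (101.0, 1.0)]
    shifted = [(100.1, 1.0), (101.1, 1.0)]  # same spectrum, small mass error
    print(wasserstein_1d(reference, shifted))
```

A small mass shift changes this distance only slightly, in proportion to the shift, whereas a peak-by-peak squared difference would treat the shifted peaks as entirely mismatched.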

6.
Anal Chem ; 92(14): 9472-9475, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32501003

ABSTRACT

High-resolution mass spectrometry is becoming increasingly available, with its ability to resolve the fine isotopic structure of measured analytes. It allows for high-sensitivity spectral deconvolution, leading to fewer false-positive identifications. Analytes can be identified by comparing their theoretical isotopic signal with the observed peaks. Necessary calculations are, however, computationally demanding and lead to long processing times. For wheat (Triticum aestivum) alone, UniProt holds more than 142 000 candidate protein sequences. This is doubled upon sequence reversal for identification FDR estimation and further multiplied by performing in silico digestion into peptides. The same peptide might originate from more than one protein, which reduces the overall number of sequences to be calculated. However, the number is still huge. IsoSpec2 can perform these calculations quickly. Compared to IsoSpec1, the algorithm is simpler, orders of magnitude faster, and offers more flexibility for the developers of algorithms for raw data analysis. It is freely available under a 2-clause BSD license, with bindings for the C++, C, R, and Python programming languages.

7.
BMC Bioinformatics ; 20(Suppl 15): 644, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874610

ABSTRACT

BACKGROUND: Surveys of presences and absences of specific species across multiple biogeographic units (or bioregions) are used in a broad range of biological studies, from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize similarity between occurrences of species, we routinely use the Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has seldom been used or studied. RESULTS: We introduce a hypothesis test for similarity for biological presence-absence data, using the Jaccard/Tanimoto coefficient. Several key improvements are presented, including unbiased estimation of expectation and centered Jaccard/Tanimoto coefficients that account for occurrence probabilities. The exact and asymptotic solutions are derived. To overcome a computational burden due to high dimensionality, we propose the bootstrap and measurement concentration algorithms to efficiently estimate statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate p-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution, particularly with an increasing dimensionality. We showcase their applications in evaluating co-occurrences of bird species on 28 islands of Vanuatu and fish species in 3347 freshwater habitats in France. The proposed methods are implemented in an open source R package called jaccard (https://cran.r-project.org/package=jaccard).
CONCLUSION: We introduce a suite of statistical methods for the Jaccard/Tanimoto similarity coefficient for binary data, that enable straightforward incorporation of probabilistic measures in analysis for species co-occurrences. Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.


Subject(s)
Freshwater Biology/methods, Algorithms, Animals, Biometry, Fishes, Probability
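
The coefficient discussed in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the jaccard R package: the function names are hypothetical, the expectation `px * py / (px + py - px * py)` is assumed here as the expected coefficient for two independent presence-absence vectors with occurrence probabilities `px` and `py`, and the p-value uses a plain permutation scheme rather than the bootstrap or measurement concentration algorithms the authors propose.

```python
import random


def jaccard(x, y):
    """Ratio of the intersection to the union of two binary vectors."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0


def expected_jaccard(x, y):
    """Assumed expectation under independence, from occurrence frequencies."""
    px = sum(x) / len(x)
    py = sum(y) / len(y)
    denominator = px + py - px * py
    return px * py / denominator if denominator else 0.0


def centered_jaccard(x, y):
    """Observed coefficient minus its assumed expectation under independence."""
    return jaccard(x, y) - expected_jaccard(x, y)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value from shuffling y (a generic resampling test)."""
    rng = random.Random(seed)
    observed = abs(centered_jaccard(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(centered_jaccard(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    species_a = [1, 1, 0, 0, 1]  # presence/absence across five sites
    species_b = [1, 0, 0, 1, 1]
    print(jaccard(species_a, species_b))
    print(centered_jaccard(species_a, species_b))
    print(permutation_p_value(species_a, species_b))
```

Centering matters because two common species share many sites by chance alone, so a raw coefficient of 0.5 can be unremarkable when both occurrence frequencies are high.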
8.
Anal Chem ; 91(3): 1801-1807, 2019 02 05.
Article in English | MEDLINE | ID: mdl-30608646

ABSTRACT

Top-down mass spectrometry methods are becoming continuously more popular in the effort to describe the proteome. They rely on the fragmentation of intact protein ions inside the mass spectrometer. Among the existing fragmentation methods, electron transfer dissociation is known for its precision and wide coverage of different cleavage sites. However, several side reactions can occur under electron transfer dissociation (ETD) conditions, including nondissociative electron transfer and proton transfer reaction. Evaluating their extent can provide more insight into reaction kinetics as well as instrument operation. Furthermore, preferential formation of certain reaction products can reveal important structural information. To the best of our knowledge, there are currently no tools capable of tracing and analyzing the products of these reactions in a systematic way. In this Article, we present in detail masstodon: a computer program for assigning peaks and interpreting mass spectra. Besides being a general purpose tool, masstodon also offers the possibility to trace the products of reactions occurring under ETD conditions and provides insights into the parameters driving them. It is available free of charge under the GNU AGPL V3 public license.


Subject(s)
Apolipoprotein A-I/analysis, Mass Spectrometry/statistics & numerical data, Software, Substance P/analysis, Ubiquitin/analysis, Algorithms, Electrons
9.
BMC Bioinformatics ; 18(Suppl 12): 422, 2017 Oct 16.
Article in English | MEDLINE | ID: mdl-29072141

ABSTRACT

BACKGROUND: The constant progress in sequencing technology leads to ever increasing amounts of genomic data. In the light of current evidence, transposable elements (TEs for short) are becoming useful tools for learning about the evolution of host genomes. Therefore, software for genome-wide detection and analysis of TEs is of great interest. RESULTS: Here we describe a computational tool for mining, classifying and storing TEs from newly sequenced genomes. This is an online, web-based, user-friendly service, enabling users to upload their own genomic data and perform de-novo searches for TEs. The detected TEs are automatically analyzed, compared to reference databases, annotated, clustered into families, and stored in a TE repository. Also, the genome-wide nesting structure of the elements found is detected and analyzed by a new method for inferring the evolutionary history of TEs. We illustrate the functionality of our tool by performing a full-scale analysis of the TE landscape in the Medicago truncatula genome. CONCLUSIONS: TRANScendence is an effective tool for the de-novo annotation and classification of transposable elements in newly-acquired genomes. Its streamlined interface makes it well-suited for evolutionary studies.


Subject(s)
DNA Transposable Elements/genetics, Data Mining, Genetic Databases, Software, Algorithms, Animals, Drosophila melanogaster/genetics, Human Genome, Humans, Theoretical Models, Reproducibility of Results
10.
Anal Chem ; 89(6): 3272-3277, 2017 03 21.
Article in English | MEDLINE | ID: mdl-28234451

ABSTRACT

As high-resolution mass spectrometry (HRMS) becomes increasingly available, the need for software tools capable of handling more complex data is surging. The complexity of the HRMS data stems partly from the presence of isotopes that give rise to more peaks to interpret compared to lower resolution instruments. However, a new generation of fine isotope calculators is on the rise. They calculate the smallest possible sets of isotopologues. Yet none of these calculators lets the user specify the joint probability of the revealed envelope in advance. Instead, the user must provide a lower limit on the probability of isotopologues of interest, that is, provide minimal peak height. The choice of such a threshold is far from obvious. In particular, it is impossible to a priori balance the trade-off between the algorithm speed and the portion of the revealed theoretical spectrum. We show that this leads to considerable inefficiencies. Here, we present IsoSpec: an algorithm for fast computation of isotopologues of chemical substances that can alternate between joint probability and peak height threshold. We prove that IsoSpec is optimal in terms of time complexity. Its implementation is freely available under a 2-clause BSD license, with bindings for C++, C, R, and Python.
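
The difference between the two stopping criteria named in the abstract above (a joint probability target versus a minimal peak height) can be shown on a small molecule. The sketch below is a brute-force illustration and not the IsoSpec algorithm or its API: it enumerates every isotopologue, which IsoSpec is designed to avoid, and all function names are assumptions. Glucose (C6H12O6) is used as the example, with standard natural isotope abundances.

```python
from itertools import product
from math import factorial, prod

# Natural abundances of the stable isotopes, lightest first.
ISOTOPE_PROBS = {
    "C": (0.9893, 0.0107),
    "H": (0.999885, 0.000115),
    "O": (0.99757, 0.00038, 0.00205),
}

GLUCOSE = {"C": 6, "H": 12, "O": 6}


def compositions(n, k):
    """Yield all k-tuples of non-negative integers that sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def element_configs(n, probs):
    """Isotope counts for n atoms of one element, with multinomial probabilities."""
    configs = []
    for counts in compositions(n, len(probs)):
        coefficient = factorial(n)
        for c in counts:
            coefficient //= factorial(c)
        probability = coefficient * prod(p ** c for p, c in zip(probs, counts))
        configs.append((counts, probability))
    return configs


def isotopologues(formula):
    """All isotopologues of a formula, sorted from most to least probable."""
    per_element = [element_configs(n, ISOTOPE_PROBS[el]) for el, n in formula.items()]
    result = []
    for combo in product(*per_element):
        counts = tuple(c for c, _ in combo)
        result.append((counts, prod(p for _, p in combo)))
    result.sort(key=lambda item: item[1], reverse=True)
    return result


def smallest_covering_set(sorted_isotopologues, joint_probability):
    """Shortest prefix of the sorted list whose total probability reaches the target."""
    chosen, total = [], 0.0
    for item in sorted_isotopologues:
        chosen.append(item)
        total += item[1]
        if total >= joint_probability:
            break
    return chosen


def above_threshold(sorted_isotopologues, min_probability):
    """Isotopologues at least as probable as a fixed peak-height threshold."""
    return [item for item in sorted_isotopologues if item[1] >= min_probability]


if __name__ == "__main__":
    all_isotopologues = isotopologues(GLUCOSE)
    covering = smallest_covering_set(all_isotopologues, 0.99)
    print(len(all_isotopologues), "isotopologues in total")
    print(len(covering), "needed to cover 99% of the probability")
    print(len(above_threshold(all_isotopologues, 1e-4)), "above a 1e-4 threshold")
```

With a threshold, the covered share of the theoretical spectrum is only known after the computation, while a joint probability target fixes it up front, which is the trade-off the abstract describes.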

11.
Angew Chem Int Ed Engl ; 55(20): 5904-37, 2016 05 10.
Article in English | MEDLINE | ID: mdl-27062365

ABSTRACT

Exactly half a century has passed since the launch of the first documented research project (1965 Dendral) on computer-assisted organic synthesis. Many more programs were created in the 1970s and 1980s but the enthusiasm of these pioneering days had largely dissipated by the 2000s, and the challenge of teaching the computer how to plan organic syntheses earned itself the reputation of a "mission impossible". This is quite curious given that, in the meantime, computers have "learned" many other skills that had been considered exclusive domains of human intellect and creativity-for example, machines can nowadays play chess better than human world champions and they can compose classical music pleasant to the human ear. Although there have been no similar feats in organic synthesis, this Review argues that to concede defeat would be premature. Indeed, bringing together the combination of modern computational power and algorithms from graph/network theory, chemical rules (with full stereo- and regiochemistry) coded in appropriate formats, and the elements of quantum mechanics, the machine can finally be "taught" how to plan syntheses of non-trivial organic molecules in a matter of seconds to minutes. The Review begins with an overview of some basic theoretical concepts essential for the big-data analysis of chemical syntheses. It progresses to the problem of optimizing pathways involving known reactions. It culminates with discussion of algorithms that allow for a completely de novo and fully automated design of syntheses leading to relatively complex targets, including those that have not been made before. Of course, there are still things to be improved, but computers are finally becoming relevant and helpful to the practice of organic-synthetic planning. 
Paraphrasing Churchill's famous words after the Allies' first major victory over the Axis forces in Africa, it is not the end, it is not even the beginning of the end, but it is the end of the beginning for the computer-assisted synthesis planning. The machine is here to stay.

12.
Genetica ; 143(4): 433-40, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25981486

ABSTRACT

Transposable elements (TEs) are mobile DNA segments, abundant and dynamic in plant genomes. Because their mobility can be potentially deleterious to the host, a variety of mechanisms have evolved to limit that negative impact, one of them being preference for a specific target insertion site. Here, we describe a family of Mutator-like DNA transposons in Medicago truncatula targeting TA microsatellites. We identified 218 copies of MuTAnTs and an element carrying a complete ORF encoding a mudrA-like transposase. Most insertion sites are flanked by a variable number of TA tandem repeats, indicating that MuTAnTs are specifically targeting TA microsatellites. Other TE families flanked by TA repeats (e.g. TAFT elements in maize) were described previously; however, we identified the first putative autonomous element sharing these characteristics with a related group of short non-autonomous transposons.


Subject(s)
DNA Transposable Elements, Medicago truncatula/genetics, Microsatellite Repeats, Base Sequence, Molecular Evolution, Gene Order, Genetic Variation, Molecular Sequence Data, Insertional Mutagenesis, Phylogeny, Sequence Alignment
13.
Nucleic Acids Res ; 43(4): 2188-98, 2015 Feb 27.
Article in English | MEDLINE | ID: mdl-25613453

ABSTRACT

Nonallelic homologous recombination (NAHR), occurring between low-copy repeats (LCRs) >10 kb in size and sharing >97% DNA sequence identity, is responsible for the majority of recurrent genomic rearrangements in the human genome. Recent studies have shown that transposable elements (TEs) can also mediate recurrent deletions and translocations, indicating the features of substrates that mediate NAHR may be significantly less stringent than previously believed. Using >4 kb length and >95% sequence identity criteria, we analyzed the genome-wide distribution of long interspersed element (LINE) retrotransposons and their potential to mediate NAHR. We identified 17 005 directly oriented LINE pairs located <10 Mbp from each other as potential NAHR substrates, placing 82.8% of the human genome at risk of LINE-LINE-mediated instability. Cross-referencing these regions with CNVs in the Baylor College of Medicine clinical chromosomal microarray database of 36 285 patients, we identified 516 CNVs potentially mediated by LINEs. Using long-range PCR of five different genomic regions in a total of 44 patients, we confirmed that the CNV breakpoints in each patient map within the LINE elements. To additionally assess the scale of LINE-LINE/NAHR phenomenon in the human genome, we tested DNA samples from six healthy individuals on a custom aCGH microarray targeting LINE elements predicted to mediate CNVs and identified 25 LINE-LINE rearrangements. Our data indicate that LINE-LINE-mediated NAHR is widespread and under-recognized, and is an important mechanism of structural rearrangement contributing to human genomic variability.


Subject(s)
Human Genome, Homologous Recombination, Long Interspersed Nucleotide Elements, Algorithms, Chromosome Breakpoints, Comparative Genomic Hybridization, DNA Copy Number Variations, Genomics/methods, Humans, Polymerase Chain Reaction
14.
Chromosome Res ; 21(8): 781-8, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24254229

ABSTRACT

Evolutionarily conserved transcription factor SOX9 is essential for the differentiation of chondrocytes and the development of testes. Heterozygous point mutations and genomic deletions involving SOX9 lead to campomelic dysplasia (CD), a skeletal malformation syndrome often associated with sex reversal. Chromosomal rearrangements with breakpoints mapping up to 1.6 Mb up- and downstream to SOX9, and likely disrupting its distant cis-regulatory elements, have been described in patients with milder forms of CD. Based on the location of these aberration breakpoints, four clusters upstream of SOX9 have been defined. Interestingly, we found that each of these intervals overlaps a gene encoding long noncoding RNA (lncRNA), suggesting that lncRNAs may contribute to long-range regulation of SOX9 expression. One of the four upstream regions, RevSex (517-595 kb 5' to SOX9), is associated with sex reversal, and was suggested to harbor a testis-specific and sex-determining enhancer. Another sex-determining interval was mapped to a gene desert >1.3 Mb downstream of SOX9. We have performed chromosome conformation capture-on-chip (4C) analysis in Sertoli cells and lymphoblasts to verify the proposed long-range interactions of the SOX9 promoter and to identify potential novel regulatory elements that might be responsible for sex reversal in patients with CD. We identified several novel potentially cis-interacting regions both up- and downstream to SOX9, with some of them overlapping lncRNA genes. Our data point to lncRNAs as likely mediators of some of these regulatory interactions.


Subject(s)
Human Chromosomes/genetics, Genetic Promoter Regions, SOX9 Transcription Factor/genetics, Campomelic Dysplasia/genetics, Cell Line, Chromosome Aberrations, Chromosome Deletion, Computational Biology, Gene Expression Regulation, Humans, Male, Microchip Analytical Procedures, Multigene Family, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism, SOX9 Transcription Factor/metabolism, Sertoli Cells/metabolism
15.
Theor Popul Biol ; 90: 145-51, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23948701

ABSTRACT

Transposable elements are DNA segments capable of persisting in host genomes by self-replication in spite of deleterious mutagenic effects. The theoretical dynamics of these elements within genomes has been studied extensively, and population genetic models predict that they can invade and be maintained as a result of both intra-genomic and inter-individual selection in sexual species. In asexuals, the success of selfish DNA is more difficult to explain. However, most theoretical work assumes a constant environment. Here, we analyze the impact of environmental change on the dynamics of transposition activity when horizontal DNA exchange is absent, based on a stochastic computational model of transposable element proliferation. We argue that repeated changes in the phenotypic optimum in a multidimensional fitness landscape may induce explosive bursts of transposition activity associated with faster adaptation. However, long-term maintenance of transposition activity is unlikely. This could contribute to the significant variation in the transposable element copy number among closely related species.


Subject(s)
Host-Parasite Interactions, Parasites/genetics, Symbiosis, Animals, DNA Transposable Elements, Mutation, Genetic Selection