ABSTRACT
BACKGROUND: Geometry calibration of a robotic CT system is necessary to obtain acceptable images when the two manipulators move asynchronously. OBJECTIVE: We aim to evaluate the impact of different types of asynchrony on images and propose a reference-free calibration method based on a simplified geometry model. METHODS: We evaluate the impact of different types of asynchrony on images and propose a novel calibration method focused on asynchronous rotation in robotic CT. The proposed method is initialized with reconstructions under the default uncalibrated geometry and uses grid sampling of the estimated geometry to determine the direction of optimization. The difference between the re-projections at the sampling points and the original projection guides the optimization direction. Images and estimated geometry are optimized alternately in each iteration, and the process stops when the difference in residual projections is small enough or when the maximum number of iterations is reached. RESULTS: In our simulation experiments, the proposed method shows better performance, with the PSNR increasing by 2% and the SSIM increasing by 13.6% after calibration. The experiments reveal fewer artifacts and higher image quality. CONCLUSION: We find that asynchronous rotation has a more significant impact on reconstruction, and the proposed method offers a feasible solution for correcting asynchronous rotation.
Subjects
Imaging Phantoms, Robotics, X-Ray Computed Tomography, Calibration, X-Ray Computed Tomography/methods, Rotation, Computer-Assisted Image Processing/methods, Algorithms, Humans, Computer Simulation
ABSTRACT
BACKGROUND: The identity of cancerous cells is determined by a mixture of factors, such as genomic variations, epigenetics, and the regulatory variations involved in transcription. Differences in transcriptome expression, as well as abnormal peptide structures, determine phenotypic differences. Thus, bulk RNA-seq and more recent single-cell RNA-seq (scRNA-seq) data are important for identifying pathogenic differences. Here, we rely on k-mer decomposition of sequences to identify pathogenic variations in detail. Because this approach does not need a reference, it outperforms more traditional Next-Generation Sequencing (NGS) analysis techniques that depend on aligning sequences to a reference. RESULTS: Via our alignment-free analysis of esophageal cancer and glioblastoma patients, high-frequency variations at multiple locations (repeats, intergenic regions, exons, introns) and in multiple forms (fusion, polyadenylation, splicing, etc.) could be discovered. Additionally, we systematically analyzed the importance of less-studied events in a classic transcriptome analysis pipeline, where these events are considered as indicators for tumor prognosis, tumor prediction, and tumor neoantigen inference, as well as their connection to the immune microenvironment. CONCLUSIONS: Our results suggest that esophageal cancer (ESCA) and glioblastoma processes can be explained via pathogenic microbial RNA, repeated sequences, novel splicing variants, and long intergenic non-coding RNAs (lincRNAs). We expect our reference-free processing and analysis to be helpful in differential scRNA-seq analysis of tumor and normal samples, which in turn offers a more comprehensive scheme for major cancer-associated events.
Subjects
Glioblastoma, Single-Cell Analysis, Transcriptome, Humans, Single-Cell Analysis/methods, Glioblastoma/genetics, Glioblastoma/pathology, Gene Expression Profiling/methods, Esophageal Neoplasms/genetics, Esophageal Neoplasms/pathology, High-Throughput Nucleotide Sequencing, RNA-Seq/methods, RNA Sequence Analysis/methods, Neoplastic Gene Expression Regulation, Neoplasms/genetics, Neoplasms/pathology
ABSTRACT
A protein's genetic architecture - the set of causal rules by which its sequence produces its functions - also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest - excluding the vast majority of possible genotypes and evolutionary trajectories - and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.
Subjects
Genetic Epistasis, Molecular Evolution, Transcription Factors/metabolism, Transcription Factors/genetics, DNA/genetics, DNA/metabolism, Mutation, Protein Binding
ABSTRACT
Integrating multiple single-cell datasets is essential for a comprehensive understanding of cell heterogeneity. Batch effects are undesired systematic variations among technologies or experimental laboratories that distort biological signals and hinder the integration of single-cell datasets. However, existing methods typically either rely on a selected dataset as a reference, leading to inconsistent integration performance across different references, or embed cells into an uninterpretable low-dimensional feature space. To overcome these limitations, a reference-free method, Beaconet, is presented for integrating multiple single-cell transcriptomic datasets in the original molecular space by aligning the global distribution of each batch using an adversarial correction network. Through extensive comparisons with 13 state-of-the-art methods, it is demonstrated that Beaconet can effectively remove batch effects while preserving biological variations and is superior in overall performance to existing unsupervised methods using all possible references. Furthermore, Beaconet performs integration in the original molecular feature space, enabling the characterization of cell types and downstream differential expression analysis directly using integrated data with gene-expression features. Additionally, when applied to large-scale atlas data integration, Beaconet shows notable advantages in both time and space efficiency. In summary, Beaconet serves as an effective and efficient batch effect removal tool that can facilitate the integration of single-cell datasets in a reference-free and molecular feature-preserving mode.
Subjects
Gene Expression Profiling, Single-Cell Analysis, Transcriptome, Single-Cell Analysis/methods, Transcriptome/genetics, Gene Expression Profiling/methods, Humans, Computational Biology/methods, Animals
ABSTRACT
Spatially resolved x-ray fluorescence (XRF) based analysis employing incident beam sizes in the low micrometer range (µXRF) is widely used to study lateral composition changes of various types of microstructured samples. However, up to now the quantitative analysis of such experimental datasets could only be realized by employing adequate calibration or reference specimens. In this work, we extend the applicability of the so-called reference-free XRF approach to enable reference-free µXRF analysis. Here, no calibration specimens are needed in order to derive a quantitative and position-sensitive composition of the sample of interest. The necessary instrumental steps to realize reference-free µXRF are explained, and a validation of reference-free µXRF against reference-free standard XRF is performed employing laterally homogeneous samples. Finally, an application example from semiconductor research is shown, where the lateral sample features require the use of reference-free µXRF for quantitative analysis.
ABSTRACT
The opioid crisis in the United States is being fueled by the rapid emergence of new fentanyl analogs and precursors that can elude traditional library-based screening methods, which require data from known reference compounds. Since reference compounds are unavailable for new fentanyl analogs, we examined if fentanyls (fentanyl + fentanyl analogs) could be identified in a reference-free manner using a combination of electrospray ionization (ESI), high-resolution ion mobility (IM) spectrometry, high-resolution mass spectrometry (MS), and higher-energy collision-induced dissociation (MS/MS). We analyzed a mixture containing nine fentanyls and W-15 (a structurally similar molecule) and found that the protonated forms of all fentanyls exhibited two baseline-separated IM distributions that produced different MS/MS patterns. Upon fragmentation, both IM distributions of all fentanyls produced two high intensity fragments, resulting from amine site cleavages. The higher mobility distributions of all fentanyls also produced several low intensity fragments, but surprisingly, these same fragments exhibited much greater intensities in the lower mobility distributions. This observation demonstrates that many fragments of fentanyls predominantly originate from one of two different gas-phase structures (suggestive of protomers). Furthermore, increasing the water concentration in the ESI solution increased the intensity of the lower mobility distribution relative to the higher mobility distribution, which further supports that fentanyls exist as two gas-phase protomers. Our observations on the IM and MS/MS properties of fentanyls can be exploited to positively differentiate fentanyls from other compounds without requiring reference libraries and will hopefully assist first responders and law enforcement in combating new and emerging fentanyls.
Subjects
Fentanyl, Tandem Mass Spectrometry, Humans, Tandem Mass Spectrometry/methods, Protein Subunits, Ion Mobility Spectrometry/methods
ABSTRACT
Bridges are designed and built to be safe against failure and perform satisfactorily over their service life. Bridge structural health monitoring (BSHM) systems are therefore essential to ensure the safety and serviceability of such critical transportation infrastructure. Identification of structural damage at the earliest time possible is a major goal of BSHM processes. Among the many damage identification techniques (DITs) developed, vibration-based techniques have shown great potential to be implemented in BSHM systems. In a vibration-based DIT, the response of a bridge is measured and analyzed in either time or space domain for the purpose of detecting damage-induced changes in the extracted dynamic properties of the bridge. This approach usually requires a comparison between two structural states of the bridge: the current state and a reference (intact/undamaged) state. In most in-situ cases, however, data on the bridge structural response in the reference state are not available. Therefore, researchers have recently been working on the development of DITs that eliminate the need for prior knowledge of the reference state. This paper thoroughly explains why and how the reference state can be excluded from the damage identification process. It then reviews the state-of-the-art reference-free vibration-based DITs and summarizes their merits and shortcomings to give guidance on their applicability to BSHM systems. Finally, some recommendations are given for further research.
ABSTRACT
Increasing evidence suggests that microbial species have strong within-species genetic heterogeneity. This can be problematic for the analysis of prokaryote genomes, which commonly relies on a reference genome to guide the assembly process. Differences between reference and sample genomes will therefore introduce errors in the final assembly, jeopardizing the detection of everything from structural variations to point mutations, which is critical for genomic surveillance of antibiotic resistance. Here we present Hound, a pipeline that integrates publicly available tools to assemble prokaryote genomes de novo, detect user-given genes by similarity, and report mutations found in the coding sequence and promoter, as well as relative gene copy number within the assembly. Importantly, Hound can use the query sequence as a guide to merge contigs and reconstruct genes that were fragmented by the assembler. To showcase Hound, we screened 5032 bacterial whole-genome sequences isolated from farmed animals and human infections, using the amino acid sequence encoded by blaTEM-1, to detect and predict resistance to amoxicillin/clavulanate, which is driven by over-expression of this gene. We believe this tool can facilitate the analysis of prokaryote species that currently lack a reference genome, and can be scaled either up to build automated systems for genomic surveillance or down to integrate into antibiotic susceptibility point-of-care diagnostics.
Subjects
Bacterial Genome, Genomics, Animals, Humans, Genotype, Phenotype, Gene Dosage
ABSTRACT
BACKGROUND: Transcriptomic studies involving organisms for which reference genomes are not available typically start by generating a de novo transcriptome or supertranscriptome assembly from the raw RNA-seq reads. Assembling a supertranscriptome is, however, a challenging task due to significantly varying abundance of mRNA transcripts, alternative splicing, and sequencing errors. As a result, popular de novo supertranscriptome assembly tools generate assemblies containing contigs that are partially assembled, fragmented, false chimeras, or have local mis-assemblies, leading to decreased assembly accuracy. Commonly available tools for assembly improvement rely primarily on running BLAST against closely related species, making their accuracy and reliability dependent on the availability of data for closely related organisms. RESULTS: We present ROAST, a tool for optimizing supertranscriptome assemblies that uses paired-end RNA-seq data from the Illumina sequencing platform to iteratively identify and fix assembly errors. It relies solely on the error signatures generated by RNA-seq alignment tools, including soft-clips, unexpected expression coverage, and reads with mates unmapped or mapped to a different contig, without performing BLAST searches against other organisms. Evaluation results using simulated as well as real datasets show that ROAST significantly improves assembly quality by identifying and fixing various assembly errors. CONCLUSION: ROAST provides a reference-free approach to optimizing supertranscriptome assemblies, highlighting its utility in refining de novo supertranscriptome assemblies of non-model organisms.
Subjects
Gene Expression Profiling, Transcriptome, Reproducibility of Results, Gene Expression Profiling/methods, Genome, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
How complicated is the genetic architecture of proteins - the set of causal effects by which sequence determines function? High-order epistatic interactions among residues are thought to be pervasive, making a protein's function difficult to predict or understand from its sequence. Most studies, however, used methods that overestimate epistasis, because they analyze genetic architecture relative to a designated reference sequence - causing measurement noise and small local idiosyncrasies to propagate into pervasive high-order interactions - or have not effectively accounted for global nonlinearity in the sequence-function relationship. Here we present a new reference-free method that jointly estimates global nonlinearity and specific epistatic interactions across a protein's entire genotype-phenotype map. This method yields a maximally efficient explanation of a protein's genetic architecture and is more robust than existing methods to measurement noise, partial sampling, and model misspecification. We reanalyze 20 combinatorial mutagenesis experiments from a diverse set of proteins and find that additive and pairwise effects, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of total variance in measured phenotypes (and >92% in every case). Only a tiny fraction of genotypes are strongly affected by third- or higher-order epistasis. Genetic architecture is also sparse: the number of terms required to explain the vast majority of variance is smaller than the number of genotypes by many orders of magnitude. The sequence-function relationship in most proteins is therefore far simpler than previously thought, opening the way for new and tractable approaches to characterize it.
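As a rough illustration of the contrast drawn above between reference-based and reference-free analysis, the following minimal sketch decomposes a toy two-site genotype-phenotype map. It assumes that reference-free effects are measured as deviations from the mean over all genotypes rather than from a designated wild type; that definition, the function name `reference_free_effects`, and the toy phenotype values are all assumptions of this sketch, not taken from the abstract. The global nonlinearity that the method estimates jointly is omitted.

```python
from statistics import mean


def reference_free_effects(phenotypes):
    """Decompose a complete two-site genotype-phenotype map.

    `phenotypes` maps genotype tuples (state_at_site_0, state_at_site_1)
    to measured values. Effects are deviations from the global mean
    (an assumption of this sketch), not from a chosen reference genotype.
    """
    grand_mean = mean(phenotypes.values())

    # First-order effect: mean phenotype of all genotypes carrying a state
    # at a site, minus the global mean.
    first_order = {}
    for site in (0, 1):
        for state in sorted({g[site] for g in phenotypes}):
            carriers = [v for g, v in phenotypes.items() if g[site] == state]
            first_order[(site, state)] = mean(carriers) - grand_mean

    # Pairwise effect: what remains after removing the global mean and
    # both first-order effects.
    pairwise = {
        g: v - grand_mean - first_order[(0, g[0])] - first_order[(1, g[1])]
        for g, v in phenotypes.items()
    }
    return grand_mean, first_order, pairwise


# Hypothetical toy data: two sites with two states each.
toy = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
grand_mean, first_order, pairwise = reference_free_effects(toy)

# Reference-based interaction term, taking (0, 0) as the designated wild type.
reference_based = toy[(1, 1)] - toy[(1, 0)] - toy[(0, 1)] + toy[(0, 0)]
```

In this toy map the reference-based interaction term is 1.0, whereas each reference-free pairwise term has magnitude 0.25, because the same deviation is spread evenly over all four genotypes instead of being assigned to the double mutant alone.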
ABSTRACT
Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), which directly analyzes raw sequencing data, using a statistical test to detect a signature of regulation: sample-specific sequence variation. SPLASH detects many types of variation and can be efficiently run at scale. We show that SPLASH identifies complex mutation patterns in SARS-CoV-2, discovers regulated RNA isoforms at the single-cell level, detects the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a unifying approach to genomic analysis that enables expansive discovery without metadata or references.
Subjects
Algorithms, Genomics, Genome, RNA Sequence Analysis, Humans, HLA Antigens/genetics, Single-Cell Analysis
ABSTRACT
Alternative splicing (AS) is an essential post-transcriptional mechanism that regulates many biological processes. However, comprehensively identifying all types of AS events without guidance from a reference genome is still a challenge. Here, we propose a novel method, MkcDBGAS, to identify all seven types of AS events using the transcriptome alone, without a reference genome. MkcDBGAS, modeled on full-length transcripts of human and Arabidopsis thaliana, consists of three modules. In the first module, MkcDBGAS, for the first time, uses a colored de Bruijn graph with dynamic and mixed k-mers to identify bubbles generated by AS with precision higher than 98.17% and to detect AS types overlooked by other tools. In the second module, to further classify types of AS, MkcDBGAS adds the motifs of exons to construct the feature matrix, followed by an XGBoost-based classifier with classification accuracy greater than 93.40%, which outperforms other widely used machine learning models and the state-of-the-art methods. Highly scalable, MkcDBGAS performed well when applied to Iso-Seq data of Amborella and the transcriptome of mouse. In the third module, MkcDBGAS provides analysis of differential splicing across multiple biological conditions when RNA-sequencing data are available. MkcDBGAS is the first accurate and scalable method for detecting all seven types of AS events using the transcriptome alone, which will greatly empower studies of AS in a wider field.
Subjects
Alternative Splicing, Arabidopsis, Animals, Humans, Mice, Transcriptome, RNA Splicing, RNA Sequence Analysis/methods, RNA, Arabidopsis/genetics, Gene Expression Profiling/methods
ABSTRACT
BACKGROUND: Producing animal protein while reducing the animal's impact on the environment, e.g., through improved feed efficiency and lowered methane emissions, has gained interest in recent years. Genetic selection is one possible path to reduce the environmental impact of livestock production, but these traits are difficult and expensive to measure on many animals. The rumen microbiome may serve as a proxy for these traits due to its role in feed digestion. Restriction enzyme-reduced representation sequencing (RE-RRS) is a high-throughput and cost-effective approach to rumen metagenome profiling, but the systematic (e.g., sequencing) and biological factors influencing the resulting reference-based (RB) and reference-free (RF) profiles need to be explored before widespread industry adoption is possible. RESULTS: Metagenome profiles were generated by RE-RRS of 4,479 rumen samples collected from 1,708 sheep, and assigned to eight groups based on diet, age, time off feed, and country (New Zealand or Australia) at the time of sample collection. Systematic effects were found to have minimal influence on metagenome profiles. Diet was a major driver of differences between samples, followed by time off feed, then age of the sheep. The RF approach resulted in more reads being assigned per sample and afforded greater resolution when distinguishing between groups than the RB approach. Normalizing relative abundances within the sampling cohort abolished structures related to age, diet, and time off feed, allowing a clear signal based on methane emissions to be elucidated. Genus-level abundances of rumen microbes showed low-to-moderate heritability and repeatability and were consistent between diets. CONCLUSIONS: Variation in rumen metagenomic profiles was influenced by diet, age, time off feed, and genetics. Not accounting for environmental factors may limit the ability to associate the profile with traits of interest. However, these differences can be accounted for by adjusting for cohort effects, revealing robust biological signals. The abundances of some genera were consistently heritable and repeatable across different environments, suggesting that metagenomic profiles could be used to predict an individual's future performance, or the performance of its offspring, in a range of environments. These results highlight the potential of using rumen metagenomic profiles for selection purposes in a practical, agricultural setting.
Subjects
Metagenome, Microbiota, Animals, Sheep/genetics, Rumen, Livestock, Methane
ABSTRACT
BACKGROUND: Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring phylogenetic trees from any type of sequencing data, including raw short reads. Yet, several tools require pre-processed input data, e.g., from complex computational pipelines based on de novo assembly or from mappings against a reference genome. As sequencing technologies keep becoming cheaper, this puts increasing pressure on designing methods that perform analysis directly on their outputs. From this viewpoint, there is a growing interest in alignment-, assembly-, and reference-free methods that can work on several types of data, including raw reads. RESULTS: We present phyBWT2, a newly improved version of phyBWT (Guerrini et al. in 22nd International Workshop on Algorithms in Bioinformatics (WABI) 242:23-12319, 2022). Both of them directly reconstruct phylogenetic trees, bypassing both the alignment against a reference genome and de novo assembly. They exploit the combinatorial properties of the extended Burrows-Wheeler Transform (eBWT) and the corresponding eBWT positional clustering framework to detect relevant blocks of the longest shared substrings of varying length (unlike the k-mer-based approaches that need to fix the length k a priori). As a result, they provide novel alignment-, assembly-, and reference-free methods that build partition trees without relying on the pairwise comparison of sequences, thus avoiding the use of a distance matrix to infer phylogeny. In addition, phyBWT2 outperforms phyBWT in terms of running time, as the former reconstructs phylogenetic trees step-by-step by considering multiple partitions, instead of just one partition at a time, as previously done by the latter. CONCLUSIONS: Based on the results of the experiments on sequencing data, we conclude that our method can produce trees of quality comparable to the benchmark phylogeny by handling datasets of different types (short reads, contigs, or entire genomes). Overall, the experiments confirm the effectiveness of phyBWT2, which improves the performance of its previous version phyBWT while preserving the accuracy of the results.
ABSTRACT
The detection and quantification of transposable elements (TE) are notoriously challenging despite their relevance in evolutionary genomics and molecular ecology. The main hurdle is caused by the dependence of numerous tools on genome assemblies, whose level of completion directly affects the comparability of the results across species or populations. dnaPipeTE, whose use is demonstrated here, tackles this issue by directly performing TE detection, classification, and quantification from unassembled short reads. This chapter details all the required steps to perform a comparative analysis of the TE content between two related species, starting from the installation of a recently containerized version of the program to the post-processing of the outputs.
Subjects
Biological Evolution, DNA Transposable Elements, DNA Transposable Elements/genetics, Ecology, Genomics
ABSTRACT
Complete comprehension of clinically relevant variation among human genomes is likely only to come from sequencing platforms that are cost-efficient and that feature both accurate base calling and long-range DNA phasing capability. The NGS revolution has struggled to meet the latter of these needs. Here we describe a protocol to address this limitation by preserving the molecular origin of short sequencing reads with an insignificant increase in sequencing costs. Whole haplotype-resolved genomes with megabase-scale phase blocks can be obtained with this method, offering researchers a unique opportunity to tackle the hurdles of de novo sequencing without being limited by a lack of resources.
Subjects
Human Genome, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, Haplotypes
ABSTRACT
Question Generation (QG) aims to automate the task of composing questions for a passage with a set of chosen answers found within the passage. In recent years, the introduction of neural generation models has resulted in substantial improvements in the quality of automatically generated questions, especially compared to traditional approaches that employ manually crafted heuristics. However, current QG evaluation metrics rely solely on the comparison between the generated questions and references, ignoring the passages or answers. Meanwhile, these metrics are generally criticized because of their low agreement with human judgement. We therefore propose a new reference-free evaluation metric called QAScore, which is capable of providing a better mechanism for evaluating QG systems. QAScore evaluates a question by computing the cross entropy according to the probability that the language model can correctly generate the masked words in the answer to that question. Compared to existing metrics such as BLEU and BERTScore, QAScore obtains a stronger correlation with human judgement in our human evaluation experiment, meaning that applying QAScore in the QG task leads to more accurate evaluation.
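The scoring idea described above can be sketched as a summed log-probability over masked answer words. This is only an illustrative sketch: the function name `masked_answer_log_likelihood` is hypothetical, the probabilities are made-up stand-ins for what a masked language model would output, and the exact aggregation or normalization used by QAScore is not stated in the abstract and is assumed here.

```python
import math


def masked_answer_log_likelihood(answer_token_probs):
    """Sum of log-probabilities assigned to the true answer words.

    Each entry is the (hypothetical) probability a masked language model
    gives to the correct answer word when that word is masked and the
    passage and question are visible. The result is at most 0; values
    closer to 0 mean the question makes the answer easier to recover.
    """
    if not answer_token_probs:
        raise ValueError("need at least one answer token probability")
    if any(not 0.0 < p <= 1.0 for p in answer_token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return sum(math.log(p) for p in answer_token_probs)


# Made-up probabilities for the same answer under two candidate questions.
good_question = masked_answer_log_likelihood([0.9, 0.8, 0.95])
poor_question = masked_answer_log_likelihood([0.2, 0.1, 0.3])
```

Under these assumptions, a question that helps the model recover the answer words receives the higher (less negative) score, and no reference question is needed at any point.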
ABSTRACT
RNA is a unique biomolecule that is involved in a variety of fundamental biological functions, all of which depend solely on its structure and dynamics. Since the experimental determination of RNA crystal structures is laborious, computational 3D structure prediction methods are undergoing continuous and thriving development. Such methods can produce many models; thus, it is necessary to compare them and extract common structural motifs for further medical or biological studies. Here, we introduce a computational pipeline dedicated to reference-free high-throughput comparative analysis of 3D RNA structures. We show its application in the RNA-Puzzles challenge, in which five participating groups attempted to predict the three-dimensional structures of 5'- and 3'-untranslated regions (UTRs) of the SARS-CoV-2 genome. We report the results of this puzzle and discuss the structural motifs obtained from the analysis. All simulated models and tools incorporated into the pipeline are open to scientific and academic use.
Subjects
COVID-19, RNA, 3' Untranslated Regions, Humans, Nucleic Acid Conformation, RNA/chemistry, SARS-CoV-2
ABSTRACT
DNA sequencing technologies enable the generation of genetic profiles from many individuals at a rapid rate. Identifying single-nucleotide polymorphisms (SNPs) between biological samples is fundamental in genetics, with various applications such as disease diagnosis and association studies, and ancestry and relationship inference. Most methods use a species-specific reference genome for aligning raw sequenced reads for accurate SNP calling. However, high-quality reference genomes may not be available for all species. Therefore, we developed a reference-free algorithm, Kmer2SNP, to identify heterozygous SNPs from raw sequenced reads to facilitate genetic studies in species without a reference genome. Kmer2SNP first calculates the k-mer frequency distribution from reads to determine k-mers containing heterozygous SNPs. Next, these k-mers are rapidly matched with each other to identify pairs of exact heterozygous k-mers that belong to one of the two possible haplotypes in a diploid organism. Finally, using overlapping neighboring k-mers, weights are assigned for SNP assignments; higher weights increase SNP discovery confidence.
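The first two steps described above (counting k-mers, then pairing candidate heterozygous k-mers) can be sketched on toy data as follows. This is not the Kmer2SNP implementation: the function names, the count window `low`/`high`, and the rule of pairing k-mers that differ only at the middle base are assumptions of this sketch. Reverse complements, sequencing errors, and the final weighting step are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def heterozygous_pairs(counts, low, high):
    """Pair k-mers with counts in [low, high] that differ only at the middle base.

    The window is a hypothetical stand-in for the heterozygous peak of the
    k-mer frequency distribution (roughly half the homozygous coverage).
    """
    by_context = {}
    for kmer, count in counts.items():
        if low <= count <= high:
            mid = len(kmer) // 2
            context = (kmer[:mid], kmer[mid + 1:])
            by_context.setdefault(context, []).append(kmer)
    # Keep only contexts with exactly two alleles, as expected in a diploid.
    return sorted(tuple(sorted(g)) for g in by_context.values() if len(g) == 2)


# Toy diploid: two haplotypes differing at one position (C vs G),
# each sequenced three times without errors.
haplotype_a = "GATTACAGCTT"
haplotype_b = "GATTAGAGCTT"
reads = [haplotype_a] * 3 + [haplotype_b] * 3

counts = count_kmers(reads, k=5)
pairs = heterozygous_pairs(counts, low=2, high=4)
```

In this toy data, k-mers shared by both haplotypes are seen six times and k-mers covering the variant only three times, so the count window isolates the variant-covering k-mers before pairing.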
Subjects
Algorithms, Single Nucleotide Polymorphism, Diploidy, Heterozygote, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis/methods, Software
ABSTRACT
Adjusting cell type composition is challenging but critical in epigenome-wide association studies (EWAS). In this chapter, we describe how to apply reference-based and reference-free methods in R to impute cell type composition in whole blood samples.