Results 1 - 14 of 14
1.
Cell Rep Methods ; 3(8): 100543, 2023 08 28.
Article in English | MEDLINE | ID: mdl-37671027

ABSTRACT

The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as "pan-conserved segment tags" (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.


Subject(s)
Squamous Cell Neoplasms, Skin Neoplasms, Humans, Conserved Sequence, Haploidy, Genetic Polymorphism
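The abstract above describes finding sequences conserved across every assembly by k-mer indexing. A minimal sketch of that idea, assuming plain set intersection over toy strings, is given here; the function names and sequences are hypothetical, and real pan-conserved segment tags presumably involve further criteria (such as uniqueness and strand handling) that this sketch ignores.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(assemblies: Iterable[str], k: int) -> Set[str]:
    """K-mers present in every assembly (toy stand-in, not the paper's method)."""
    sets = [kmers(a, k) for a in assemblies]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy "assemblies"; real inputs would be whole genomes.
toy_assemblies = ["ACGTACGGA", "TTACGGAAC", "GGACGGATT"]
shared = conserved_kmers(toy_assemblies, k=4)
print(sorted(shared))  # → ['ACGG', 'CGGA']
```

Holding full k-mer sets in memory only works at toy scale; genome-scale data would need a compact index.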
2.
Front Oncol ; 13: 1310054, 2023.
Article in English | MEDLINE | ID: mdl-38304032

ABSTRACT

Background: Colon microbiome composition contributes to the pathogenesis of colorectal cancer (CRC) and prognosis. We analyzed 16S rRNA sequencing data from tumor samples of patients with metastatic CRC and determined the clinical implications. Materials and methods: We enrolled 133 patients with metastatic CRC at St. Vincent Hospital in Korea. The V3-V4 regions of the 16S rRNA gene from the tumor DNA were amplified, sequenced on an Illumina MiSeq, and analyzed using the DADA2 package. Results: After excluding samples that retained <5% of the total reads after merging, 120 samples were analyzed. The median age of patients was 63 years (range, 34-82 years), and 76 patients (63.3%) were male. The primary cancer sites were the right colon (27.5%), left colon (30.8%), and rectum (41.7%). All subjects received 5-fluorouracil-based systemic chemotherapy. After removing genera with <1% of the total reads in each patient, 523 genera were identified. Rectal origin, high CEA level (≥10 ng/mL), and presence of lung metastasis showed higher richness. Survival analysis revealed that the presence of Prevotella (p = 0.052), Fusobacterium (p = 0.002), Selenomonas (p<0.001), Fretibacterium (p = 0.001), Porphyromonas (p = 0.007), Peptostreptococcus (p = 0.002), and Leptotrichia (p = 0.003) was associated with short overall survival (OS, <24 months), while the presence of Sphingomonas was associated with long OS (p = 0.070). From the multivariate analysis, the presence of Selenomonas (hazard ratio [HR], 6.35; 95% confidence interval [CI], 2.38-16.97; p<0.001) was associated with poor prognosis along with high CEA level. Conclusion: Tumor microbiome features may be useful prognostic biomarkers for metastatic CRC.

3.
Nucleic Acids Res ; 50(W1): W448-W453, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35474383

ABSTRACT

K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences has challenges related to the massive scale of genomic sequence data. A single human genome assembly has billions of k-mers. As a result, the computational requirements for analyzing k-mer information are enormous, particularly when involving complete genome assemblies. To address these issues, we developed a new indexing data structure based on a hash table tuned for the lookup of short sequence keys. This web application, referred to as KmerKeys, provides rapid query speeds for cloud computation on genome assemblies. We enable fuzzy as well as exact sequence searches of assemblies. To enable robust and speedy performance, the website implements cache-friendly hash tables, memory mapping and massive parallel processing. Our method employs a scalable and efficient data structure that can be used to jointly index and search a large collection of human genome assembly information. One can include variant databases and their associated metadata such as the gnomAD population variant catalogue. This feature enables the incorporation of future genomic information into sequencing analysis. KmerKeys is freely accessible at https://kmerkeys.dgi-stanford.org.


Subject(s)
Algorithms, DNA Sequence Analysis, Software, Humans, Human Genome, Genomics/methods, DNA Sequence Analysis/methods
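The abstract above mentions a hash-table index supporting exact and fuzzy k-mer searches. A minimal sketch of that general idea follows, using a Python dict as the hash table and treating "fuzzy" as "at most one mismatch". Both choices are assumptions for illustration. None of the function names belong to KmerKeys, whose actual data structure and API are not described here.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer to the list of positions where it starts in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def exact_lookup(index: Dict[str, List[int]], kmer: str) -> List[int]:
    """Positions of an exact k-mer match (empty list if absent)."""
    return index.get(kmer, [])


def fuzzy_lookup(index: Dict[str, List[int]], kmer: str) -> Dict[str, List[int]]:
    """Indexed k-mers within Hamming distance 1 of kmer, with their positions."""
    hits = {kmer: index[kmer]} if kmer in index else {}
    for i, original in enumerate(kmer):
        for base in "ACGT":
            if base != original:
                neighbor = kmer[:i] + base + kmer[i + 1:]
                if neighbor in index:
                    hits[neighbor] = index[neighbor]
    return hits


genome = "ACGTACGAACGT"  # hypothetical toy sequence
idx = build_index(genome, 4)
print(exact_lookup(idx, "ACGT"))  # → [0, 8]
print(fuzzy_lookup(idx, "ACGT"))  # → {'ACGT': [0, 8], 'ACGA': [4]}
```

Enumerating the 3k single-mismatch neighbors keeps each fuzzy query to a fixed number of hash lookups, at the cost of not handling insertions or deletions.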
4.
NAR Cancer ; 2(4): zcaa034, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33345188

ABSTRACT

Cancer genome sequencing has led to important discoveries such as the identification of cancer genes. However, challenges remain in the analysis of cancer genome sequencing. One significant issue is that mutations identified by multiple variant callers are frequently discordant even when using the same genome sequencing data. For insertion and deletion mutations, oftentimes there is no agreement among different callers. Identifying somatic mutations involves read mapping and variant calling, a complicated process that uses many parameters and model tuning. To validate the identification of true mutations, we developed a method using k-mer sequences. First, we characterized the landscape of unique versus non-unique k-mers in the human genome. Second, we developed a software package, KmerVC, to validate the given somatic mutations from sequencing data. Our program validates the occurrence of a mutation based on statistically significant difference in frequency of k-mers with and without a mutation from matched normal and tumor sequences. Third, we tested our method on both simulated and cancer genome sequencing data. Counting k-mers involving mutations effectively validated true positive mutations including insertions and deletions across different individual samples in a reproducible manner. Thus, we demonstrated a straightforward approach for rapidly validating mutations from cancer genome sequencing data.
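The validation idea in the abstract above (comparing counts of k-mers with and without the mutation in tumor versus matched normal reads) can be sketched as follows. This is not KmerVC's code: the function names, the toy reads, and the use of a one-sided Fisher exact test are all assumptions made for illustration, since the abstract does not say which statistical test is used.

```python
from math import comb
from typing import Iterable, Set


def spanning_kmers(left: str, allele: str, right: str, k: int) -> Set[str]:
    """K-mers of left+allele+right that overlap the allele.

    An empty allele (a deletion) selects k-mers spanning the junction.
    """
    seq = left + allele + right
    start, end = len(left), len(left) + len(allele)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)
            if i < end and i + k > start}


def count_support(reads: Iterable[str], kmer_set: Set[str], k: int) -> int:
    """Number of k-mer occurrences in the reads that belong to kmer_set."""
    return sum(1 for r in reads for i in range(len(r) - k + 1)
               if r[i:i + k] in kmer_set)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


# Hypothetical SNV (A>G) with toy flanking sequence and toy reads.
k, left, right = 5, "ACGTTGCA", "CCTAGGTA"
ref_set = spanning_kmers(left, "A", right, k)
alt_set = spanning_kmers(left, "G", right, k)
ref_only, alt_only = ref_set - alt_set, alt_set - ref_set

tumor_reads = [left + "G" + right] * 3 + [left + "A" + right] * 3
normal_reads = [left + "A" + right] * 6

tumor_alt = count_support(tumor_reads, alt_only, k)
tumor_ref = count_support(tumor_reads, ref_only, k)
normal_alt = count_support(normal_reads, alt_only, k)
normal_ref = count_support(normal_reads, ref_only, k)
p_value = fisher_one_sided(tumor_alt, tumor_ref, normal_alt, normal_ref)
print(tumor_alt, tumor_ref, normal_alt, normal_ref, p_value)
```

Overlapping k-mers from the same read are not independent observations, so the p-value here is only illustrative of the comparison, not a calibrated statistic.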

6.
Proc Natl Acad Sci U S A ; 113(34): E4956-65, 2016 08 23.
Article in English | MEDLINE | ID: mdl-27493222

ABSTRACT

The past decade has seen a wealth of 3D structural information about complex structured RNAs and identification of functional intermediates. Nevertheless, developing a complete and predictive understanding of the folding and function of these RNAs in biology will require connection of individual rate and equilibrium constants to structural changes that occur in individual folding steps and further relating these steps to the properties and behavior of isolated, simplified systems. To accomplish these goals we used the considerable structural knowledge of the folded, unfolded, and intermediate states of P4-P6 RNA. We enumerated structural states and possible folding transitions and determined rate and equilibrium constants for the transitions between these states using single-molecule FRET with a series of mutant P4-P6 variants. Comparisons with simplified constructs containing an isolated tertiary contact suggest that a given tertiary interaction has a stereotyped rate for breaking that may help identify structural transitions within complex RNAs and simplify the prediction of folding kinetics and thermodynamics for structured RNAs from their parts. The preferred folding pathway involves initial formation of the proximal tertiary contact. However, this preference was only ∼10 fold and could be reversed by a single point mutation, indicating that a model akin to a protein-folding contact order model will not suffice to describe RNA folding. Instead, our results suggest a strong analogy with a modified RNA diffusion-collision model in which tertiary elements within preformed secondary structures collide, with the success of these collisions dependent on whether the tertiary elements are in their rare binding-competent conformations.


Subject(s)
Nucleotide Motifs, Point Mutation, RNA/chemistry, Base Pairing, Fluorescence Resonance Energy Transfer, Kinetics, Molecular Models, RNA/genetics, RNA Folding, Single Molecule Imaging/methods, Thermodynamics
7.
Proc IEEE Int Symp Info Theory ; 2016: 580-584, 2016 Jul.
Article in English | MEDLINE | ID: mdl-29130024

ABSTRACT

We define and characterize the "chained" Kullback-Leibler divergence $\min_w D(p\|w) + D(w\|q)$, minimized over all intermediate distributions $w$, and the analogous k-fold chained K-L divergence $\min D(p\|w_{k-1}) + \dots + D(w_2\|w_1) + D(w_1\|q)$, minimized over the entire path $(w_1, \dots, w_{k-1})$. This quantity arises in a large deviations analysis of a Markov chain on the set of types - the Wright-Fisher model of neutral genetic drift: a population with allele distribution q produces offspring with allele distribution w, which then produce offspring with allele distribution p, and so on. The chained divergences enjoy some of the same properties as the K-L divergence (like joint convexity in the arguments) and appear in k-step versions of some of the same settings as the K-L divergence (like information projections and a conditional limit theorem). We further characterize the optimal k-step "path" of distributions appearing in the definition and apply our findings in a large deviations analysis of the Wright-Fisher process. We make a connection to information geometry via the previously studied continuum limit, where the number of steps tends to infinity, and the limiting path is a geodesic in the Fisher information metric. Finally, we offer a thermodynamic interpretation of the chained divergence (as the rate of operation of an appropriately defined Maxwell's demon) and we state some natural extensions and applications (a k-step mutual information and k-step maximum likelihood inference). We release code for computing the objects we study.
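To make the two-step chained divergence in the abstract above concrete, here is a minimal numerical sketch for Bernoulli distributions. It uses a brute-force grid search over the intermediate distribution w, which is an illustrative shortcut and not the authors' released code; the function names and the example values 0.8 and 0.3 are hypothetical.

```python
from math import log
from typing import Sequence, Tuple


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """K-L divergence D(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chained_kl_bernoulli(p1: float, q1: float,
                         steps: int = 10_000) -> Tuple[float, float]:
    """Grid-search min over w of D(p||w) + D(w||q) for Bernoulli p and q.

    Returns (minimum value found, P(w = 1) at which it is attained).
    """
    p, q = (p1, 1 - p1), (q1, 1 - q1)
    best_value, best_w1 = float("inf"), 0.0
    for j in range(1, steps):  # keep w strictly inside (0, 1)
        w1 = j / steps
        w = (w1, 1 - w1)
        value = kl(p, w) + kl(w, q)
        if value < best_value:
            best_value, best_w1 = value, w1
    return best_value, best_w1


value, w_star = chained_kl_bernoulli(0.8, 0.3)
direct = kl((0.8, 0.2), (0.3, 0.7))
print(value, w_star, direct)
```

Choosing w = q or w = p recovers the ordinary divergence D(p‖q), so the chained value can never exceed it. In this example the best intermediate distribution found lies strictly between q and p.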

8.
BMC Bioinformatics ; 16: 3, 2015 Jan 16.
Article in English | MEDLINE | ID: mdl-25591752

ABSTRACT

BACKGROUND: Single-molecule techniques have emerged as incisive approaches for addressing a wide range of questions arising in contemporary biological research [Trends Biochem Sci 38:30-37, 2013; Nat Rev Genet 14:9-22, 2013; Curr Opin Struct Biol 2014, 28C:112-121; Annu Rev Biophys 43:19-39, 2014]. The analysis and interpretation of raw single-molecule data benefits greatly from the ongoing development of sophisticated statistical analysis tools that enable accurate inference at the low signal-to-noise ratios frequently associated with these measurements. While a number of groups have released analysis toolkits as open source software [J Phys Chem B 114:5386-5403, 2010; Biophys J 79:1915-1927, 2000; Biophys J 91:1941-1951, 2006; Biophys J 79:1928-1944, 2000; Biophys J 86:4015-4029, 2004; Biophys J 97:3196-3205, 2009; PLoS One 7:e30024, 2012; BMC Bioinformatics 288 11(8):S2, 2010; Biophys J 106:1327-1337, 2014; Proc Int Conf Mach Learn 28:361-369, 2013], it remains difficult to compare analysis for experiments performed in different labs due to a lack of standardization. RESULTS: Here we propose a standardized single-molecule dataset (SMD) file format. SMD is designed to accommodate a wide variety of computer programming languages, single-molecule techniques, and analysis strategies. To facilitate adoption of this format we have made two existing data analysis packages that are used for single-molecule analysis compatible with this format. CONCLUSION: Adoption of a common, standard data file format for sharing raw single-molecule data and analysis outcomes is a critical step for the emerging and powerful single-molecule field, which will benefit both sophisticated users and non-specialists by allowing standardized, transparent, and reproducible analysis practices.


Subject(s)
Cell Physiological Phenomena, Computational Biology/methods, Software, Datasets as Topic, Humans, Kinetics, Microscopy
9.
J Am Chem Soc ; 136(18): 6643-8, 2014 May 07.
Article in English | MEDLINE | ID: mdl-24738560

ABSTRACT

We determined the effects of mutating the long-range tertiary contacts of the Tetrahymena group I ribozyme on the dynamics of its substrate helix (referred to as P1) and on catalytic activity. Dynamics were assayed by fluorescence anisotropy of the fluorescent base analogue, 6-methyl isoxanthopterin, incorporated into the P1 helix, and fluorescence anisotropy and catalytic activity were measured for wild type and mutant ribozymes over a range of conditions. Remarkably, catalytic activity correlated with P1 anisotropy over 5 orders of magnitude of activity, with a correlation coefficient of 0.94. The functional and dynamic effects from simultaneous mutation of the two long-range contacts that weaken P1 docking are cumulative and, based on this RNA's topology, suggest distinct underlying origins for the mutant effects. Tests of mechanistic predictions via single molecule FRET measurements of rate constants for P1 docking and undocking suggest that ablation of the P14 tertiary interaction frees P2 and thereby enhances the conformational space explored by the undocked attached P1 helix. In contrast, mutation of the metal core tertiary interaction disrupts the conserved core into which the P1 helix docks. Thus, despite following a single correlation, the two long-range tertiary contacts facilitate P1 helix docking by distinct mechanisms. These results also demonstrate that a fluorescence anisotropy probe incorporated into a specific helix within a larger RNA can report on changes in local helical motions as well as differences in more global dynamics. This ability will help uncover the physical properties and behaviors that underlie the function of RNAs and RNA/protein complexes.


Subject(s)
Catalytic RNA/chemistry, Tetrahymena/chemistry, Base Sequence, DNA Primers, Fluorescence Resonance Energy Transfer
10.
Bioinformatics ; 29(17): 2199-202, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-23793748

ABSTRACT

The number of human genomes that have been sequenced completely for different individuals has increased rapidly in recent years. Storing and transferring complete genomes between computers for the purpose of applying various applications and analysis tools will soon become a major hurdle, hindering the analysis phase. Therefore, there is a growing need to compress these data efficiently. Here, we describe a technique to compress human genomes based on entropy coding, using a reference genome and known Single Nucleotide Polymorphisms (SNPs). Furthermore, we explore several intrinsic features of genomes and information in other genomic databases to further improve the compression attained. Using these methods, we compress James Watson's genome to 2.5 megabytes (MB), improving on recent work by 37%. Similar compression is obtained for most genomes available from the 1000 Genomes Project. Our biologically inspired techniques promise even greater gains for genomes of lower organisms and for human genomes as more genomic data become available. AVAILABILITY: Code is available at sourceforge.net/projects/genomezip/


Subject(s)
Data Compression/methods, Human Genome, Genomics/methods, Algorithms, Haplotypes, Humans, Single Nucleotide Polymorphism
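The reference-based compression idea in the abstract above can be illustrated with a toy sketch: store only the differences from a reference, then estimate how many bits an ideal entropy coder would need for them. This is an assumption-laden simplification, not the genomezip implementation. It handles substitutions only, requires equal-length sequences, ignores the known-SNP database, and all names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence, Tuple

Variant = Tuple[int, str]  # (gap since previous variant, substituted base)


def diff_against_reference(reference: str, genome: str) -> List[Variant]:
    """Encode genome as substitutions relative to an equal-length reference."""
    variants, previous = [], -1
    for pos, (r, g) in enumerate(zip(reference, genome)):
        if r != g:
            variants.append((pos - previous, g))
            previous = pos
    return variants


def reconstruct(reference: str, variants: Sequence[Variant]) -> str:
    """Invert diff_against_reference."""
    bases, pos = list(reference), -1
    for gap, base in variants:
        pos += gap
        bases[pos] = base
    return "".join(bases)


def entropy_bits(symbols: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy of the symbol stream, in total bits."""
    counts, total = Counter(symbols), len(symbols)
    return -sum(c * log2(c / total) for c in counts.values())


reference = "ACGTACGTACGTACGT"  # hypothetical toy reference
genome = "ACGAACGTACCTACGT"     # differs at positions 3 and 10
variants = diff_against_reference(reference, genome)
print(variants)  # → [(4, 'A'), (7, 'C')]
print(entropy_bits([gap for gap, _ in variants]))
```

Storing gaps between variants rather than absolute positions keeps the symbols small and repetitive, which is what makes the stream amenable to entropy coding.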
11.
Philos Trans A Math Phys Eng Sci ; 370(1979): 5270-90, 2012 Nov 28.
Article in English | MEDLINE | ID: mdl-23091208

ABSTRACT

Following the simple observation that the interconnection of a set of quantum optical input-output devices can be specified using structural mode VHSIC hardware description language, we demonstrate a computer-aided schematic capture workflow for modelling and simulating multi-component photonic circuits. We describe an algorithm for parsing circuit descriptions to derive quantum equations of motion, illustrate our approach using simple examples based on linear and cavity-nonlinear optical components, and demonstrate a computational approach to hierarchical model reduction.

12.
PLoS One ; 7(2): e30024, 2012.
Article in English | MEDLINE | ID: mdl-22363412

ABSTRACT

Single molecule studies have expanded rapidly over the past decade and have the ability to provide an unprecedented level of understanding of biological systems. A common challenge upon introduction of novel, data-rich approaches is the management, processing, and analysis of the complex data sets that are generated. We provide a standardized approach for analyzing these data in the freely available software package SMART: Single Molecule Analysis Research Tool. SMART provides a format for organizing and easily accessing single molecule data, a general hidden Markov modeling algorithm for fitting an array of possible models specified by the user, a standardized data structure and graphical user interfaces to streamline the analysis and visualization of data. This approach guides experimental design, facilitating acquisition of the maximal information from single molecule experiments. SMART also provides a standardized format to allow dissemination of single molecule data and transparency in the analysis of reported data.


Subject(s)
Database Management Systems, Research, Software, Algorithms, Cluster Analysis, Computer Simulation, Fluorescence Resonance Energy Transfer, Markov Chains, Photobleaching
13.
Opt Express ; 19(7): 6478-86, 2011 Mar 28.
Article in English | MEDLINE | ID: mdl-21451676

ABSTRACT

We use a single ¹³³Cs atom strongly coupled to an optical resonator to induce random binary phase modulation of a near infra-red, ∼ 500 pW laser beam, with each modulation edge caused by the dissipation of a single photon (≈ 0.23 aJ) by the atom. While our ability to deterministically induce phase edges with an additional optical control beam is limited thus far, theoretical analysis of an analogous, solid-state system indicates that efficient external control should be achievable in demonstrated nanophotonic systems.


Subject(s)
Computer-Aided Design, Lasers, Theoretical Models, Computer-Assisted Signal Processing/instrumentation, Computer Simulation, Equipment Design, Equipment Failure Analysis, Light, Radiation Scattering
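As a rough consistency check on the single-photon energy quoted in the abstract above: assuming the light is near the cesium D2 line at roughly 852 nm (this wavelength is an assumption; the abstract says only "near infra-red"), the photon energy works out as

```latex
E = \frac{hc}{\lambda}
  \approx \frac{(6.626\times10^{-34}\,\mathrm{J\,s})\,(2.998\times10^{8}\,\mathrm{m/s})}{852\times10^{-9}\,\mathrm{m}}
  \approx 2.33\times10^{-19}\,\mathrm{J}
  \approx 0.23\,\mathrm{aJ}
```

which agrees with the ≈ 0.23 aJ figure given in the abstract.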
14.
Phys Rev Lett ; 105(4): 040502, 2010 Jul 23.
Article in English | MEDLINE | ID: mdl-20867826

ABSTRACT

We propose an approach to quantum error correction based on coding and continuous syndrome readout via scattering of coherent probe fields, in which the usual steps of measurement and discrete restoration are replaced by direct physical processing of the probe beams and coherent feedback to the register qubits. Our approach is well matched to physical implementations that feature solid-state qubits embedded in planar electromagnetic circuits, providing an autonomous and "on-chip" quantum memory design requiring no external clocking or control logic.
