Results 1 - 20 of 36
1.
Cell ; 179(3): 736-749.e15, 2019 Oct 17.
Article in English | MEDLINE | ID: mdl-31626772

ABSTRACT

Underrepresentation of Asian genomes has hindered population and medical genetics research on Asians, leading to population disparities in precision medicine. By whole-genome sequencing of 4,810 Singapore Chinese, Malays, and Indians, we found 98.3 million SNPs and small insertions or deletions, over half of which are novel. Population structure analysis demonstrated great representation of Asian genetic diversity by three ethnicities in Singapore and revealed a Malay-related novel ancestry component. Furthermore, demographic inference suggested that Malays split from Chinese ∼24,800 years ago and experienced significant admixture with East Asians ∼1,700 years ago, coinciding with the Austronesian expansion. Additionally, we identified 20 candidate loci for natural selection, 14 of which harbored robust associations with complex traits and diseases. Finally, we show that our data can substantially improve genotype imputation in diverse Asian and Oceanian populations. These results highlight the value of our data as a resource to empower human genetics discovery across broad geographic regions.


Subject(s)
Genetics, Population , Genome, Human/genetics , Selection, Genetic , Whole Genome Sequencing , Asian People/genetics , Female , Genotype , Humans , Malaysia/epidemiology , Male , Polymorphism, Single Nucleotide/genetics , Singapore/epidemiology
2.
Nat Methods ; 18(10): 1161-1168, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34556866

ABSTRACT

The rapid growth of high-throughput technologies has transformed biomedical research. With the increasing amount and complexity of data, scalability and reproducibility have become essential not just for experiments, but also for computational analysis. However, transforming data into information involves running a large number of tools, optimizing parameters, and integrating dynamically changing reference data. Workflow managers were developed in response to such challenges. They simplify pipeline development, optimize resource usage, handle software installation and versions, and run on different compute platforms, enabling workflow portability and sharing. In this Perspective, we highlight key features of workflow managers, compare commonly used approaches for bioinformatics workflows, and provide a guide for computational and noncomputational users. We outline community-curated pipeline initiatives that enable novice and experienced users to perform complex, best-practice analyses without having to manually assemble workflows. In sum, we illustrate how workflow managers contribute to making computational analysis in biomedical research shareable, scalable, and reproducible.


Subject(s)
Biomedical Research/methods , Biomedical Research/standards , Computational Biology/methods , Workflow , Reproducibility of Results
3.
Mol Cell ; 62(4): 603-17, 2016 May 19.
Article in English | MEDLINE | ID: mdl-27184079

ABSTRACT

Identifying pairwise RNA-RNA interactions is key to understanding how RNAs fold and interact with other RNAs inside the cell. We present a high-throughput approach, sequencing of psoralen crosslinked, ligated, and selected hybrids (SPLASH), that maps pairwise RNA interactions in vivo with high sensitivity and specificity, genome-wide. Applying SPLASH to human and yeast transcriptomes revealed the diversity and dynamics of thousands of long-range intra- and intermolecular RNA-RNA interactions. Our analysis highlighted key structural features of RNA classes, including the modular organization of mRNAs, its impact on translation and decay, and the enrichment of long-range interactions in noncoding RNAs. Additionally, intermolecular mRNA interactions were organized into network clusters and were remodeled during cellular differentiation. We also identified hundreds of known and new snoRNA-rRNA binding sites, expanding our knowledge of rRNA biogenesis. These results highlight the underexplored complexity of RNA interactomes and pave the way to better understanding how RNA organization impacts biology.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , RNA, Fungal/genetics , RNA, Messenger/genetics , RNA, Neoplasm/genetics , RNA, Ribosomal/genetics , RNA, Small Nucleolar/genetics , Saccharomyces cerevisiae/genetics , Transcriptome , Binding Sites , Cell Differentiation , Computational Biology , Cross-Linking Reagents/chemistry , Databases, Genetic , Embryonic Stem Cells/metabolism , Ficusin/chemistry , Gene Expression Regulation, Fungal , Gene Expression Regulation, Neoplastic , Genome-Wide Association Study , HeLa Cells , Humans , Nucleic Acid Conformation , RNA Stability , RNA, Fungal/chemistry , RNA, Fungal/metabolism , RNA, Messenger/chemistry , RNA, Messenger/metabolism , RNA, Neoplasm/chemistry , RNA, Neoplasm/metabolism , RNA, Ribosomal/chemistry , RNA, Ribosomal/metabolism , RNA, Small Nucleolar/chemistry , RNA, Small Nucleolar/metabolism , Ribosomes/genetics , Ribosomes/metabolism , Saccharomyces cerevisiae/metabolism
4.
BMC Genomics ; 18(1): 829, 2017 Oct 27.
Article in English | MEDLINE | ID: mdl-29078745

ABSTRACT

BACKGROUND: Viral populations are complex, dynamic, and fast evolving. The evolution of groups of closely related viruses in a competitive environment is termed quasispecies. To fully understand the role that quasispecies play in viral evolution, characterizing the trajectories of viral genotypes in an evolving population is the key. In particular, long-range haplotype information for thousands of individual viruses is critical; yet generating this information is non-trivial. Popular deep sequencing methods generate relatively short reads that do not preserve linkage information, while third generation sequencing methods have higher error rates that make detection of low frequency mutations a bioinformatics challenge. Here we applied BAsE-Seq, an Illumina-based single-virion sequencing technology, to eight samples from four chronic hepatitis B (CHB) patients - once before antiviral treatment and once after viral rebound due to resistance. RESULTS: With single-virion sequencing, we obtained 248-8796 single-virion sequences per sample, which allowed us to find evidence for both hard and soft selective sweeps. We were able to reconstruct population demographic history that was independently verified by clinically collected data. We further verified four of the samples independently through PacBio SMRT and Illumina Pooled deep sequencing. CONCLUSIONS: Overall, we showed that single-virion sequencing yields insight into viral evolution and population dynamics in an efficient and high throughput manner. We believe that single-virion sequencing is widely applicable to the study of viral evolution in the context of drug resistance and host adaptation, allows differentiation between soft or hard selective sweeps, and may be useful in the reconstruction of intra-host viral population demographic history.


Subject(s)
Evolution, Molecular , Genome, Viral , Hepatitis B virus/drug effects , Hepatitis B virus/genetics , Hepatitis B/virology , Lamivudine/pharmacology , Virion/genetics , Alleles , Amino Acid Substitution , Computational Biology/methods , DNA Barcoding, Taxonomic , Drug Resistance, Viral/drug effects , Gene Frequency , Hepatitis B/drug therapy , Hepatitis B virus/isolation & purification , Humans , Lamivudine/therapeutic use , Mutation
5.
Proc Natl Acad Sci U S A ; 111(33): 12103-8, 2014 Aug 19.
Article in English | MEDLINE | ID: mdl-25028492

ABSTRACT

Fastidious anaerobic bacteria play critical roles in environmental bioremediation of halogenated compounds. However, their characterization and application have been largely impeded by difficulties in growing them in pure culture. Thus far, no pure culture has been reported to respire on the notorious polychlorinated biphenyls (PCBs), and functional genes responsible for PCB detoxification remain unknown due to the extremely slow growth of PCB-respiring bacteria. Here we report the successful isolation and characterization of three Dehalococcoides mccartyi strains that respire on commercial PCBs. Using high-throughput metagenomic analysis, combined with traditional culture techniques, tetrachloroethene (PCE) was identified as a feasible alternative to PCBs to isolate PCB-respiring Dehalococcoides from PCB-enriched cultures. With PCE as an alternative electron acceptor, the PCB-respiring Dehalococcoides were boosted to a higher cell density (1.2 × 10^8 to 1.3 × 10^8 cells per mL on PCE vs. 5.9 × 10^6 to 10.4 × 10^6 cells per mL on PCBs) with a shorter culturing time (30 d on PCE vs. 150 d on PCBs). The transcriptomic profiles illustrated that the distinct PCB dechlorination profile of each strain was predominantly mediated by a single, novel reductive dehalogenase (RDase) catalyzing chlorine removal from both PCBs and PCE. The transcription levels of PCB-RDase genes are 5-60 times higher than the genome-wide average. The cultivation of PCB-respiring Dehalococcoides in pure culture and the identification of PCB-RDase genes deepen our understanding of organohalide respiration of PCBs and shed light on in situ PCB bioremediation.


Subject(s)
Chloroflexi/genetics , Genome, Bacterial , Polychlorinated Biphenyls/metabolism , Chloroflexi/metabolism , Molecular Sequence Data , Polymerase Chain Reaction
6.
J Gen Virol ; 96(12): 3470-3483, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26407694

ABSTRACT

Human respiratory syncytial virus (RSV) is the major cause of lower respiratory tract infections in children <2 years of age. Little is known about RSV intra-host genetic diversity over the course of infection or about the immune pressures that drive RSV molecular evolution. We performed whole-genome deep-sequencing on 53 RSV-positive samples (37 RSV subgroup A and 16 RSV subgroup B) collected from the upper airways of hospitalized children in southern Vietnam over two consecutive seasons. RSV A NA1 and RSV B BA9 were the predominant genotypes found in our samples, consistent with other reports on global RSV circulation during the same period. For both RSV A and B, the M gene was the most conserved, confirming its potential as a target for novel therapeutics. The G gene was the most variable and was the only gene under detectable positive selection. Further, positively selected sites in G were found in close proximity to and in some cases overlapped with predicted glycosylation motifs, suggesting that selection on amino acid glycosylation may drive viral genetic diversity. We further identified hotspots and coldspots of intra-host genetic diversity in the RSV genome, some of which may highlight previously unknown regions of functional importance.


Subject(s)
Evolution, Molecular , Genome, Viral/genetics , Respiratory Syncytial Virus Infections/veterinary , Respiratory Syncytial Virus, Human/classification , Respiratory Syncytial Virus, Human/genetics , Amino Acid Sequence , Child , Gene Expression Regulation, Viral/physiology , Genetic Variation , Genotype , Humans , Models, Molecular , Phylogeny , Protein Conformation , Respiratory Syncytial Virus Infections/epidemiology , Vietnam/epidemiology , Viral Proteins/genetics , Viral Proteins/metabolism
7.
Bioinformatics ; 29(8): 989-95, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23428640

ABSTRACT

MOTIVATION: Recent developments in sequence alignment software have made possible multiple sequence alignments (MSAs) of >100 000 sequences in reasonable times. At present, there are no systematic analyses concerning the scalability of the alignment quality as the number of aligned sequences is increased. RESULTS: We benchmarked a wide range of widely used MSA packages using a selection of protein families with some known structures and found that the accuracy of such alignments decreases markedly as the number of sequences grows. This is more or less true of all packages and protein families. The phenomenon is mostly due to the accumulation of alignment errors, rather than problems in guide-tree construction. This is partly alleviated by using iterative refinement or selectively adding sequences. The average accuracy of progressive methods by comparison with structure-based benchmarks can be improved by incorporating information derived from high-quality structural alignments of sequences with solved structures. This suggests that the availability of high quality curated alignments will have to complement algorithmic and/or software developments in the long-term. AVAILABILITY AND IMPLEMENTATION: Benchmark data used in this study are available at http://www.clustal.org/omega/homfam-20110613-25.tar.gz and http://www.clustal.org/omega/bali3fam-26.tar.gz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Sequence Alignment/methods , Sequence Analysis, Protein/methods , Algorithms , Software
8.
Nucleic Acids Res ; 40(22): 11189-201, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23066108

ABSTRACT

The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high-coverage afforded can be used, in principle, to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.
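The idea of testing an observed variant count against a run-specific error rate can be illustrated with a minimal sketch. This is not LoFreq's implementation: it assumes a single uniform error rate per site and a plain binomial model, and the function names, the `alpha` cutoff, and the example numbers are all hypothetical.

```python
import math

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space.

    Requires 0 < p < 1.
    """
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)

def is_low_frequency_variant(depth: int, alt_count: int,
                             error_rate: float, alpha: float = 0.01) -> bool:
    """Call a variant when the alternate-base count is unlikely under errors alone.

    Simplifying assumption: every read at the site shares the same error rate.
    """
    return binom_tail(depth, alt_count, error_rate) < alpha
```

With these made-up inputs, 20 alternate bases at depth 1000 and an error rate of 0.001 would be called, whereas a single alternate base would not.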


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Computer Simulation , Dengue Virus/genetics , Escherichia coli/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Mutation , Sensitivity and Specificity , Stomach Neoplasms/genetics , Viral Proteins/chemistry , Viral Proteins/genetics
9.
J Infect Dis ; 207(9): 1442-50, 2013 May 01.
Article in English | MEDLINE | ID: mdl-22807519

ABSTRACT

BACKGROUND: Dengue is the most common arboviral infection of humans. There are currently no specific treatments for dengue. Balapiravir is a prodrug of a nucleoside analogue (called R1479) and an inhibitor of hepatitis C virus replication in vivo. METHODS: We conducted in vitro experiments to determine the potency of balapiravir against dengue viruses and then an exploratory, dose-escalating, randomized placebo-controlled trial in adult male patients with dengue with <48 hours of fever. RESULTS: The clinical and laboratory adverse event profile in patients receiving balapiravir at doses of 1500 mg (n = 10) or 3000 mg (n = 22) orally for 5 days was similar to that of patients receiving placebo (n = 32), indicating balapiravir was well tolerated. However, twice daily assessment of viremia and daily assessment of NS1 antigenemia indicated balapiravir did not measurably alter the kinetics of these virological markers, nor did it reduce the fever clearance time. The kinetics of plasma cytokine concentrations and the whole blood transcriptional profile were also not attenuated by balapiravir treatment. CONCLUSIONS: Although this trial, the first of its kind in dengue, does not support balapiravir as a candidate drug, it does establish a framework for antiviral treatment trials in dengue and provides the field with a clinically evaluated benchmark molecule. CLINICAL TRIALS REGISTRATION: NCT01096576.


Subject(s)
Antiviral Agents/administration & dosage , Dengue/drug therapy , Nucleosides/administration & dosage , Administration, Oral , Adult , Antigens, Viral/blood , Antiviral Agents/adverse effects , Dengue/pathology , Dengue/virology , Dengue Virus/isolation & purification , Double-Blind Method , Fever/drug therapy , Humans , Male , Nucleosides/adverse effects , Placebos/administration & dosage , Treatment Outcome , Viral Load , Viremia/drug therapy , Young Adult
10.
Nucleic Acids Res ; 39(16): 6886-95, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21624887

ABSTRACT

We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odd substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is an open source freeware available from www.tcoffee.org/blastr.html.
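The recoding step described above, turning an RNA sequence into a protein-like sequence of di-nucleotides, might look like the following sketch. The abstract does not give the actual letter assignment or say whether di-nucleotides overlap, so the 16-letter mapping and the overlapping window used here are assumptions for illustration only.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Hypothetical alphabet: 16 arbitrary amino-acid letters, one per di-nucleotide.
AMINO_LETTERS = "ACDEFGHIKLMNPQRS"
DINUC_TO_LETTER = {
    a + b: letter
    for (a, b), letter in zip(product(NUCLEOTIDES, repeat=2), AMINO_LETTERS)
}

def recode_rna(seq: str) -> str:
    """Recode an RNA sequence as overlapping di-nucleotides, one letter each.

    A sequence of length n yields n - 1 letters. DNA input is accepted by
    treating T as U; any other symbol raises KeyError.
    """
    seq = seq.upper().replace("T", "U")
    return "".join(DINUC_TO_LETTER[seq[i:i + 2]] for i in range(len(seq) - 1))
```

The recoded strings could then be passed to a protein search tool with a matching 16-letter substitution matrix, which is the role BlosumR plays in the abstract.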


Subject(s)
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Sequence Analysis, RNA , Algorithms , Sequence Alignment , Software
11.
Cell Rep ; 42(10): 113250, 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37837618

ABSTRACT

Following viral infection, the human immune system generates CD8+ T cell responses to virus antigens that differ in specificity, abundance, and phenotype. A characterization of virus-specific T cell responses allows one to assess infection history and to understand its contribution to protective immunity. Here, we perform in-depth profiling of CD8+ T cells binding to CMV-, EBV-, influenza-, and SARS-CoV-2-derived antigens in peripheral blood samples from 114 healthy donors and 55 cancer patients using high-dimensional mass cytometry and single-cell RNA sequencing. We analyze over 500 antigen-specific T cell responses across six different HLA alleles and observe unique phenotypes of T cells specific for antigens from different virus categories. Using machine learning, we extract phenotypic signatures of antigen-specific T cells, predict virus specificity for bulk CD8+ T cells, and validate these predictions, suggesting that machine learning can be used to accurately predict antigen specificity from T cell phenotypes.


Subject(s)
CD8-Positive T-Lymphocytes , Herpesvirus 4, Human , Humans , T-Cell Antigen Receptor Specificity , Antigens, Viral , Phenotype
12.
Mol Syst Biol ; 7: 539, 2011 Oct 11.
Article in English | MEDLINE | ID: mdl-21988835

ABSTRACT

Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.


Subject(s)
Data Mining/methods , Proteins/analysis , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Systems Biology , Algorithms , Amino Acid Sequence , Base Sequence , Databases, Factual , Molecular Sequence Data , Proteins/chemistry , Software , Systems Biology/instrumentation , Systems Biology/methods
13.
Nucleic Acids Res ; 37(22): 7360-7, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19820114

ABSTRACT

The accurate computational prediction of transcription start sites (TSS) in vertebrate genomes is a difficult problem. The physicochemical properties of DNA can be computed in various ways, and many combinations of DNA features have been tested in the past for use as predictors of transcription. We looked in detail at melting temperature, the temperature at which two strands of DNA separate, considering the cooperative nature of this process. We find that peaks in melting temperature correspond closely to experimentally determined transcription start sites in human and mouse chromosomes. Using melting temperature alone, and with simple thresholding, we can predict TSS with accuracy that is competitive with the most accurate state-of-the-art TSS prediction methods. Accuracy is measured using both experimentally and manually determined TSS. The method works especially well with CpG island-containing promoters, but also works when CpG islands are absent. This result is clear evidence of the important role of the physical properties of DNA in the process of transcription. It also points to the importance for TSS prediction methods to include melting temperature as prior information.
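The "simple thresholding" of melting-temperature peaks can be sketched as below. The sketch assumes a per-position melting-temperature profile has already been computed by some other means; the function name, the peak definition, and the threshold value are hypothetical and are not taken from the paper.

```python
def predict_tss(profile: list[float], threshold: float) -> list[int]:
    """Return positions that are local maxima of the profile above a threshold.

    A plateau is reported once, at its first position.
    """
    peaks = []
    for i in range(1, len(profile) - 1):
        rises = profile[i] > profile[i - 1]
        does_not_rise_further = profile[i] >= profile[i + 1]
        if profile[i] > threshold and rises and does_not_rise_further:
            peaks.append(i)
    return peaks

# Made-up melting temperatures (degrees C) along a short stretch of sequence.
example_profile = [70, 72, 80, 75, 71, 78, 85, 85, 79]
candidate_sites = predict_tss(example_profile, threshold=76)
```

In practice the threshold would be tuned against experimentally determined TSS, as the abstract describes for the accuracy evaluation.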


Subject(s)
Algorithms , DNA/chemistry , Temperature , Transcription Initiation Site , Animals , CpG Islands , Humans , Mice , Nucleic Acid Denaturation , Promoter Regions, Genetic
14.
F1000Res ; 10: 33, 2021.
Article in English | MEDLINE | ID: mdl-34035898

ABSTRACT

Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.


Subject(s)
Data Analysis , Software , Reproducibility of Results , Workflow
15.
Nucleic Acids Res ; 36(9): e52, 2008 May.
Article in English | MEDLINE | ID: mdl-18420654

ABSTRACT

R-Coffee is a multiple RNA alignment package, derived from T-Coffee, designed to align RNA sequences while exploiting secondary structure information. R-Coffee uses an alignment-scoring scheme that incorporates secondary structure information within the alignment. It works particularly well as an alignment improver and can be combined with any existing sequence alignment method. In this work, we used R-Coffee to compute multiple sequence alignments combining the pairwise output of sequence aligners and structural aligners. We show that R-Coffee can improve the accuracy of all the sequence aligners. We also show that the consistency-based component of T-Coffee can improve the accuracy of several structural aligners. R-Coffee was tested on 388 BRAliBase reference datasets and on 11 longer Cmfinder datasets. Altogether our results suggest that the best protocol for aligning short sequences (less than 200 nt) is the combination of R-Coffee with the RNA pairwise structural aligner Consan. We also show that the simultaneous combination of the four best sequence alignment programs with R-Coffee produces alignments almost as accurate as those obtained with R-Coffee/Consan. Finally, we show that R-Coffee can also be used to align longer datasets beyond the usual scope of structural aligners. R-Coffee is freely available for download, along with documentation, from the T-Coffee web site (www.tcoffee.org).


Subject(s)
RNA, Untranslated/chemistry , Sequence Alignment/methods , Sequence Analysis, RNA , Software , Algorithms , Nucleic Acid Conformation
16.
Nucleic Acids Res ; 36(Web Server issue): W10-3, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18483080

ABSTRACT

The R-Coffee web server produces highly accurate multiple alignments of noncoding RNA (ncRNA) sequences, taking into account predicted secondary structures. R-Coffee uses a novel algorithm recently incorporated in the T-Coffee package. R-Coffee works along the same lines as T-Coffee: it uses pairwise or multiple sequence alignment (MSA) methods to compute a primary library of input alignments. The program then computes an MSA highly consistent with both the alignments contained in the library and the secondary structures associated with the sequences. The secondary structures are predicted using RNAplfold. The server provides two modes. The slow/accurate mode is restricted to small datasets (fewer than 5 sequences of fewer than 150 nucleotides) and combines R-Coffee with Consan, a very accurate pairwise RNA alignment method. For larger datasets a fast method can be used (RM-Coffee mode), which uses R-Coffee to combine the outputs of the three programs found to perform best on RNA (MUSCLE, MAFFT and ProbConsRNA). Our BRAliBase benchmarks indicate that the R-Coffee/Consan combination is one of the best ncRNA alignment methods for short sequences, while the RM-Coffee gives comparable results on longer sequences. The R-Coffee web server is available at http://www.tcoffee.org.


Subject(s)
RNA, Untranslated/chemistry , Sequence Alignment/methods , Sequence Analysis, RNA , Software , Algorithms , Internet , Nucleic Acid Conformation
17.
Nat Genet ; 52(2): 177-186, 2020 Feb.
Article in English | MEDLINE | ID: mdl-32015526

ABSTRACT

Lung cancer is the world's leading cause of cancer death and shows strong ancestry disparities. By sequencing and assembling a large genomic and transcriptomic dataset of lung adenocarcinoma (LUAD) in individuals of East Asian ancestry (EAS; n = 305), we found that East Asian LUADs had more stable genomes characterized by fewer mutations and fewer copy number alterations than LUADs from individuals of European ancestry. This difference is much stronger in smokers as compared to nonsmokers. Transcriptomic clustering identified a new EAS-specific LUAD subgroup with a less complex genomic profile and upregulated immune-related genes, allowing the possibility of immunotherapy-based approaches. Integrative analysis across clinical and molecular features showed the importance of molecular phenotypes in patient prognostic stratification. EAS LUADs had better prediction accuracy than those of European ancestry, potentially due to their less complex genomic architecture. This study elucidated a comprehensive genomic landscape of EAS LUADs and highlighted important ancestry differences between the two cohorts.


Subject(s)
Adenocarcinoma of Lung/genetics , Lung Neoplasms/genetics , Mutation , Adenocarcinoma of Lung/etiology , Adenocarcinoma of Lung/mortality , Adenocarcinoma of Lung/therapy , Aged , Asian People/genetics , Cohort Studies , DNA Copy Number Variations , ErbB Receptors/genetics , Exome , Female , Gene Expression Profiling , Humans , Lung Neoplasms/etiology , Lung Neoplasms/mortality , Lung Neoplasms/therapy , Male , Middle Aged , Proto-Oncogene Proteins p21(ras)/genetics , Singapore , Tumor Suppressor Protein p53/genetics
18.
Nat Commun ; 10(1): 1408, 2019 Mar 29.
Article in English | MEDLINE | ID: mdl-30926818

ABSTRACT

Dengue (DENV) and Zika (ZIKV) viruses are clinically important members of the Flaviviridae family with an 11 kb positive strand RNA genome that folds to enable virus function. Here, we perform structure and interaction mapping on four DENV and ZIKV strains inside virions and in infected cells. Comparative analysis of SHAPE reactivities across serotypes nominates potentially functional regions that are highly structured, conserved, and contain low synonymous mutation rates. Interaction mapping by SPLASH identifies many pair-wise interactions, 40% of which form alternative structures, suggesting extensive structural heterogeneity. Analysis of shared interactions between serotypes reveals a conserved macro-organization whereby interactions can be preserved at physical locations beyond sequence identities. We further observe that longer-range interactions are preferentially disrupted inside cells, and show the importance of new interactions in virus fitness. These findings deepen our understanding of Flavivirus genome organization and serve as a resource for designing therapeutics in targeting RNA viruses.


Subject(s)
Chromosome Mapping , Dengue Virus/chemistry , Dengue Virus/genetics , Zika Virus/chemistry , Zika Virus/genetics , Animals , Base Sequence , Cell Line , Conserved Sequence , Genome, Viral , Humans , Mice , Models, Molecular , Mutation/genetics , Nicotinic Acids , RNA, Viral/chemistry , Virion/genetics
19.
BMC Bioinformatics ; 9: 219, 2008 Apr 28.
Article in English | MEDLINE | ID: mdl-18442401

ABSTRACT

BACKGROUND: Aligning homologous non-coding RNAs (ncRNAs) correctly in terms of sequence and structure is an unresolved problem, due to both mathematical complexity and imperfect scoring functions. High quality alignments, however, are a prerequisite for most consensus structure prediction approaches, homology searches, and tools for phylogeny inference. Automatically created ncRNA alignments often need manual corrections, yet this manual refinement is tedious and error-prone. RESULTS: We present an extended version of CONSTRUCT, a semi-automatic, graphical tool suitable for creating RNA alignments correct in terms of both consensus sequence and consensus structure. To this purpose CONSTRUCT combines sequence alignment, thermodynamic data and various measures of covariation. One important feature is that the user is guided during the alignment correction step by a consensus dotplot, which displays all thermodynamically optimal base pairs and the corresponding covariation. Once the initial alignment is corrected, optimal and suboptimal secondary structures as well as tertiary interaction can be predicted. We demonstrate CONSTRUCT's ability to guide the user in correcting an initial alignment, and show an example for optimal secondary consensus structure prediction on very hard to align SECIS elements. Moreover we use CONSTRUCT to predict tertiary interactions from sequences of the internal ribosome entry site of CrP-like viruses. In addition we show that alignments specifically designed for benchmarking can easily be optimized using CONSTRUCT, although they share very little sequence identity. CONCLUSION: CONSTRUCT's graphical interface allows for an easy alignment correction based on and guided by predicted and known structural constraints. It combines several algorithms for prediction of secondary consensus structure and even tertiary interactions.
The CONSTRUCT package can be downloaded from the URL listed in the Availability and requirements section of this article.


Subject(s)
Algorithms , Consensus Sequence/genetics , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Base Sequence , Molecular Sequence Data
20.
PLoS One ; 13(8): e0201768, 2018.
Article in English | MEDLINE | ID: mdl-30089174

ABSTRACT

BACKGROUND: While the aetiology of age-related macular degeneration (AMD)-a major blinding disease-remains unknown, the disease is strongly associated with variants in the complement factor H (CFH) gene. CFH variants also confer susceptibility to invasive infection with several bacterial colonizers of the nasopharyngeal mucosa. This shared susceptibility locus implicates complement deregulation as a common disease mechanism, and suggests the possibility that microbial interactions with host complement may trigger AMD. In this study, we address this possibility by testing the hypothesis that AMD is associated with specific microbial colonization of the human nasopharynx. RESULTS: High-throughput Illumina sequencing of the V3-V6 region of the microbial 16S ribosomal RNA gene was used to comprehensively and accurately describe the human pharyngeal microbiome, at genus level, in 245 AMD patients and 386 controls. Based on mean and differential microbial abundance analyses, we determined an overview of the pharyngeal microbiota, as well as candidate genera (Prevotella and Gemella) suggesting an association towards AMD health and disease conditions. CONCLUSIONS: Utilizing an extensive study population from Singapore, our results provided an accurate description of the pharyngeal microbiota profiles in AMD health and disease conditions. Through identification of candidate genera that are different between conditions, we provide preliminary evidence for the existence of microbial triggers for AMD. Ethical approval for this study was obtained through the Singapore Health Clinical Institutional Review Board, reference numbers R799/63/2010 and 2010/585/A.


Subject(s)
Macular Degeneration/microbiology , Microbiota , Pharynx/microbiology , Adult , Aged , Aged, 80 and over , Case-Control Studies , Cohort Studies , Female , Humans , Male , Microbiota/genetics , Middle Aged , Nasal Cavity/microbiology , RNA, Bacterial , RNA, Ribosomal, 16S , Singapore