ABSTRACT
MOTIVATION: It remains challenging to uncover new susceptibility genes of complex diseases, and their mechanisms, in genome-wide association studies. There are at least two difficulties: isolating the genuine susceptibility genes from the many indirectly associated genes, and functionally validating these genes. RESULTS: We first proposed a novel conditional gene-based association test that needs only summary statistics to isolate the independently associated genes of a disease. Applying this method, we detected 185 genes independently associated with schizophrenia. We then designed an in-silico experiment based on expression/co-expression to systematically validate the pathogenic potential of these genes. We found that genes independently associated with schizophrenia formed more co-expression pairs than expected in normal post-natal, but not pre-natal, human brain regions. Interestingly, no co-expression enrichment was found in the brain regions of schizophrenia patients. The independently associated genes also had more significant P-values for differential expression between schizophrenia patients and controls in the brain regions. In contrast, indirectly associated genes, or genes identified as associated by other widely used gene-based tests, showed no such differential expression and co-expression patterns. In summary, this conditional gene-based association test is effective for isolating directly associated genes from indirectly associated genes, and the results suggest that common variants might contribute to schizophrenia largely by distorting expression and co-expression in post-natal brains. AVAILABILITY AND IMPLEMENTATION: The conditional gene-based association test has been implemented in Java in the 'KGG' platform and is publicly available at http://grass.cgs.hku.hk/limx/kgg/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genome-Wide Association Study , Schizophrenia/genetics , Humans , Polymorphism, Single Nucleotide
ABSTRACT
Summary: AC-DIAMOND (v1) is a DNA-protein alignment tool designed to tackle the efficiency challenge of aligning large numbers of reads or contigs to protein databases. Compared with DIAMOND, previously the most efficient method, AC-DIAMOND achieves a 6- to 7-fold speed-up while retaining a similar degree of sensitivity. The improvement is rooted in two aspects: first, a compressed index of adaptive-length seeds speeds up the matching between query and reference sequences; second, a compact form of dynamic programming fully exploits the parallelism of SIMD instructions. Availability and implementation: Software source code and binaries are available at https://github.com/Maihj/AC-DIAMOND/. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Software , DNA , Databases, Protein , Proteins , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Applications of long-read sequencing with the Oxford Nanopore Technologies (ONT) MinION sequencer are becoming more diverse in the medical field. However, the high sequencing error rate of ONT and the limited throughput of a single MinION flowcell limit its applicability for accurate variant detection. Medical exome sequencing (MES) targets clinically significant exon regions, allowing rapid and comprehensive screening of pathogenic variants. By applying MES with MinION sequencing, the technology can achieve a more uniform capture of the target regions, a shorter turnaround time, and a lower sequencing cost per sample. METHOD: We introduced a cost-effective optimized workflow, ECNano, comprising a wet-lab protocol and bioinformatics analysis, for accurate variant detection at 4800 clinically important genes and regions using a single MinION flowcell. The ECNano wet-lab protocol was optimized to perform long-read target enrichment and ONT library preparation to stably generate high-quality MES data with adequate coverage. The subsequent variant-calling workflow, Clair-ensemble, adopted a fast RNN-based variant caller, Clair, and was optimized for target enrichment data. To evaluate its performance and practicality, ECNano was tested on both reference DNA samples and patient samples. RESULTS: ECNano achieved a deep on-target depth of coverage (DoC), averaging > 100×, and > 98% uniformity using one MinION flowcell. For accurate ONT variant calling, the generated reads sufficiently covered 98.9% of pathogenic positions listed in ClinVar, with 98.96% having at least 30× DoC. ECNano obtained an average read length of 1000 bp. The long reads of ECNano also covered the adjacent splice sites well, with 98.5% of positions having ≥ 30× DoC. Clair-ensemble achieved > 99% recall and accuracy for SNV calling. The whole workflow, from wet-lab protocol to variant detection, was completed within three days.
CONCLUSION: We presented ECNano, an out-of-the-box workflow comprising (1) a wet-lab protocol for ONT target enrichment sequencing and (2) a downstream variant detection workflow, Clair-ensemble. The workflow is cost-effective, with a short turnaround time for high accuracy variant calling in 4800 clinically significant genes and regions using a single MinION flowcell. The long-read exon captured data has potential for further development, promoting the application of long-read sequencing in personalized disease treatment and risk prediction.
Subject(s)
High-Throughput Nucleotide Sequencing , Nanopores , Cost-Benefit Analysis , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Workflow
ABSTRACT
HKG is the first fully accessible variant database for Hong Kong Cantonese, constructed from 205 novel whole-exome sequencing datasets. There has long been a research gap in the understanding of the genetic architecture of southern Chinese subgroups, including Hong Kong Cantonese. HKG detected 196 325 high-quality variants, 5.93% of them novel, and 25 472 variants were found to be unique to HKG compared with three Chinese populations sampled from 1000 Genomes (CHN). PCA illustrates the distinctness of HKG within CHN, and the admixture study estimated the ancestral composition of HKG and CHN, with a gradient change from north to south, consistent with their geographical distribution. ClinVar, CIViC and PharmGKB annotated 599 clinically significant variants and 360 putative loss-of-function variants, substantiating our understanding of population characteristics for future medical development. Among the novel variants, 96.57% were singletons and 6.85% were of high impact. With a good representation of Hong Kong Cantonese, we demonstrated better variant imputation when HKG data were added to the reference, thus filling the data gap in southern Chinese and facilitating the regional and global development of population genetics.
ABSTRACT
BACKGROUND: With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are gradually being replaced by metagenomics, also known as environmental genomics. The first step of metagenomics, which is still a major bottleneck, is the taxonomic characterization of DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred to as "binning". Existing binning methods are based on supervised or semi-supervised approaches that rely heavily on reference genomes of known microorganisms and on phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases. RESULTS: In this paper, we present an unsupervised binning method based on the distribution of a carefully selected set of l-mers (substrings of length l in DNA fragments). Our experiments show that the method can accurately bin DNA fragments of various lengths and relative species abundance ratios without using any reference or training datasets. Another feature of our method is its robustness to errors: the binning accuracy decreases by less than 1% when the sequencing error rate increases from 0% to 5%. Note that the typical sequencing error rate of existing commercial sequencing platforms is less than 2%. CONCLUSIONS: We provide a new and effective tool to solve the metagenome binning problem without using any reference datasets or marker information from any known reference genomes (species). The source code of our software tool, the reference genomes of the species used to generate the test datasets, and the corresponding test datasets are available at http://i.cs.hku.hk/~alse/MetaCluster/.
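As a rough illustration of the kind of feature the abstract above describes, the following Python sketch computes the relative frequency of every l-mer in a DNA fragment. It is only a minimal sketch under stated assumptions: the function name `lmer_frequencies` and the default `l = 4` are hypothetical, it counts all l-mers rather than the "carefully selected set" the authors use, and it does not perform the clustering step of the published tool.

```python
from collections import Counter
from itertools import product


def lmer_frequencies(fragment: str, l: int = 4) -> dict[str, float]:
    """Relative frequency of every l-mer over A/C/G/T in one DNA fragment.

    Hypothetical helper for illustration; not the MetaCluster implementation.
    """
    fragment = fragment.upper()
    # Count every overlapping substring of length l.
    counts = Counter(fragment[i:i + l] for i in range(len(fragment) - l + 1))
    # Enumerate all 4**l possible l-mers so every fragment gets a vector
    # with the same keys; l-mers containing N or other symbols are ignored.
    all_lmers = ["".join(p) for p in product("ACGT", repeat=l)]
    total = sum(counts[k] for k in all_lmers)
    return {k: (counts[k] / total if total else 0.0) for k in all_lmers}


if __name__ == "__main__":
    freqs = lmer_frequencies("ACGTACGT", l=2)
    print(len(freqs), freqs["AC"])
```

Fragments from the same genome tend to have similar vectors of this kind, which is what lets an unsupervised method group them without reference genomes; how the l-mers are selected and how the vectors are clustered is left to the paper itself.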
Subject(s)
DNA/chemistry , Data Mining/methods , Metagenomics/methods , Sequence Analysis, DNA/methods , Algorithms , Cluster Analysis , Databases, Genetic , Environmental Microbiology , Escherichia coli/genetics , Genome, Bacterial/genetics , Lactobacillus/genetics
ABSTRACT
BACKGROUND: Flavonoids used in Chinese medicine have been shown in animal studies to aid osteogenesis and bone formation. However, there is no consensus on the mechanism by which these phytochemicals act on bone-forming osteoblasts, and hence no predictive model for screening chemicals for this specific biochemical function has been established. The purpose of this study was to develop a novel and effective approach for selecting flavonoids by predicting their bone-forming ability via activation and inhibition of osteoblastic voltage-gated calcium (CaV) channels, using molecular modelling. METHOD: A quantitative structure-activity relationship (QSAR) model, built with a supervised machine-learning approach, was applied in this study to predict the behavioral manifestations of flavonoids in the CaV channels and to develop a statistical correlation between the biochemical features and the behavioral manifestations of 24 compounds (training set: Kaempferol, Taxifolin, Daidzein, Morin, Scutellarein, Quercetin, Apigenin, Myricetin, Tamarixetin, Rutin, Genistein, 5,7,2'-Trihydroxyflavone, Baicalein, Luteolin, Galangin, Chrysin, Isorhamnetin, Naringin, 3-Methyl galangin, Resokaempferol; test set: 5-Hydroxyflavone, 3,6,4'-Trihydroxyflavone, 3,4'-Dihydroxyflavone and Naringenin). Based on a statistical algorithm, QSAR provides a reasonable basis for establishing a predictive correlation model from a variety of molecular descriptors that can identify and analyse the biochemical features of flavonoids engaged in activating or inhibiting the CaV channels of osteoblasts. RESULTS: The model showed that these flavonoids have strong activating effects on the CaV channel for osteogenesis. In addition, scutellarein was ranked the highest among the screened flavonoids, and lower-ranked compounds, such as daidzein, quercetin, genistein and naringin, followed the same descending order as in previous animal studies.
CONCLUSION: This predictive modelling study has confirmed and validated the biochemical activity of the flavonoids in the osteoblastic CaV activation.
ABSTRACT
OBJECTIVE: We designed and tested a Nanopore sequencing panel for direct tuberculosis drug resistance profiling. The panel targeted 10 resistance-associated loci. We assessed the feasibility of amplifying and sequencing these loci from 23 clinical specimens with low bacillary burden. RESULTS: At least 8 loci were successfully amplified from the majority of specimens (14/23, 60.87%) for predicting first- and second-line drug resistance, and the 12 specimens yielding all 10 targets were sequenced with Nanopore MinION and Illumina MiSeq. MinION sequencing data were corrected with Nanopolish, and recurrent variants were filtered. A total of 67,082 bases across all consensus sequences were analyzed, with 67,019 bases called by both MinION and MiSeq as wildtype. Of the 41 single nucleotide variants (SNVs) called by MiSeq with 100% variant allelic frequency (VAF), 39 (95.1%) were called by MinION. For the 22 mixed bases called by MiSeq, an SNV with the highest VAF (70%) was called by MinION. With its short assay time, reasonable reagent cost, and continuously improving sequencing chemistry and signal correction pipelines, this Nanopore method can be a viable option for direct tuberculosis drug resistance profiling in the near future.
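The percentages in the abstract above can be re-derived from the counts it reports. The short Python sketch below does so; the helper `percent` and the constant names are hypothetical, introduced only for this illustration. The observation that wildtype bases, full-VAF SNVs and mixed bases add up exactly to the analyzed total is an inference from the reported numbers, not something the abstract states.

```python
def percent(numerator: int, denominator: int, digits: int = 1) -> float:
    """Return numerator/denominator as a percentage rounded to `digits` places."""
    return round(100 * numerator / denominator, digits)


# Counts quoted in the abstract (constant names are illustrative only).
TOTAL_BASES = 67_082           # bases across all consensus sequences
WILDTYPE_BOTH = 67_019         # called wildtype by both MinION and MiSeq
SNVS_FULL_VAF = 41             # MiSeq SNVs at 100% VAF
SNVS_CALLED_BY_MINION = 39     # of those, also called by MinION
MIXED_BASES = 22               # mixed bases called by MiSeq


def bases_accounted_for() -> int:
    """Sum of the three reported categories (assumed here to be disjoint)."""
    return WILDTYPE_BOTH + SNVS_FULL_VAF + MIXED_BASES


if __name__ == "__main__":
    print(bases_accounted_for() == TOTAL_BASES)
    print(percent(SNVS_CALLED_BY_MINION, SNVS_FULL_VAF))  # SNV concordance, %
    print(percent(14, 23, 2))  # specimens with at least 8 loci amplified, %
```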
Subject(s)
Mycobacterium tuberculosis , Nanopores , Tuberculosis , Drug Resistance , High-Throughput Nucleotide Sequencing , Humans , Mycobacterium tuberculosis/genetics , Tuberculosis/drug therapy
ABSTRACT
UNLABELLED: Predicting motif pairs from a set of protein sequences based on protein-protein interaction data is an important but difficult computational problem. Tan et al. proposed a solution to this problem. However, the scoring function used in their approach (χ² testing) is not adequate, and the approach is not scalable: it may take days to process a set of 5000 protein sequences with about 20,000 interactions. Later, Leung et al. proposed an improved scoring function and faster algorithms for the same problem. However, the model used by Leung et al. is complicated: the exact value of the scoring function is not easy to compute, and an estimated value is used in practice. In this paper, we derive a better model to capture the significance of a given motif pair based on a clustering notion. We develop a fast heuristic algorithm to solve the problem. The algorithm is able to locate the correct motif pair in the yeast data set in about 45 minutes for 5000 protein sequences and 20,000 interactions. Moreover, we derive a lower bound on the p-value a motif pair needs in order to be distinguishable from random motif pairs. The lower bound has been verified using simulated data sets. AVAILABILITY: http://alse.cs.hku.hk/motif_pair.
Subject(s)
Algorithms , Cluster Analysis , Pattern Recognition, Automated/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein/methods , Amino Acid Motifs , Amino Acid Sequence , Binding Sites , Molecular Sequence Data , Protein Binding , Protein Structure, Tertiary
ABSTRACT
Aim: To explore potential utility of metagenomic sequencing for improving etiologic diagnosis of infective endocarditis (IE) caused by fastidious bacteria. Materials & methods: Plasma and heart valves of two patients, who were diagnosed with IE caused by Bartonella quintana and Propionibacterium species, were sequenced by using Illumina MiSeq and Nanopore MinION. Results: For patient 1, B. quintana was detected in the plasma pool collected 4 days before valvular replacement surgery. For patient 2, Propionibacterium sp. oral taxon 193 was detected in the plasma sample collected on hospital day 1. Nearly complete bacterial genomes (>98%) were retrieved from resected heart valves of both patients, enabling detection of antibiotic resistance-associated features. Real-time sequencing of heart valves identified both pathogens within the first 16 min of sequencing runs. Conclusion: Metagenomic sequencing may be a helpful supplement to IE diagnostic workflow, especially when conventional tests fail to yield a diagnosis.
Subject(s)
Bacteria/genetics , DNA, Bacterial/analysis , Endocarditis, Bacterial/diagnosis , Heart Valves/microbiology , Metagenomics/statistics & numerical data , Bacteria/isolation & purification , Humans , Metagenomics/methods , Polymerase Chain Reaction
ABSTRACT
Isolation of Helicobacter cinaedi from a positive blood culture requires prolonged and stringent subculture conditions. Direct whole-genome sequencing (WGS) of a positive blood culture may provide timely treatment-associated genetic information. Here, we report a draft genome sequence of H. cinaedi compiled by direct WGS, which was 1,995,911 bp in length with 39.1% GC content.
ABSTRACT
METHODOLOGY: This study examined the prevalence and correlates of mental illness in homeless people in Hong Kong and explored the barriers preventing their access to health care. Ninety-seven Cantonese-speaking Chinese who were homeless during the study period were selected at random from the records of the three organisations serving the homeless population. The response rate was 69%. Seventeen subjects could not give valid consent due to their poor mental state, so their responses were excluded from the data analysis. A psychiatrist administered the Structured Clinical Interview for DSM-IV Axis-I disorders (SCID-I) and the Mini-Mental State Examination. Consensus diagnoses for subjects who could not complete the SCID-I were established by three independent psychiatrists. FINDINGS: The point prevalence of mental illness was 56%. Seventy-one percent of the subjects had a lifetime history of mental illness, 30% had a mood disorder, 25% had an alcohol use disorder, 25% had a substance use disorder, 10% had a psychotic disorder, 10% had an anxiety disorder and 6% had dementia. Forty-one percent of the subjects with mental illness had undergone a previous psychiatric assessment. Only 13% of the subjects with mental illness were receiving psychiatric care at the time of interview. The prevalence of psychotic disorders and dementia, and the rate of undertreatment, are hugely underestimated, as a significant proportion (18%) of the subjects initially selected were too ill to give consent to join the study. CONCLUSION: The low treatment rate and the presence of this severely ill and unreached group of homeless people reflect the fact that the current mode of service delivery is failing to support the most severely ill homeless individuals.
Subject(s)
Ill-Housed Persons/psychology , Mental Disorders/epidemiology , Adult , Aged , Aged, 80 and over , Female , Hong Kong/epidemiology , Humans , Male , Mental Disorders/psychology , Middle Aged , Prevalence , Young Adult
ABSTRACT
This study examined the point prevalence of Borderline Personality Disorder (BPD) and its clinical correlates in patients with recent deliberate self-harm (DSH) in Hong Kong. A representative consecutive sample (n = 160) of patients with DSH referred to Prince of Wales Hospital from April 1, 2007 to March 31, 2008 was recruited. Their BPD status was determined according to the BPD subscale of the Chinese version of Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II). The point prevalence of BPD was calculated. Subjects with and without BPD were compared in terms of demographic and clinical characteristics. Thirty out of 160 (18.8%) DSH patients were found to suffer from BPD. DSH patients with BPD were more likely to be female (p = .020), more often reported history of childhood physical (p = 0.043) and sexual abuse (p < 0.001), history of past DSH (p = 0.010), being younger at first DSH (p = 0.039), and more likely to suffer from current alcohol and substance use disorder (p = 0.043) and eating disorder (p = 0.040) than those without BPD. Being female, having history of childhood sexual abuse and current alcohol and substance use disorder were found to be independent predictors of BPD status by binary logistic regression.
Subject(s)
Borderline Personality Disorder/epidemiology , Borderline Personality Disorder/psychology , Self-Injurious Behavior/epidemiology , Self-Injurious Behavior/psychology , Adolescent , Adult , Adult Survivors of Child Abuse/psychology , Borderline Personality Disorder/complications , Borderline Personality Disorder/diagnosis , Comorbidity , Diagnosis, Dual (Psychiatry) , Emergency Service, Hospital , Female , Hong Kong/epidemiology , Humans , Interview, Psychological , Logistic Models , Male , Middle Aged , Prevalence , Risk Factors , Self-Injurious Behavior/complications , Substance-Related Disorders/complications , Substance-Related Disorders/epidemiology , Substance-Related Disorders/psychology , Young Adult
ABSTRACT
Electroconvulsive therapy (ECT) is still the fastest, most effective, and frequently life-saving therapeutic intervention in several forms of depression and some other psychiatric disorders. Transient memory disturbances are frequent after ECT. A randomized, double-blind, placebo-controlled study was conducted to investigate the effects of piracetam on ECT-induced confusion and memory disturbances. Thirty-eight consecutively admitted patients with depressive illness or schizophrenia requiring ECT were given either piracetam or an identical-looking placebo during the period of ECT treatment and for 2 weeks afterward. Daily dosage of piracetam was 7.2 g, given orally for the first 2 weeks while patients underwent ECT (loading phase), followed by 4.8 g for the rest of the study period. Participants were evaluated by standardized clinical rating scales and cognitive psychologic tests 1 to 2 days before ECT, 1 day after their third and sixth ECT treatments, and 2 weeks after they had completed their ECT courses. Piracetam had no significant effect in preventing ECT-induced memory disturbances. All clinical ratings were consistently, albeit not significantly, better in the piracetam group, suggesting that piracetam may have augmented the effects of ECT.