Search | BVS Aleitamento Materno

xGAP: a python based efficient, modular, extensible and fault tolerant genomic analysis pipeline for variant discovery.

Gorla, Aditya; Jew, Brandon; Zhang, Luke; Sul, Jae Hoon.

Bioinformatics ; 37(1): 9-16, 2021 Apr 09.

Article in English | MEDLINE | ID: mdl-33416856

ABSTRACT

MOTIVATION: Since the first human genome was sequenced in 2001, there has been a rapid growth in the number of bioinformatic methods to process and analyze next-generation sequencing (NGS) data for research and clinical studies that aim to identify genetic variants influencing diseases and traits. To achieve this goal, one first needs to call genetic variants from NGS data, which requires multiple computationally intensive analysis steps. Unfortunately, there is a lack of open-source pipelines that can perform all these steps on NGS data in a manner that is fully automated, efficient, rapid, scalable, modular, user-friendly and fault tolerant. To address this, we introduce xGAP, an extensible Genome Analysis Pipeline, which implements a modified GATK best-practice workflow to analyze DNA-seq data with the aforementioned functionalities.

RESULTS: xGAP implements massive parallelization of the modified GATK best-practice pipeline by splitting a genome into many smaller regions with efficient load balancing to achieve high scalability. It can process 30× coverage whole-genome sequencing (WGS) data in ~90 min. In terms of accuracy of discovered variants, xGAP achieves average F1 scores of 99.37% for single nucleotide variants and 99.20% for insertions/deletions across seven benchmark WGS datasets. We achieve highly consistent results across multiple on-premises (SGE & SLURM) high-performance clusters. Compared to the Churchill pipeline, with similar parallelization, xGAP is 20% faster when analyzing 50× coverage WGS data on Amazon Web Services. Finally, xGAP is user-friendly and fault tolerant: it can automatically re-initiate failed processes to minimize required user intervention.

AVAILABILITY AND IMPLEMENTATION: xGAP is available at https://github.com/Adigorla/xgap.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
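The abstract says xGAP scales by splitting a genome into many smaller regions and load-balancing them across workers, but it does not spell out the splitting strategy. The sketch below is a minimal illustration under stated assumptions: fixed-size chunks per chromosome and a greedy longest-first assignment that uses region length as a stand-in for runtime. The names `split_genome` and `balance` are hypothetical and are not part of xGAP.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical region representation: (chromosome, start, end), half-open coordinates.
Region = Tuple[str, int, int]


def split_genome(chrom_lengths: Dict[str, int], chunk_size: int) -> List[Region]:
    """Split each chromosome into consecutive chunks of at most chunk_size bases."""
    regions: List[Region] = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, chunk_size):
            regions.append((chrom, start, min(start + chunk_size, length)))
    return regions


def balance(regions: List[Region], n_workers: int) -> List[List[Region]]:
    """Greedy longest-first assignment: each region goes to the least-loaded worker.

    Assumption: region length approximates processing cost. Real cost also
    depends on read depth and sequence content, which this sketch ignores.
    """
    heap = [(0, w) for w in range(n_workers)]  # (current load in bases, worker id)
    buckets: List[List[Region]] = [[] for _ in range(n_workers)]
    for region in sorted(regions, key=lambda r: r[2] - r[1], reverse=True):
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        buckets[w].append(region)
        heapq.heappush(heap, (load + region[2] - region[1], w))
    return buckets
```

For example, `balance(split_genome({"chr1": 250, "chr2": 120, "chrM": 16}, 50), 3)` spreads 386 bases over three workers with loads of 150, 120 and 116 bases. Smaller chunks generally give a more even balance at the cost of more scheduling overhead.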

Identification of infectious viruses for risk-based virus testing of CHO unprocessed bulk using next-generation sequencing.

Hsu, Tiffany; Talley, Mary Jo; Yang, Ping; Geiselhoeringer, Angela; Yang, Cindy; Gorla, Aditya; Rahman, M Julhasur; Silva, Lindsey; Chen, Dayue; Yang, Bin.

Biotechnol Prog ; : e3485, 2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39051853

ABSTRACT

It is important to increase manufacturing speed to make medicines more widely available. One bottleneck for CHO-based drug substance release is the in vitro viral (IVV) cell-based assay on unprocessed bulk. To increase process speed, we evaluate the suitability of replacing the IVV cell-based assay with next-generation sequencing (NGS). First, we outline how NGS is currently used in the pharmaceutical industry, and how it may apply to CHO virus testing. Second, we examine CHO virus contamination history. Since prior virus contaminants can replicate in the production bioreactor, we perform a literature search and classify 159 viruses as high, medium, low, or unknown risk based on their ability to infect CHO cells. Overall, the risk of virus contamination during the CHO manufacturing process is low. Only six viruses have been reported to contaminate CHO bioprocesses over the past several decades, and these contaminations were primarily caused by fetal bovine serum or cell culture components. These virus contamination events can be mitigated through limitation and control of raw materials, combined with virus testing and virus clearance technologies. The list of CHO infectious viruses provides a starting framework for virus safety risk assessment and NGS development. Furthermore, ICH Q5A (R2) includes NGS as a molecular method for adventitious agent testing, paving a path forward for modernizing CHO virus testing.

Genetic association analysis of human median voice pitch identifies a common locus for tonal and non-tonal languages.

Di, Yazheng; Mefford, Joel; Rahmani, Elior; Wang, Jinhan; Ravi, Vijay; Gorla, Aditya; Alwan, Abeer; Zhu, Tingshao; Flint, Jonathan.

Commun Biol ; 7(1): 540, 2024 May 07.

Article in English | MEDLINE | ID: mdl-38714798

ABSTRACT

The genetic influence on human vocal pitch in tonal and non-tonal languages remains largely unknown. In tonal languages, such as Mandarin Chinese, pitch changes differentiate word meanings, whereas in non-tonal languages, such as Icelandic, pitch is used to convey intonation. We addressed this question by searching for genetic associations with interindividual variation in median pitch in a Chinese major depression case-control cohort and compared our results with a genome-wide association study from Iceland. The same genetic variant, rs11046212-T in an intron of the ABCC9 gene, was one of the most strongly associated loci with median pitch in both samples. Our meta-analysis revealed four genome-wide significant hits, including two novel associations. The discovery of genetic variants influencing vocal pitch across both tonal and non-tonal languages suggests the possibility of a common genetic contribution to the human vocal system shared in two distinct populations with languages that differ in tonality (Icelandic and Mandarin).
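The abstract reports a meta-analysis of the Chinese and Icelandic association results but does not state which method was used. As an illustration only, the sketch below shows a standard inverse-variance-weighted fixed-effect meta-analysis for a single variant, which is one common way to combine per-study effect sizes; it is an assumption, not necessarily the authors' procedure, and `fixed_effect_meta` is a hypothetical name.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(betas: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas: per-study effect estimates; ses: their standard errors.
    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]  # more precise studies get more weight
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled_beta, pooled_se, p
```

With hypothetical inputs, `fixed_effect_meta([0.1, 0.1], [0.1, 0.1])` pools two equally precise studies, so the pooled effect stays at 0.1 while the standard error shrinks by a factor of √2. A variant is conventionally called genome-wide significant when its pooled p-value is below 5 × 10⁻⁸.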

Subjects

Genome-Wide Association Study; Language; Humans; Male; Female; Polymorphism, Single Nucleotide; Adult; Iceland; Case-Control Studies; Middle Aged; Voice/physiology; Pitch Perception; Asian People/genetics

Phenotypic subtyping via contrastive learning.

Gorla, Aditya; Sankararaman, Sriram; Burchard, Esteban; Flint, Jonathan; Zaitlen, Noah; Rahmani, Elior.

bioRxiv ; 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36711575

ABSTRACT

Defining and accounting for subphenotypic structure has the potential to increase statistical power and provide a deeper understanding of the heterogeneity in the molecular basis of complex disease. Existing phenotype subtyping methods primarily rely on clinically observed heterogeneity or metadata clustering. However, they generally tend to capture the dominant sources of variation in the data, which often originate from variation that is not descriptive of the mechanistic heterogeneity of the phenotype of interest; in fact, such dominant sources of variation, such as population structure or technical variation, are, in general, expected to be independent of subphenotypic structure. We instead aim to find a subspace with signal that is unique to a group of samples in which we believe subphenotypic variation exists (e.g., cases of a disease). To that end, we introduce Phenotype Aware Components Analysis (PACA), a contrastive learning approach leveraging canonical correlation analysis to robustly capture weak sources of subphenotypic variation. In the context of disease, PACA learns a gradient of variation unique to cases in a given dataset, while leveraging control samples to account for variation and imbalances of biological and technical confounders between cases and controls. We evaluated PACA using an extensive simulation study, as well as on various subtyping tasks using genotypes, transcriptomics, and DNA methylation data. Our results provide multiple lines of strong evidence that PACA allows us to robustly capture weak unknown variation of interest while being calibrated and well-powered, far surpassing the performance of alternative methods. This makes PACA a state-of-the-art tool for defining de novo subtypes that are more likely to reflect molecular heterogeneity, especially in challenging cases where the phenotypic heterogeneity may be masked by a myriad of strong unrelated effects in the data.
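To make the contrastive idea concrete, the sketch below uses controls to remove dominant shared variation from cases and then looks for the leading axis of what remains. This is deliberately not PACA: the abstract says PACA leverages canonical correlation analysis, whereas this simplified stand-in just projects out the top-k principal axes of the controls. The function name `contrastive_case_axis` and the parameter `k` are hypothetical.

```python
import numpy as np


def contrastive_case_axis(cases: np.ndarray, controls: np.ndarray, k: int) -> np.ndarray:
    """Leading axis of case-specific variation (simplified stand-in, not PACA).

    cases: (n_cases, n_features); controls: (n_controls, n_features).
    Step 1: estimate the k dominant axes of variation in the controls.
    Step 2: project those axes out of the centered case data.
    Step 3: return the top principal axis of the case residual.
    """
    cases_c = cases - cases.mean(axis=0)
    controls_c = controls - controls.mean(axis=0)
    # Rows of vt are orthonormal feature-space axes ordered by variance explained.
    _, _, vt = np.linalg.svd(controls_c, full_matrices=False)
    shared = vt[:k].T  # (n_features, k) basis of variation shared with controls
    residual = cases_c - cases_c @ shared @ shared.T
    _, _, vt_res = np.linalg.svd(residual, full_matrices=False)
    return vt_res[0]
```

In a toy simulation where one feature carries a strong confounder in both groups and a second feature carries weak case-only variation, ordinary PCA on the cases picks up the confounder, while this contrastive step recovers the weak case-only axis. That is the kind of "weak sources of subphenotypic variation" the abstract describes, although PACA's actual estimator differs.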
