Results 1-8 of 8
1.
BMC Bioinformatics ; 19(Suppl 11): 360, 2018 Oct 22.
Article in English | MEDLINE | ID: mdl-30343669

ABSTRACT

BACKGROUND: Many biological analysis tasks require extracting families of genetically similar sequences from large datasets produced by next-generation sequencing (NGS). Such tasks include detecting viral transmissions by analyzing all genetically close pairs of sequences from viral datasets sampled from infected individuals, and studying the evolution of viruses or immune repertoires by analyzing networks of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naïve algorithms for extracting such sequence families are impractical given the massive size of modern NGS datasets. RESULTS: In this paper, we present a fast and scalable k-mer-based framework that performs such sequence similarity queries efficiently and specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. It shows better filtering quality and time performance than other tools. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj. CONCLUSION: The proposed tool allows efficient detection of genetic relatedness between genomic samples produced by deep sequencing of heterogeneous populations. It should be especially useful for analyzing the relatedness of genomes of viruses with unevenly distributed variable genomic regions, such as HIV and HCV. In the future, we envision that, besides its applications in molecular epidemiology, the tool can also be adapted to immunosequencing and metagenomics data.
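The abstract does not describe how the framework's k-mer filtering works, so the following is only a minimal sketch of the generic idea behind k-mer (q-gram) filters, not the signature-sj algorithm. It assumes equal-length sequences and a Hamming-distance threshold `d`; the function names (`kmer_counts`, `close_pairs`) are hypothetical. Each mismatch can destroy at most `k` of the `n - k + 1` overlapping k-mers, so two sequences within distance `d` must share at least `n - k + 1 - k*d` k-mers; pairs below that count are discarded before any exact comparison.

```python
from collections import Counter, defaultdict
from itertools import combinations


def kmer_counts(seq, k):
    """Multiset of the overlapping k-mers of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def close_pairs(seqs, k, d):
    """Index pairs (i, j), i < j, of equal-length sequences with Hamming distance <= d.

    Candidates come from an inverted k-mer index; only pairs that share enough
    k-mers are verified with an exact Hamming distance.
    """
    if not seqs:
        return []
    n = len(seqs[0])
    if any(len(s) != n for s in seqs):
        raise ValueError("this sketch assumes equal-length sequences")
    # q-gram bound: each mismatch destroys at most k of the n - k + 1 k-mers
    min_shared = n - k + 1 - k * d
    if min_shared < 1:
        raise ValueError("k is too large for a lossless filter at this d")
    index = defaultdict(list)  # k-mer -> [(sequence index, multiplicity)]
    for i, s in enumerate(seqs):
        for kmer, m in kmer_counts(s, k).items():
            index[kmer].append((i, m))
    shared = Counter()  # (i, j) -> size of the k-mer multiset intersection
    for postings in index.values():
        for (i, mi), (j, mj) in combinations(postings, 2):
            shared[(i, j)] += min(mi, mj)
    return sorted(
        pair for pair, s in shared.items()
        if s >= min_shared and hamming(seqs[pair[0]], seqs[pair[1]]) <= d
    )
```

Pairs that share no k-mer never enter `shared`; skipping them is safe only because the sketch insists on `min_shared >= 1`.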


Subjects
Algorithms , Genetic Variation , Genome , Phylogeny , Base Sequence , Entropy , High-Throughput Nucleotide Sequencing , Humans , Metagenomics , Reproducibility of Results , Time Factors
2.
BMC Bioinformatics ; 19(Suppl 11): 358, 2018 Oct 22.
Article in English | MEDLINE | ID: mdl-30343674

ABSTRACT

BACKGROUND: Molecular surveillance and outbreak investigation are important for eliminating hepatitis C virus (HCV) infection in the United States. A web-based system, Global Hepatitis Outbreak and Surveillance Technology (GHOST), has been developed using Illumina MiSeq-based amplicon sequence data derived from the HCV E1/E2-junction genomic region to enable public health institutions to conduct cost-effective and accurate molecular surveillance, outbreak detection and strain characterization. However, many factors can impair the quality of input data, and because the GHOST system is not completely immune to them, the accuracy of its epidemiological inferences may suffer. Here, we analyze the data submitted to the GHOST system during its pilot phase to assess the nature of the data and to identify common quality concerns that can be detected and corrected automatically. RESULTS: The GHOST quality control filters were individually examined, and quality failure rates were measured for all samples, including negative controls. New filters were developed and introduced to detect primer dimers, loss of specimen-specific product, or short products. The genotyping tool was adjusted to improve the accuracy of subtype calls. Identifying "chordless" cycles in a transmission network built from data generated with known laboratory-based quality concerns allowed further improvement of transmission detection by GHOST in surveillance settings. Parameters derived to detect actionable, common quality control anomalies were incorporated into the automatic quality control module, which rejects data depending on the magnitude of a quality problem and warns and guides users in performing corrective actions. The guiding responses generated by the system are tailored to the GHOST laboratory protocol.
CONCLUSIONS: Several new quality control problems were identified in MiSeq data submitted to GHOST and were used to better protect the system from erroneous data and its users from erroneous inferences. The GHOST system was upgraded to identify the causes of erroneous data and to recommend corrective actions to laboratory users.
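To make the term "chordless cycle" concrete, here is a minimal sketch that checks whether a given cycle in an undirected transmission network has no chords, i.e. no link between two cycle members that are not neighbours along the cycle. It is not GHOST code, and the abstract does not say how GHOST searches for such cycles. The adjacency-dict representation, the function name `is_chordless_cycle`, and the choice to require length at least four (a triangle trivially has no chord) are all assumptions.

```python
from itertools import combinations


def is_chordless_cycle(adj, cycle):
    """True if `cycle` (vertices listed in cycle order) is a cycle of length >= 4
    in the undirected graph `adj` (dict: vertex -> set of neighbours) with no chord.
    """
    n = len(cycle)
    if n < 4 or len(set(cycle)) != n:
        return False
    # Every consecutive pair, including last -> first, must be linked
    for i in range(n):
        if cycle[(i + 1) % n] not in adj.get(cycle[i], set()):
            return False
    # A chord is a link between two cycle vertices that are not consecutive on the cycle
    for i, j in combinations(range(n), 2):
        consecutive = (j - i == 1) or (i == 0 and j == n - 1)
        if not consecutive and cycle[j] in adj.get(cycle[i], set()):
            return False
    return True
```

For example, four cases linked as A-B-C-D-A form a chordless cycle; adding a direct A-C link creates a chord.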


Subjects
Disease Outbreaks/prevention & control , Population Surveillance/methods , Automation , Genotyping Techniques , Hepacivirus/physiology , Hepatitis C/epidemiology , Hepatitis C/virology , Humans , Quality Control , Reference Standards , United States
3.
BMC Genomics ; 18(Suppl 4): 372, 2017 May 24.
Article in English | MEDLINE | ID: mdl-28589864

ABSTRACT

BACKGROUND: Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Molecular analysis has frequently been used to study HCV outbreaks and transmission chains, helping to identify a cluster of sequences as linked by transmission if their genetic distances fall below a previously defined threshold. However, HCV exists as a population of numerous variants in each infected individual, and minority variants in the source have often been observed to be the ones responsible for transmission; this precludes using a single sequence per individual, because many such transmissions would be missed. Next-generation sequencing greatly increases the sensitivity of transmission detection but brings a considerable computational challenge, because all sequences must be compared across all pairs of samples. METHODS: We developed a three-step strategy that filters pairs of samples according to different criteria: (i) a k-mer Bloom filter, (ii) a Levenshtein filter and (iii) a filter of identical sequences. We applied these three filters to a set of samples covering the spectrum of genetic relationships among HCV cases, from being part of the same transmission cluster to belonging to different subtypes. RESULTS: Our three-step filtering strategy rapidly removes 85.1% of all pairwise sample comparisons and 91.0% of all pairwise sequence comparisons, accurately establishing which pairs of HCV samples are below the relatedness threshold. CONCLUSIONS: We present a fast and efficient three-step filtering strategy that removes most sequence comparisons and accurately establishes transmission links for any threshold-based method.
This highly efficient workflow will allow a faster response and molecular detection capacity, improving the rate of detection of viral transmissions with molecular data.
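The abstract names the filters but not their details, so the following is only a minimal sketch of generic versions of the first two stages for one pair of samples: a Bloom filter over one sample's k-mers, then an exact Levenshtein check with early termination. The identical-sequence filter (iii) is not sketched. All names (`BloomFilter`, `within_edit_distance`, `samples_may_be_linked`) and parameters (bit-array size, number of hashes, SHA-256 salting) are assumptions, not the authors' implementation. A Bloom filter never gives false negatives, so stage 1 only drops pairs that share no k-mer at all.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, a tunable false-positive rate."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with the hash index
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def kmers(seq, k):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def within_edit_distance(a, b, t):
    """True if the Levenshtein distance between a and b is at most t."""
    if abs(len(a) - len(b)) > t:
        return False
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb))
        if min(cur) > t:  # row minima never decrease, so stop early
            return False
        prev = cur
    return prev[-1] <= t


def samples_may_be_linked(sample_a, sample_b, k, t):
    """Hypothetical two-stage screen for one pair of samples (lists of reads)."""
    # Stage 1: Bloom filter of sample_a's k-mers. Roughly, reads of length L within
    # t edits share at least L - k + 1 - k*t k-mers, so this stage is lossless
    # only when that bound is at least 1 for the reads involved.
    bloom = BloomFilter()
    for read in sample_a:
        for km in kmers(read, k):
            bloom.add(km)
    if not any(km in bloom for read in sample_b for km in kmers(read, k)):
        return False
    # Stage 2: exact check, is any cross-sample read pair within t edits?
    return any(within_edit_distance(a, b, t) for a in sample_a for b in sample_b)
```

The cheap Bloom-filter test runs first so that the quadratic edit-distance check is reserved for sample pairs that could plausibly be related.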


Subjects
Hepacivirus/genetics , Hepacivirus/physiology , High-Throughput Nucleotide Sequencing , Algorithms , Statistics as Topic
4.
BMC Genomics ; 18(Suppl 10): 916, 2017 Dec 06.
Article in English | MEDLINE | ID: mdl-29244005

ABSTRACT

BACKGROUND: Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Effective HCV outbreak investigation requires comprehensive surveillance and robust case investigation. We previously developed and validated a methodology for the rapid and cost-effective identification of HCV transmission clusters. Global Hepatitis Outbreak and Surveillance Technology (GHOST) is a cloud-based system that enables users, regardless of computational expertise, to analyze and visualize transmission clusters in an independent, accurate and reproducible way. RESULTS: We present and explore the performance of several algorithms implemented in GHOST, using next-generation sequencing data experimentally obtained from hypervariable region 1 of genetically related and unrelated HCV strains. GHOST processes data from an entire MiSeq run in approximately 3 h. A panel of seven specimens was used to prepare six repeats of MiSeq libraries. Testing sequence data from these libraries with GHOST showed consistent detection of transmission linkage, demonstrating the high reproducibility of the system. The lack of linkage among genetically unrelated HCV strains, together with the consistent detection of genetic linkage between HCV strains from known transmission pairs and from follow-up specimens at different levels of MiSeq-read sampling, indicates the high specificity and sensitivity of GHOST in detecting HCV transmission. CONCLUSIONS: GHOST enables automatic extraction of timely and relevant public health information suitable for guiding effective intervention measures. It is designed as a virtual diagnostic system intended for use in molecular surveillance and outbreak investigations rather than in research.
The system produces accurate and reproducible information on HCV transmission clusters for all users, irrespective of their level of bioinformatics expertise. Improvement in molecular detection capacity will contribute to increasing the rate of transmission detection, thus providing opportunity for rapid, accurate and effective response to outbreaks of hepatitis C. Although GHOST was originally developed for hepatitis C surveillance, its modular structure is readily applicable to other infectious diseases. Worldwide availability of GHOST for the detection of HCV transmissions will foster deeper involvement of public health researchers and practitioners in hepatitis C outbreak investigation.


Subjects
Cloud Computing , Computational Biology/methods , Disease Outbreaks/statistics & numerical data , Epidemiological Monitoring , Hepatitis C/epidemiology , Internationality , Algorithms , Humans , Software , User-Computer Interface
5.
J Infect Dis ; 213(6): 957-65, 2016 Mar 15.
Article in English | MEDLINE | ID: mdl-26582955

ABSTRACT

Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections are associated with unsafe injection practices, drug diversion, and other exposures to blood, and they are difficult to detect and investigate. Here, we developed and validated a simple approach for the molecular detection of HCV transmissions in outbreak settings. Using the end-point limiting-dilution (EPLD) technique, we obtained sequences from the HCV hypervariable region 1 (HVR1) of 127 cases involved in 32 epidemiologically defined HCV outbreaks and of 193 individuals with unrelated HCV strains. We compared several types of genetic distances and, using minimal Hamming distances, calculated a threshold that identified transmission clusters in all tested outbreaks with 100% accuracy. The approach was also validated on next-generation sequencing data from HCV strains recovered from 239 individuals, and the findings showed the same accuracy as with EPLD. On average, the nucleotide diversity of the intrahost population was 6.2 times greater in the source case than in any incident case, which allowed the transmission direction to be correctly detected in 8 outbreaks for which source cases were known. The simple and accurate distance-based approach developed here for detecting HCV transmissions streamlines the molecular investigation of outbreaks, thus improving public health capacity for rapid and effective control of hepatitis C.
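The distance-based linkage rule and the diversity comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: variants are assumed to be aligned to equal length, distances are normalized Hamming distances (p-distances), and the linkage `threshold` is left as a parameter because the abstract does not give its value. Two hosts count as linked when the minimal distance over all cross-host variant pairs is at or below the threshold. The diversity-based direction call is a simplified stand-in inspired by the reported 6.2-fold difference; all function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites (normalized Hamming distance) of two aligned reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def min_distance(host_a, host_b):
    """Minimal distance over all cross-host pairs of intra-host variants."""
    return min(p_distance(a, b) for a in host_a for b in host_b)


def linked(host_a, host_b, threshold):
    return min_distance(host_a, host_b) <= threshold


def nucleotide_diversity(host):
    """Mean pairwise p-distance among a host's sampled variants."""
    pairs = list(combinations(host, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def putative_source(host_a, host_b):
    """Label the more diverse population as the putative source ('a' or 'b')."""
    return "a" if nucleotide_diversity(host_a) >= nucleotide_diversity(host_b) else "b"
```

Using the minimum rather than, say, the mean cross-host distance reflects the point that transmission can involve a minority variant: a single close pair is enough to link two hosts.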


Subjects
Disease Outbreaks , Genetic Linkage , Hepacivirus/genetics , Hepacivirus/isolation & purification , Hepatitis C/transmission , Hepatitis C/virology , Cluster Analysis , Genetic Variation , Genotype , Hepatitis C/epidemiology , Humans , Reproducibility of Results
6.
J Comput Biol ; 30(4): 502-517, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36716280

ABSTRACT

Characterized by aggressive behavior and heterogeneous tumor biology, triple-negative breast cancer (TNBC) is a type of breast cancer known for its poor clinical outcome. Because TNBC tumors lack estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 (HER2) expression, fewer treatment options are available in the clinic. The incidence of TNBC is higher in African American (AA) women than in European American (EA) women, with worse clinical outcomes. The significant factors responsible for the racial disparity in TNBC are socioeconomic lifestyle and tumor biology. The current study used open-source gene expression data from triple-negative breast cancer samples with racial information. We applied a state-of-the-art Support Vector Machine (SVM) classification method with a recursive feature elimination approach to the gene expression data to identify significant biomarkers deregulated in AA and EA women. We also included Spearman's rho and Ward's linkage method in our feature selection workflow. Our proposed method identifies 24 features (genes) that classify the AA and EA samples with 98% accuracy. We also performed Kaplan-Meier analysis and a log-rank test on these 24 genes. Of the 24 genes, we discussed only two, KLK10 and LRRC37A2, whose deregulated expression correlated with cancer progression and a poor survival rate. We believe that refining our method with a larger RNA-seq gene expression dataset will provide more accurate insight into racial disparity in TNBC.
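The abstract says Spearman's rho and Ward's linkage were included in the feature selection workflow but not how. The following is only a minimal sketch of one assumed use of Spearman's rho: a greedy redundancy filter that keeps genes in priority order and drops any gene whose expression is strongly rank-correlated with a gene already kept. It does not implement SVM recursive feature elimination or Ward's linkage. The function names and the `cutoff` value are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            r[order[pos]] = avg
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant vectors; treated as uncorrelated here
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(ranks(x), ranks(y))


def prune_redundant(features, cutoff=0.9):
    """features: dict gene -> expression values over the same samples, in priority order.
    Keeps a gene only if |rho| < cutoff against every gene already kept."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) < cutoff for k in kept):
            kept.append(name)
    return kept
```

For example, a gene whose expression is a monotone function of an already-kept gene has rho = 1 and is dropped, even if the relationship is not linear.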


Subjects
Health Status Disparities , Triple Negative Breast Neoplasms , Female , Humans , Biomarkers, Tumor/genetics , Black or African American/genetics , Support Vector Machine , Triple Negative Breast Neoplasms/ethnology , Triple Negative Breast Neoplasms/pathology , White People/genetics
7.
Infect Genet Evol ; 65: 216-225, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30075255

ABSTRACT

Human immunodeficiency virus (HIV) infection is emerging as a leading cause of morbidity and mortality among hepatitis C virus (HCV)-infected patients. Both viruses interact in co-infected hosts, which may affect their intra-host evolution and potentially lead to differing genetic compositions of viral populations in co-infected (CIP) and mono-infected (MIP) patients. Here, we investigate genetic differences between intra-host variants of the HCV hypervariable region 1 (HVR1) sampled from CIP and MIP. Nucleotide (nt) sequences of intra-host HCV HVR1 variants (N = 28,622) obtained from CIP (N = 112) and MIP (N = 176) were represented using 148 physical-chemical (PhyChem) indexes of DNA nt dimers. Significant (p < .0001) differences in the means and frequency distributions of 7 PhyChem properties were found between HVR1 variants from the two groups. Linear projection analysis of 29 PhyChem features extracted from these PhyChem properties showed that CIP and MIP HVR1 variants have distinct distributions in the modeled 2D space, with only ~1.3% of the PhyChem profiles identified across all HVR1 variants (N = 6782) being found in both groups. Probabilistic neural network (PNN) and naïve Bayesian (NB) classifiers trained on the PhyChem features accurately classified HVR1 variants by group in cross-validation experiments (AUROC ≥ 0.96). Similarly, both models showed high accuracy (AUROC ≥ 0.95) when evaluated on a test dataset of HVR1 sequences obtained from 10 patients whose data were not used for model building. As expected, both models performed with lower accuracy on randomly labeled datasets in cross-validation experiments (AUROC = 0.50). The random-label-trained PNN showed a similar drop in accuracy on the test dataset (AUROC = 0.48), indicating that the detected associations were unlikely to be due to random correlations.
Marked differences in genetic composition of HCV HVR1 variants sampled from CIP and MIP suggest differing intra-host HCV evolution in the presence of HIV infection. PhyChem features identified here may be used for detection of HIV infection from intra-host HCV variants alone in co-infected patients, thus facilitating monitoring for HIV introduction to high-risk populations with high HCV prevalence.
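The abstract does not list the 148 dinucleotide indexes or say exactly how sequences were converted into PhyChem features, so the following is only a minimal sketch of one common approach, assumed here: average each dinucleotide index over all overlapping dimers of a sequence. The index tables must be supplied by the caller from a real dinucleotide property source; none are reproduced here. The function names and the table layout (index name -> dimer -> value) are hypothetical, and dimers containing non-ACGT characters are skipped.

```python
def dimers(seq):
    """Overlapping dinucleotides of seq."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def phychem_profile(seq, indexes):
    """Mean value of each dinucleotide index over the valid overlapping dimers of seq.

    indexes: dict index_name -> dict dimer -> numeric value (hypothetical layout;
    each table must cover all 16 ACGT dimers).
    """
    ds = [d for d in dimers(seq.upper()) if all(c in "ACGT" for c in d)]
    if not ds:
        raise ValueError("sequence has no valid dinucleotides")
    return {name: sum(table[d] for d in ds) / len(ds) for name, table in indexes.items()}
```

Each sequence then becomes a fixed-length numeric vector (one value per index), which is the kind of input that classifiers such as PNN or naïve Bayes can be trained on.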


Subjects
HIV Infections/virology , Hepacivirus/physiology , Hepatitis C/virology , Host-Pathogen Interactions/physiology , Viral Proteins/genetics , Adaptation, Biological/genetics , Biological Evolution , Coinfection , Computational Biology/methods , Hepacivirus/pathogenicity , Host-Pathogen Interactions/genetics , Humans , Models, Theoretical , Viral Proteins/chemistry
8.
Sci Rep ; 6: 22720, 2016 Mar 04.
Article in English | MEDLINE | ID: mdl-26940624

ABSTRACT

The role of Homeobox transcription factors during fin and limb development has been the focus of recent work investigating the evolutionary origin of limb-specific morphologies. Here we characterize the expression of HoxD genes, as well as the cluster-associated genes Evx2 and LNP, in the paddlefish Polyodon spathula, a basal ray-finned fish. Our results demonstrate a collinear pattern of nesting in early fin buds that includes HoxD14, a gene previously thought to be isolated from global Hox regulation. We also show that in both Polyodon and the catshark Scyliorhinus canicula (a representative chondrichthyan), late-phase HoxD transcripts are present in cells of the fin-fold and co-localize with And1, a component of the dermal skeleton. These new data support an ancestral role for HoxD genes in patterning the fin-folds of jawed vertebrates, and fuel new hypotheses about the evolution of cluster regulation and the potential downstream differentiation outcomes of distinct HoxD-regulated compartments.


Subjects
Animal Fins/embryology , Fishes/embryology , Gene Expression Regulation, Developmental , Homeodomain Proteins/biosynthesis , Animals , Gene Expression Profiling , Sequence Analysis, DNA