Results 1 - 20 of 24
1.
Front Immunol ; 14: 1195533, 2023.
Article in English | MEDLINE | ID: mdl-37654488

ABSTRACT

Background: Pre-existing cross-reactive immunity among different coronaviruses, also termed immune imprinting, may have a comprehensive impact on subsequent SARS-CoV-2 infection and COVID-19 vaccination effectiveness. Here, we aim to explore the interplay between pre-existing antibodies against seasonal coronaviruses (sCoVs) and the humoral immunity induced by COVID-19 vaccination. Methods: We first collected serum samples from healthy donors prior to the COVID-19 pandemic and from individuals who had received COVID-19 vaccination post-pandemic in China, and the levels of IgG antibodies against sCoVs and SARS-CoV-2 were detected by ELISA. The Wilcoxon rank sum test and the chi-square test were used to compare the differences in magnitude and seropositivity rate between the two groups. Then, we recruited a longitudinal cohort to collect serum samples before and after COVID-19 vaccination. The levels of IgG antibodies against SARS-CoV-2 S, S1, S2 and N antigens were monitored. Associations between pre-existing sCoVs antibodies and COVID-19 vaccination-induced antibodies were analyzed by Spearman rank correlation. Results: 96.0% of samples (339/353) showed the presence of IgG antibodies against at least one subtype of sCoVs. 229E and OC43 exhibited the highest seroprevalence rates at 78.5% and 72.0%, respectively, followed by NL63 (60.9%) and HKU1 (52.4%). The levels of IgG antibodies against two β coronaviruses (OC43 and HKU1) were significantly higher in donors who had been inoculated with COVID-19 vaccines than in pre-pandemic healthy donors. However, within the longitudinal cohort, COVID-19 vaccine-induced antibody levels were not significantly different between the groups with high and low levels of pre-existing sCoVs antibodies. Conclusion: We found a high prevalence of antibodies against sCoVs in the Chinese population. The immune imprinting by sCoVs could be reactivated by COVID-19 vaccination, but it did not appear to be a major factor affecting the immunogenicity of COVID-19 vaccines. These findings will provide insights into the impact of immune imprinting on subsequent multiple shots of COVID-19 vaccines.


Subjects
COVID-19 Vaccines; COVID-19; Humans; Pandemics; Seasons; Seroepidemiologic Studies; COVID-19/epidemiology; COVID-19/prevention & control; SARS-CoV-2; Immunoglobulin G
2.
Emerg Microbes Infect ; 12(2): 2245931, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37542407

ABSTRACT

Yearly epidemics of seasonal influenza cause an enormous disease burden around the globe. Understanding the rules behind the immune response to repeated vaccination remains a significant challenge, and meeting it would help to optimize vaccination strategies. In this study, 34 healthy volunteers, 16 of them vaccinated, were recruited, and the dynamics of the BCR repertoire across consecutive vaccinations in two seasons were tracked. In terms of diversity, length, network, V and J gene segment usage, somatic hypermutation (SHM) rate and isotype, the overall changes were stronger in the acute phase of the first vaccination than in that of the second vaccination. In particular, the V gene segments IGHV4-39, IGHV3-9, IGHV3-7 and IGHV1-69 were amplified in the acute phase of the first vaccination, with IGHV3-7 dominant. For the second vaccination, on the other hand, the changes were dominated by IGHV1-69, which has the potential to encode broadly neutralizing antibodies. Additional analysis indicates that the usage of the IGHV3-7 gene segment in the acute phase of the first vaccination was due to elevated usage of the isotypes IgM and IgG3, whereas the usage of IGHV1-69 in the second vaccination was contributed by the isotypes IgG1 and IgG2. Finally, 41 public BCR clusters were identified in the vaccine group, with both IGHV3-7 and IGHV1-69 involved, and representative complementarity determining region 3 (CDR3) motifs were characterized. This study provides insights into the immune response dynamics following repeated influenza vaccination in humans and can inform universal vaccine design and vaccine strategies in the future.


Subjects
Immunoglobulin Heavy Chains; Influenza, Human; Humans; Immunoglobulin Heavy Chains/genetics; Influenza, Human/prevention & control; Influenza, Human/genetics; Complementarity Determining Regions/genetics; Multigene Family; Vaccination
3.
Nucleic Acids Res ; 51(15): 8005-8019, 2023 08 25.
Article in English | MEDLINE | ID: mdl-37283060

ABSTRACT

Broad-host-range (BHR) plasmids in human gut bacteria are of considerable interest for their ability to mediate horizontal gene transfer (HGT) across large phylogenetic distances. However, the human gut plasmids, especially the BHR plasmids, remain largely unknown. Here, we identified the plasmids in the draft genomes of gut bacterial isolates from Chinese and American donors, resulting in 5372 plasmid-like clusters (PLCs), of which 820 PLCs (comPLCs) were estimated to be > 60% complete and only 155 (18.9%) were classified into known replicon types (n = 37). We observed that 175 comPLCs had a broad host range across distinct bacterial genera, of which 71 were detected in at least two of the Chinese, American, Spanish, and Danish human populations, and 13 were highly prevalent (>10%) in at least one human population. Haplotype analyses of two widespread PLCs demonstrated their spreading and evolutionary trajectory, suggesting frequent and recent exchanges of the BHR plasmids in environments. In conclusion, we obtained a large collection of plasmid sequences in human gut bacteria and demonstrated that a subset of the BHR plasmids can be transmitted globally, thus facilitating extensive HGT events (e.g. of antibiotic resistance genes). This study highlights the potential implications of the plasmids for global human health.


Subjects
Gastrointestinal Microbiome; Humans; Gastrointestinal Microbiome/genetics; Phylogeny; Host Specificity; Plasmids/genetics; Bacteria/genetics; Gene Transfer, Horizontal/genetics
4.
Emerg Microbes Infect ; 11(1): 2007-2020, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35899581

ABSTRACT

Dynamic changes of the paired heavy and light chain B cell receptor (BCR) repertoire provide essential insight into the humoral immune response after SARS-CoV-2 infection and vaccination. However, differences between the endogenous paired BCR repertoire kinetics in SARS-CoV-2 infection and in previously recovered or naïve subjects treated with the inactivated vaccine remain largely unknown. We performed single-cell V(D)J sequencing of B cells from six healthy donors who received three shots of inactivated SARS-CoV-2 vaccine (BBIBP-CorV), five people who received the BBIBP-CorV vaccine after having recovered from COVID-19, and five unvaccinated recovered COVID-19 patients, and then integrated these data with public data of B cells from four SARS-CoV-2-infected subjects. We discovered that BCR variable (V) genes were more prominently used in the SARS-CoV-2 exposed groups (both in the group with active infection and in the group that had recovered) than in the vaccinated groups. The VH gene that expanded the most after SARS-CoV-2 infection was IGHV3-33, while in the vaccinated groups it was IGHV3-23. The SARS-CoV-2-infected group showed greater BCR clonal expansion and somatic hypermutation than the vaccinated healthy group. A small proportion of public clonotypes were shared between the SARS-CoV-2 infected, vaccinated healthy, and recovered groups. Moreover, several public antibodies against the SARS-CoV-2 spike protein were identified. We comprehensively characterize the paired heavy and light chain BCR repertoire from SARS-CoV-2 infection to vaccination, providing further guidance for the development of the next-generation precision vaccine.


Subjects
COVID-19; Viral Vaccines; Antibodies, Viral; COVID-19/prevention & control; COVID-19 Vaccines; Humans; Receptors, Antigen, B-Cell/genetics; SARS-CoV-2/genetics; Spike Glycoprotein, Coronavirus; Vaccination
5.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34953464

ABSTRACT

Antibodies specifically bind to antigens and are an essential part of the immune system. Hence, antibodies are powerful tools in research and diagnostics. High-throughput sequencing technologies have promoted comprehensive profiling of the immune repertoire, which has resulted in large amounts of antibody sequences that remain to be further analyzed. In this study, antibodies were downloaded from IMGT/LIGM-DB and Sequence Read Archive databases. Contributing features from antibody heavy chains were formulated as numerical inputs and fed into an ensemble machine learning classifier to classify the antigen specificity of six classes of antibodies, namely anti-HIV-1, anti-influenza virus, anti-pneumococcal polysaccharide, anti-citrullinated protein, anti-tetanus toxoid and anti-hepatitis B virus. The classifier was validated using cross-validation and a testing dataset. The ensemble classifier achieved a macro-average area under the receiver operating characteristic curve (AUC) of 0.9246 from the 10-fold cross-validation, and 0.9264 for the testing dataset. Among the contributing features, the contribution of the complementarity-determining regions was 53.1% and that of framework regions was 46.9%, and the amino acid mutation rates occupied the first and second ranks among the top five contributing features. The classifier and insights provided in this study could promote the mechanistic study, isolation and utilization of potential therapeutic antibodies.


Subjects
Amino Acid Sequence; Antibodies/chemistry; Machine Learning; Antibody Specificity; Complementarity Determining Regions; High-Throughput Nucleotide Sequencing; Humans; ROC Curve
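
The macro-average AUC reported in this abstract can be illustrated with a minimal sketch. It assumes a one-vs-rest scheme in which each class gets its own AUC and the per-class values are averaged without weighting; the abstract does not state how the authors computed theirs. The function names `binary_auc` and `macro_average_auc` and the toy scores are hypothetical.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_average_auc(class_scores: Dict[str, List[float]],
                      true_classes: List[str]) -> float:
    """Unweighted mean of the one-vs-rest AUCs, one per class (assumed scheme)."""
    aucs = []
    for cls, scores in class_scores.items():
        # Samples of the current class are positives; all others are negatives.
        labels = [1 if t == cls else 0 for t in true_classes]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)


# Hypothetical two-class toy data: one score per sample and class.
truth = ["anti-HIV-1", "anti-HIV-1", "anti-influenza", "anti-influenza"]
scores = {
    "anti-HIV-1": [0.9, 0.4, 0.6, 0.1],
    "anti-influenza": [0.1, 0.6, 0.4, 0.9],
}
print(macro_average_auc(scores, truth))
```

Because every class contributes equally, a rare class with a poor AUC lowers the macro average as much as a common one would.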
6.
Biomed Res Int ; 2019: 4824909, 2019.
Article in English | MEDLINE | ID: mdl-31321235

ABSTRACT

Recent studies have shown that microorganisms may be associated with the onset and development of bladder cancer. The purpose of this study is to identify the common core bacteria associated with bladder cancer. We characterized the urinary microbial profile of individuals with bladder cancer by 16S rRNA gene sequencing, and the results from 24 bladder cancer samples collected in our laboratory reveal 31 common core bacteria at the genus level. In addition, LEfSe analysis based on two previous datasets shows that the abundance of four common core bacteria is significantly higher in bladder cancer samples than in samples from nondiseased people. In particular, the abundance of Acinetobacter is much higher in bladder cancer samples. It has been reported that Acinetobacter is involved not only in biofilm formation but also in the adhesion and invasion of epithelial cells, the spread of bacteria caused by the degradation of phospholipids in the mucosal barrier, and the escape of the host immune response. Thus, Acinetobacter may be related to bladder cancer and is a potential microbial marker of bladder cancer. However, due to the limited number of participants, further studies are needed to better understand the role of microorganisms in bladder cancer and to provide novel biomarkers for diagnosis, prognosis, and therapy.


Subjects
Acinetobacter/isolation & purification; Bacteria/genetics; Biomarkers, Tumor/urine; Urinary Bladder Neoplasms/urine; Acinetobacter/genetics; Adult; Aged; Aged, 80 and over; Bacteria/classification; Bacteria/isolation & purification; Epithelial Cells/microbiology; Female; High-Throughput Nucleotide Sequencing; Humans; Male; Microbiota/genetics; Middle Aged; Neoplasm Invasiveness/genetics; Neoplasm Invasiveness/pathology; Phylogeny; RNA, Ribosomal, 16S/genetics; Urinary Bladder Neoplasms/microbiology; Urinary Bladder Neoplasms/pathology; Urinary Tract/microbiology; Urinary Tract/pathology
7.
Front Microbiol ; 10: 618, 2019.
Article in English | MEDLINE | ID: mdl-30984144

ABSTRACT

BACKGROUND: Cellulose, the most abundant organic polymer in nature, is mainly produced by plants. It is insoluble and highly resistant to enzymatic hydrolysis. Cellulolytic microorganisms that are capable of producing a battery of related enzymes play an important role in recycling cellulose-rich plant biomass. Effective cellulose degradation by multiple synergistic microorganisms has been observed within a defined microbial consortium in lab culture. Metagenomic analysis may enable us to understand how microbes cooperate in cellulose degradation in a more complex, free-living microbial ecosystem in nature. RESULTS: Here we investigated a typical cellulose-rich and alkaline niche where constituent microbes survive through inter-genera cooperation in cellulose utilization. The niche was generated in an ancient paper-making plant, which has served as an isolated habitat for over 7 centuries. Combining amplicon-based sequencing of 16S rRNA genes with metagenomic sequencing, our analyses showed a microbial composition with 6 dominant genera (Cloacibacterium, Paludibacter, Exiguobacterium, Acetivibrio, Tolumonas, and Clostridium) in this cellulose-rich niche; the composition is distinct from those of other cellulose-rich niches, including a modern paper mill, bamboo soil, wild giant panda guts, and termite hindguts. In total, 11,676 genes of 96 glycoside hydrolase (GH) families, as well as 1,744 genes of carbohydrate transporters, were identified, and modeling analysis of two representative genes suggested that these glycoside hydrolases likely evolved to adapt to alkaline environments. Further reconstruction of the microbial draft genomes by binning the assembled contigs predicted a mutualistic interaction between the dominant microbes regarding the cellulolytic process in the niche, with Paludibacter and Clostridium acting as helpers that produce endoglucanases, and Cloacibacterium, Exiguobacterium, Acetivibrio, and Tolumonas being beneficiaries that cross-feed on the cellodextrins by oligosaccharide uptake. CONCLUSION: The analysis of the key genes involved in cellulose degradation and the reconstruction of the microbial draft genomes by binning the assembled contigs predicted a mutualistic interaction based on public goods regarding the cellulolytic process in the niche, suggesting that in the studied microbial consortium, free-living bacteria likely survive by depending on each other through the acquisition and exchange of metabolites. Knowledge gained from this study will facilitate the design of complex microbial communities with better performance in industrial bioprocesses.

8.
Sci Rep ; 9(1): 734, 2019 01 24.
Article in English | MEDLINE | ID: mdl-30679786

ABSTRACT

Increasing evidence has revealed a close interaction between the intestinal microbes and host growth performance. The shrimp (Litopenaeus vannamei) gut harbors a diverse microbial community, yet its associations with diet, body weight and weaning age remain a matter of debate. In this study, we analyzed the effects of different diets (fishmeal group (NC), krill meal group (KM)) and different growth stages (from 42 to 98 days of age) of the shrimp on the intestinal microbiota. High-throughput sequencing of the 16S rRNA genes of shrimp intestinal microbes determined the novelty of bacteria in the shrimp gut microbiota, and a core of 58 Operational Taxonomic Units (OTUs) was present among the shrimp gut samples. The analysis indicated that, according to the gut microbiota compositions, the development of the shrimp gut microbiota is a dynamic process with three stages across age. Furthermore, the diet of the KM group did not significantly change the intestinal microbiota of the shrimp compared with the NC group. Intriguingly, compared with the NC group, we observed in the KM group a fluctuation of the shrimp gut microbiota that coincided with the shrimp body weight gain between weeks 6-7. Six OTUs associated with the microbiota change in the KM group were identified. This finding strongly suggests that the shrimp gut microbiota may be correlated with the shrimp body weight, likely by influencing nutrient uptake in the gut. The results obtained from this study may serve as guidelines for microbiota manipulation and provide novel shrimp feed management approaches.


Subjects
Bacteria/genetics; Gastrointestinal Microbiome/genetics; Penaeidae/microbiology; Animal Feed/microbiology; Animals; Aquaculture; Bacteria/classification; Body Weight; Humans; Penaeidae/genetics; RNA, Ribosomal, 16S/genetics
9.
Front Microbiol ; 9: 1476, 2018.
Article in English | MEDLINE | ID: mdl-30034378

ABSTRACT

As an alternative approach against multidrug-resistant bacterial infections, phages are being increasingly investigated as effective therapeutic agents. Here, aiming to design an efficient phage cocktail against Aeromonas salmonicida infections, we isolated and characterized five lytic A. salmonicida phages: AS-szw, AS-yj, AS-zj, AS-sw, and AS-gz. The results of morphological and genomic analysis suggested that all these phages belong to the T4virus genus of the Caudovirales order. Experiments demonstrated their heterogeneous lytic capacities against A. salmonicida strains. A series of phage cocktails were prepared and investigated in vitro. We observed that the cocktail combining AS-gz and AS-yj showed significantly higher antimicrobial activity than other cocktails and individual phages. Given the divergent genomes of the phages AS-yj and AS-gz, our results highlight that the heterogeneous mechanisms that phages use to infect their hosts likely lead to phage synergy in killing the host. In conclusion, our study described a strategy to develop an effective and promising phage cocktail as a therapeutic agent to combat A. salmonicida infections, and thereby to control outbreaks of the relevant fish diseases. Our study suggests that in vitro investigations into phages are a prerequisite for obtaining satisfactory phage cocktails prior to application in practice.

10.
Microbiome ; 6(1): 24, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29391057

ABSTRACT

BACKGROUND: Substantial efforts have been made to link the gut bacterial community to many complex human diseases. Nevertheless, the gut phages are often neglected. RESULTS: In this study, we used multiple bioinformatic methods to catalog gut phages from whole-community metagenomic sequencing data of fecal samples collected from both type II diabetes (T2D) patients (n = 71) and normal Chinese adults (n = 74). The definition of phage operational taxonomic units (pOTUs) and identification of large phage scaffolds (n = 2567, ≥ 10 k) revealed a comprehensive human gut phageome with a substantial number of novel sequences encoding genes that were unrelated to those in known phages. Interestingly, we observed a significant increase in the number of gut phages in the T2D group and, in particular, identified 7 pOTUs specific to T2D. This finding was further validated in an independent dataset of 116 T2D and 109 control samples. Co-occurrence/exclusion analysis of the bacterial genera and pOTUs identified a complex core interaction between bacteria and phages in the human gut ecosystem, suggesting that the significant alterations of the gut phageome cannot be explained simply by co-variation with the altered bacterial hosts. CONCLUSIONS: Alterations in the gut bacterial community have been linked to the chronic disease T2D, but the role of gut phages therein is not well understood. This is the first study to identify a T2D-specific gut phageome, indicating the existence of other mechanisms that might govern the gut phageome in T2D patients. These findings suggest the importance of the phageome in T2D risk, which warrants further investigation.


Subjects
Bacteria/virology; Bacteriophages/classification; Diabetes Mellitus, Type 2/microbiology; Gastrointestinal Tract/microbiology; Bacteriophages/genetics; Bacteriophages/isolation & purification; Case-Control Studies; China; Computational Biology; Feces/microbiology; Humans; Phylogeny
11.
J Integr Bioinform ; 14(3)2017 Aug 10.
Article in English | MEDLINE | ID: mdl-28796642

ABSTRACT

Background: A miniature inverted repeat transposable element (MITE) is a short transposable element carrying no protein-coding regions. However, its high proliferation rate and sequence-specific insertion preference render it a good genetic tool for both natural evolution and experimental insertion mutagenesis. Recently active MITE copies are those that show clear signals of Terminal Inverted Repeats (TIRs) and Direct Repeats (DRs) and were recently translocated into their current sites. Their proliferation ability renders them good candidates for the investigation of genomic evolution. Results: This study optimizes the C++ code and running pipeline of the MITE Uncovering SysTem (MUST) so that no prior knowledge of MITEs is required from the users. The current version, MUSTv2, shows significantly increased detection accuracy for recently active MITEs compared with similar programs. The running speed is also significantly increased compared with MUSTv1. We prepared a benchmark dataset, a simulated genome with 150 MITE copies, for researchers who may be interested. Conclusions: MUSTv2 represents an accurate detection program for recently active MITE copies, which is complementary to the existing template-based MITE mapping programs. We believe that the release of MUSTv2 will greatly facilitate genome annotation and structural analysis for bioOMIC big data researchers.


Subjects
DNA Transposable Elements/genetics; Inverted Repeat Sequences/genetics; Software; Genomics/methods; Molecular Sequence Annotation
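
The two signals named in this abstract, Terminal Inverted Repeats and flanking Direct Repeats, can be checked on a candidate element with a few lines of code. The sketch below is only an illustration of what those signals mean; it is not the MUST or MUSTv2 algorithm, which the abstract does not describe. The function names, the fixed repeat lengths and the toy sequences are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_tir(element: str, tir_len: int, max_mismatch: int = 0) -> bool:
    """True if the first tir_len bases match the reverse complement of the last tir_len."""
    left = element[:tir_len]
    right = reverse_complement(element[-tir_len:])
    mismatches = sum(a != b for a, b in zip(left, right))
    return mismatches <= max_mismatch


def has_direct_repeats(genome: str, start: int, end: int, dr_len: int) -> bool:
    """True if the dr_len bases before genome[start:end] equal the dr_len bases after it."""
    if start - dr_len < 0 or end + dr_len > len(genome):
        return False
    return genome[start - dr_len:start] == genome[end:end + dr_len]


# Hypothetical toy element: a 6-base TIR pair around a 5-base core, flanked by "TA".
element = "GGCACT" + "AAAAA" + "AGTGCC"
genome = "CC" + "TA" + element + "TA" + "GG"
print(has_tir(element, 6), has_direct_repeats(genome, 4, 4 + len(element), 2))
```

A real detector would also have to search for candidate boundaries and tolerate degraded repeats, which is where `max_mismatch` would come into play.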
12.
Gene ; 602: 1-7, 2017 Feb 20.
Article in English | MEDLINE | ID: mdl-27845204

ABSTRACT

BACKGROUND: Like regular enzymatic glycosylation, glycation attaches a sugar molecule to a peptide, but it does not need the help of an enzyme. Glycation may occur both inside and outside the host body, and competes with glycosylation for the functional regulation of mature protein products. The glycated residues do not show significant patterns, which makes both in silico sequence-level prediction and wet-lab validation a major challenge. This study hypothesizes that a better feature set formulated from the peptides flanking glycated residues may lead to a good glycation prediction program. RESULTS: We explored the application of sequence order information and position specific amino acid propensity (PSAAP) to the glycation residue prediction problem. The PSAAP demonstrated its ability to discriminate the glycated residues from the background control peptides. A Support Vector Machine (SVM) model was constructed from the training dataset and achieved an overall accuracy of 68.91%. The model also achieved an area under the ROC curve of 0.7258 and a Matthews correlation coefficient of 0.3198. The user-friendly online version of the proposed algorithm may be found on the web server Gly-PseAAC at http://app.aporc.org/Gly-PseAAC/. CONCLUSION: The feature set PSAAP was calculated and led to a useful classification of glycation residues.


Subjects
Glycopeptides/chemistry; Glycopeptides/metabolism; Lysine/metabolism; Algorithms; Amino Acid Sequence; Computer Simulation; Databases, Protein; Glycosylation; Protein Processing, Post-Translational; Support Vector Machine
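
Overall accuracy and the Matthews correlation coefficient, two of the measures quoted in this abstract, both follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not meant to reproduce the reported 68.91% or 0.3198.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 by convention when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a glycated (positive) vs. background (negative) test set.
tp, tn, fp, fn = 6, 3, 1, 2
print(accuracy(tp, tn, fp, fn), matthews_cc(tp, tn, fp, fn))
```

Unlike accuracy, the coefficient uses all four cells symmetrically, so it stays informative when one class is much larger than the other, which may explain why the two measures are reported side by side.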
13.
Sci Rep ; 6: 32942, 2016 09 06.
Article in English | MEDLINE | ID: mdl-27596864

ABSTRACT

Clustered regularly interspaced short palindromic repeats (CRISPRs) are important genetic elements in many bacterial and archaeal genomes, and play a key role in the fight of prokaryotic immune systems against invasive foreign elements. The CRISPR system has also been engineered to facilitate targeted gene editing in eukaryotic genomes. Using the common features of mis-annotated CRISPRs in prokaryotic genomes, this study proposed an accurate de novo CRISPR annotation program, CRISPRdigger, which can take a partially assembled genome as its input. A comprehensive comparison with the three existing programs demonstrated that CRISPRdigger can recover more Direct Repeats (DRs) for CRISPRs and achieve a higher accuracy for a query genome. The program was implemented in Perl and all the parameters have default values, so that a user can annotate CRISPRs in a query genome by supplying only a genome sequence in the FASTA format. All the supplementary data are available at http://www.healthinformaticslab.org/supp/.


Subjects
CRISPR-Cas Systems; Clostridium/genetics; Clustered Regularly Interspaced Short Palindromic Repeats; Methanocaldococcus/genetics; Chromosome Mapping; Databases, Nucleic Acid; Genome, Archaeal; Genome, Bacterial; Molecular Sequence Annotation; Software
14.
Comput Biol Med ; 77: 16-22, 2016 10 01.
Article in English | MEDLINE | ID: mdl-27494091

ABSTRACT

Different therapeutic methods have been developed for the B-cell and T-cell subtypes of acute lymphoblastic leukemia (ALL). The identification of molecular biomarkers that can accurately discriminate between B-cell and T-cell ALLs will facilitate the quick determination of therapeutic plans, as well as reveal the intrinsic mechanisms underlying the two different ALL subtypes. This study computationally screened a high-throughput transcriptome dataset for multiple candidate biomarkers and verified their discrimination abilities in an independent sample set using quantitative real-time polymerase chain reaction (PCR) technology. Both technologies suggest that the two genes CD3D and PRKCQ together provide a good model for the classification of B-cell and T-cell ALLs, whereas the individual genes did not show consistent discrimination between the two ALL subtypes. Supplementary material is available at http://healthinformaticslab.org/supp/.


Subjects
Biomarkers, Tumor/genetics; CD3 Complex/genetics; Precursor B-Cell Lymphoblastic Leukemia-Lymphoma; Precursor T-Cell Lymphoblastic Leukemia-Lymphoma; Protein Kinase C-theta/genetics; Diagnosis, Differential; Gene Expression Profiling; Humans; Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/diagnosis; Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/genetics; Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/diagnosis; Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics; Real-Time Polymerase Chain Reaction; Transcriptome/genetics
15.
Biomed Res Int ; 2016: 7237053, 2016.
Article in English | MEDLINE | ID: mdl-27195295

ABSTRACT

Motivation. The clustered regularly interspaced short palindromic repeat (CRISPR) is a genetic element that actively regulates foreign invasive genes in prokaryotic genomes, and it has been engineered to work with the CRISPR-associated sequence (Cas) gene Cas9 as one of the modern genome editing technologies. Due to inconsistent definitions, the existing CRISPR detection programs seem to have missed some weak CRISPR signals. Results. This study manually curates all the currently annotated CRISPR elements in the prokaryotic genomes and proposes 95 updates to the annotations. A new definition is proposed to cover all the CRISPRs. A comprehensive comparison of CRISPR numbers at both the domain and genus taxonomic levels shows high variation among closely related species, even within the same genus. A detailed investigation of how CRISPRs are evolutionarily manipulated in the 8 completely sequenced species in the genus Thermoanaerobacter demonstrates that transposons act as a frequent tool for splitting long CRISPRs into shorter ones over a long evolutionary history.


Subjects
CRISPR-Cas Systems/genetics; Data Curation/methods; Evolution, Molecular; Prokaryotic Cells/metabolism; DNA, Intergenic/genetics; Databases, Nucleic Acid; Genome, Bacterial; Repetitive Sequences, Nucleic Acid/genetics
16.
BMC Bioinformatics ; 17: 142, 2016 Mar 23.
Article in English | MEDLINE | ID: mdl-27006077

ABSTRACT

BACKGROUND: High-throughput bio-OMIC technologies are producing high-dimension data from bio-samples at an ever increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in the area of biomedical "big data" may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Because the time required to find the globally optimal solution increases exponentially, all the existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets. RESULTS: This work describes a feature selection algorithm based on a recently published correlation measurement, the Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features that are associated with phenotypes, are independent of each other, and achieve high classification performance with the nearest neighbor algorithm. Based on a comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly reduced numbers of selected features. According to the literature, the features selected by McTwo also appear to have particular biomedical relevance to the phenotypes. CONCLUSION: McTwo selects a feature subset with very good classification performance, as well as a small feature number. McTwo may therefore represent a complementary feature selection algorithm for high-dimensional biomedical datasets.


Subjects
Algorithms; Databases, Factual; Humans; Software
17.
Interdiscip Sci ; 7(2): 194-9, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26245277

ABSTRACT

Protein posttranslational modification (PTM) represents a major dynamic regulation of protein functions after the translation of polypeptide chains from the mRNA molecule. Compared with the costly and labor-intensive wet laboratory characterization of PTMs, the computer-based detection of PTM residues has been a major complementary technique in recent years. Previous studies demonstrated that the PTM-flanking positions make different contributions to the computational detection of PTM residues, but did not directly translate this observation into in silico PTM prediction. We propose a weight vector to represent the varying contributions of the PTM-flanking positions and use an evolutionary algorithm to optimize the vector. Even a simple nearest neighbor algorithm that incorporates the optimal weight vector outperforms the currently available algorithms. The algorithm is implemented as an easy-to-use computer program, jEcho version 1.0. The implementation language, Java, makes jEcho platform-independent and visually interactive. The predicted results may be directly exported as publication-quality images or text files. jEcho may be downloaded from http://www.healthinformaticslab.org/supp/ .


Subjects
Amino Acid Motifs; Data Mining/methods; Protein Processing, Post-Translational; Support Vector Machine; Databases, Protein; Phosphorylation; Software Design
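
The idea in this abstract, a per-position weight vector tuned by an evolutionary algorithm and plugged into a nearest neighbor classifier, can be sketched as follows. This is an illustrative toy and not the jEcho implementation: the weighted Hamming distance, the leave-one-out fitness, the (1+1) mutation scheme and all names and peptides are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, int]  # (fixed-length flanking peptide, 1 = modified, 0 = not)


def weighted_distance(a: str, b: str, weights: Sequence[float]) -> float:
    """Weighted Hamming distance: each mismatch costs the weight of its position."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def predict(query: str, training: Sequence[Sample], weights: Sequence[float]) -> int:
    """Label of the nearest training peptide under the weighted distance."""
    nearest = min(training, key=lambda item: weighted_distance(query, item[0], weights))
    return nearest[1]


def loo_accuracy(data: List[Sample], weights: Sequence[float]) -> float:
    """Leave-one-out accuracy of the weighted 1-nearest-neighbor classifier."""
    hits = 0
    for i, (peptide, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += int(predict(peptide, rest, weights) == label)
    return hits / len(data)


def evolve_weights(data: List[Sample], generations: int = 200,
                   seed: int = 0) -> Tuple[List[float], float]:
    """(1+1) evolution: mutate one weight and keep the child if it is not worse."""
    rng = random.Random(seed)
    weights = [1.0] * len(data[0][0])
    best = loo_accuracy(data, weights)
    for _ in range(generations):
        child = list(weights)
        pos = rng.randrange(len(child))
        child[pos] = max(0.0, child[pos] + rng.gauss(0.0, 0.5))
        score = loo_accuracy(data, child)
        if score >= best:
            weights, best = child, score
    return weights, best


# Hypothetical three-residue windows; real flanking windows would be longer.
toy = [("AKC", 1), ("AKD", 1), ("GRA", 0), ("GRC", 0)]
print(evolve_weights(toy, generations=50))
```

Setting a weight to zero makes the classifier ignore that flanking position entirely, which is how unequal position contributions enter the prediction.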
18.
Biomed Res Int ; 2015: 910515, 2015.
Article in English | MEDLINE | ID: mdl-26075274

ABSTRACT

Efficient and intuitive characterization of biological big data is becoming a major challenge for modern bio-OMIC based scientists. Interactive visualization and exploration of big data has proven to be one of the successful solutions. Most of the existing feature selection algorithms do not allow interactive inputs from users in the optimizing process of feature selection. This study investigates this question by fixing a few user-input features in the finally selected feature subset and formulating these user-input features as constraints for a programming model. The proposed algorithm, fsCoP (feature selection based on constrained programming), performs similarly to or much better than the existing feature selection algorithms, even with the constraints from both the literature and the existing algorithms. An fsCoP biomarker may be intriguing for further wet lab validation, since it satisfies both the classification optimization function and the biomedical knowledge. fsCoP may also be used for the interactive exploration of bio-OMIC big data by interactively adding user-defined constraints for modeling.


Subjects
Algorithms; Models, Genetic; Programming Languages; Biomarkers; Humans
19.
Interdiscip Sci ; 2015 Apr 11.
Article in English | MEDLINE | ID: mdl-25863965

ABSTRACT

Protein post-translational modification (PTM) represents a major dynamic regulation of protein functions after the translation of polypeptide chains from the mRNA molecule. Compared with the costly and labor-intensive wet lab characterization of PTMs, the computer-based detection of PTM residues has been a major complementary technique in recent years. Previous studies demonstrated that the PTM-flanking positions make different contributions to the computational detection of PTM residues, but did not directly translate this observation into in silico PTM prediction. We propose a weight vector to represent the varying contributions of the PTM-flanking positions, and use an evolutionary algorithm to optimize the vector. Even a simple nearest neighbor algorithm that incorporates the optimal weight vector outperforms the currently available algorithms. The algorithm is implemented as an easy-to-use computer program, jEcho version 1.0. The implementation language, Java, makes jEcho platform-independent and visually interactive. The predicted results may be directly exported as publication-quality images or text files. jEcho may be downloaded from http://www.healthinformaticslab.org/supp/ .

20.
Adv Exp Med Biol ; 827: 261-74, 2015.
Article in English | MEDLINE | ID: mdl-25387969

ABSTRACT

In all cell types, the transcription of genes into expressed transcripts is under strict control by the temporally dynamic orchestration of transcription factor binding activities. Given a set of known binding sites (BSs) of a given transcription factor (TF), computational TFBS screening represents a cost-efficient and large-scale strategy to complement the experimental ones. There are two major classes of computational TFBS prediction algorithms, based on the tertiary and primary structures, respectively. A tertiary structure based algorithm tries to calculate the binding affinity between a query DNA fragment and the tertiary structure of the given TF. Due to the limited number of available TF tertiary structures, primary structure based TFBS prediction is a necessary complementary technique for large-scale TFBS screening. This study proposes a novel evolutionary algorithm that randomly mutates the weights of different positions in the binding motif of a TF, so that the overall TFBS prediction accuracy is optimized. The comparison with the most widely used algorithm, Position Weight Matrix (PWM), suggests that our algorithm performs better than or at the same level as PWM in all the performance measurements, including sensitivity, specificity, accuracy and Matthews correlation coefficient. Our data also suggest that it is necessary to remove the widely used assumption of independence between motif positions. The supplementary material may be found at: http://www.healthinformaticslab.org/supp/ .


Subjects
Biological Evolution; Transcription Factors/metabolism; Algorithms; Binding Sites
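
The Position Weight Matrix baseline mentioned in this abstract can be sketched in a few lines. The version below is a generic log-odds PWM with a pseudocount and a uniform background, which are common choices but assumptions here; the abstract does not give the exact PWM variant used in the comparison, and the binding sites shown are hypothetical.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 1.0,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log-odds matrix with one column per motif position, built from aligned sites."""
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm: List[Dict[str, float]], fragment: str) -> float:
    """Sum of per-position log-odds; this sum is the position-independence assumption."""
    return sum(column[base] for column, base in zip(pwm, fragment))


def best_hit(pwm: List[Dict[str, float]], sequence: str) -> int:
    """Start index of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(range(len(sequence) - width + 1),
               key=lambda i: score(pwm, sequence[i:i + width]))


# Hypothetical aligned binding sites for an imaginary transcription factor.
pwm = build_pwm(["TATA", "TATA", "TACA"])
print(best_hit(pwm, "GGTATAGG"))
```

Because `score` adds the columns one by one, it cannot capture dependencies between motif positions, which is the assumption the abstract argues should be removed.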