ABSTRACT
Advances in next-generation sequencing (NGS) technology have led to a dramatic decrease in read-generation cost and an increase in read output. Reconstruction of the short DNA sequence reads generated by NGS requires a read alignment method that reconstructs a reference genome. In addition, it is essential to analyze the results of read alignments for a biologically meaningful inference. However, read alignment from vast amounts of genomic data from various organisms is challenging in that it involves repeated automatic and manual analysis steps. Here, we devised cPlot, a software tool for read alignment of nucleotide sequences that automates read alignment and position analysis and allows the user to assess the analysis results visually. cPlot compares the sequence similarity of reads by performing multiple read alignments, with FASTA format files as the input. The application provides a web-based interface that is easy to use and requires no dedicated computing environment. cPlot identifies the location and order of the sequencing reads by comparing the sequence to a genetically close reference sequence, in a way that is effective for visualizing the assembly of short reads generated by NGS and for rapid gene map construction.
Subjects
High-Throughput Nucleotide Sequencing , Software , Algorithms , Base Sequence , High-Throughput Nucleotide Sequencing/methods , Sequence Alignment , Sequence Analysis, DNA/methods
ABSTRACT
SUMMARY: In comparative and evolutionary genomics, a detailed comparison of common features between organisms is essential to evaluate genetic distance. However, identifying differences in matched and mismatched genes among multiple genomes is difficult with current comparative genomic approaches, either because the methodologies are complicated or because the results yield little information. This study describes a visualization software tool, geneCo (gene Comparison), for comparing genome structure and gene arrangements between various organisms. User data are aligned, gene information is recognized, and genome structures are compared based on user-defined GenBank files. geneCo provides information on inversion, gain, loss, duplication and gene rearrangement among the organisms being compared, through a web-based interface that users can easily access without any need to consider the computational environment. AVAILABILITY AND IMPLEMENTATION: Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/geneCo. The main module of geneCo is implemented in Python and the web-based user interface is built with PHP, HTML and CSS to support all browsers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Genome , Genomics , Databases, Nucleic Acid , Internet , Software
ABSTRACT
BACKGROUND: The Synurophyceae is one of the most important photosynthetic stramenopile algal lineages in freshwater ecosystems. They are characterized by siliceous scales covering the cell or colony surface and possess plastids of red-algal secondary or tertiary endosymbiotic origin. Despite their ecological and evolutionary significance, the relationships amongst extant Synurophyceae are unclear, as is their relationship to most other stramenopiles. RESULTS: Here we report a comparative analysis of plastid genomes sequenced from five representative synurophycean algae. Most of these plastid genomes are highly conserved with respect to genome structure and coding capacity, with the exception of gene re-arrangements and partial duplications at the boundary of the inverted repeat and single-copy regions. Several lineage-specific gene loss/gain events and intron insertions were detected (e.g., cemA, dnaB, syfB, and trnL). CONCLUSIONS: Unexpectedly, the cemA gene of Synurophyceae shows a strong relationship with sequences from members of the green-algal lineage, suggesting the occurrence of a lateral gene transfer event. Using a molecular clock approach based on silica fossil record data, we infer the timing of genome re-arrangement and gene gain/loss events in the plastid genomes of Synurophyceae.
Subjects
Genetic Variation , Genome, Plastid , Genomics , Inverted Repeat Sequences/genetics , Stramenopiles/genetics , Base Sequence , DNA, Circular/genetics , Evolution, Molecular , Gene Dosage , Nucleic Acid Conformation , Phylogeny , RNA, Transfer/chemistry , RNA, Transfer/genetics
ABSTRACT
Summary: Next-generation sequencing (NGS) technologies have led to the accumulation of high-throughput sequence data from various organisms. Annotating the organellar genomes of such varied organisms requires better-optimized tools for functional gene annotation. Almost all gene annotation tools focus on the chloroplast genomes of land plants or the mitochondrial genomes of animals. We have developed a web application, AGORA, for fast, user-friendly and improved annotation of organellar genomes. Annotator for Genes of Organelle from the Reference sequence Analysis (AGORA) annotates genes based on a basic local alignment search tool (BLAST)-based homology search and clustering with selected reference sequences from the NCBI database or user-uploaded data. AGORA can annotate the functional genes in almost all mitochondrial and plastid genomes of eukaryotes. Annotation of genomes with an exon-intron structure within a gene or with an inverted repeat region is also supported. It provides the start and end positions of each gene, BLAST results compared with the reference sequence, and visualization of the gene map by OGDRAW. Availability and implementation: Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/gene_project/AGORA/. The main module of the tool is implemented in Python and PHP, and the web page is built with HTML and CSS to support all browsers. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Genome, Chloroplast , Genome, Mitochondrial , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , Software , Animals , Eukaryota/genetics , Sequence Analysis, DNA/methods
ABSTRACT
Automated protein function prediction is the assignment of functions to uncharacterized proteins by computational methods. This technique is useful for automatically assigning functional annotations to undefined sequences in next-generation genome analysis (NGS). NGS is a popular research method because high-throughput technologies such as DNA sequencing and microarrays have created large sets of genes. This large volume of sequence data has greatly increased the need for analysis. Previous research has relied on sequence similarity, as it is strongly related to functional homology. In contrast, this study aimed to assign protein functions automatically by utilizing InterPro (IPR), which can represent the properties of protein families and groups of protein functions. Moreover, we used Gene Ontology (GO), the controlled vocabulary used to comprehensively describe protein function. To define the relationship between IPR and GO terms, three pattern recognition techniques were employed under different conditions, such as feature selection and weighted values instead of binary ones.
Subjects
Computational Biology/methods , Gene Ontology , Molecular Sequence Annotation/methods , Sequence Analysis, Protein/methods
ABSTRACT
Resource management of the main memory and process handler is critical to enhancing the system performance of a web server. Owing to the transaction delay time that affects incoming requests from web clients, web server systems utilize several web processes to anticipate future requests. This procedure is able to decrease the web generation time because there are enough processes to handle the incoming requests from web browsers. However, inefficient process management results in low service quality for the web server system. Proper pregenerated process mechanisms are required for dealing with the clients' requests. Unfortunately, it is difficult to predict how many requests a web server system is going to receive. If a web server system builds too many web processes, it wastes a considerable amount of memory space, and thus performance is reduced. We propose an adaptive web process manager scheme based on the analysis of web log mining. In the proposed scheme, the number of web processes is controlled through prediction of incoming requests, and accordingly, the web process management scheme consumes the least possible web transaction resources. In experiments, real web trace data were used to prove the improved performance of the proposed scheme.
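As a rough illustration of the idea described above, the sketch below sizes a pool of pre-generated worker processes from a forecast of the incoming request rate. The forecasting rule (an exponentially weighted moving average over per-interval request counts taken from a web log), the function names, and parameters such as `requests_per_process`, `min_procs` and `max_procs` are assumptions made for illustration; the abstract does not state which predictor or sizing rule the proposed scheme uses.

```python
import math


def forecast_requests(counts, alpha=0.5):
    """Exponentially weighted moving average of per-interval request counts.

    `counts` is a list of request counts per time interval (oldest first),
    for example extracted from a web server access log.
    """
    if not counts:
        return 0.0
    estimate = float(counts[0])
    for count in counts[1:]:
        # Recent intervals get weight alpha; history gets (1 - alpha).
        estimate = alpha * count + (1 - alpha) * estimate
    return estimate


def target_pool_size(counts, requests_per_process=10,
                     min_procs=1, max_procs=64, alpha=0.5):
    """Number of worker processes to keep ready for the next interval.

    All parameters are hypothetical tuning knobs: each process is assumed
    to serve `requests_per_process` requests per interval, and the result
    is clamped to [min_procs, max_procs] to bound memory use.
    """
    predicted = forecast_requests(counts, alpha)
    needed = math.ceil(predicted / requests_per_process)
    return max(min_procs, min(max_procs, needed))


if __name__ == "__main__":
    recent_counts = [80, 120, 100, 140]
    print(target_pool_size(recent_counts))
```

Clamping the pool size reflects the trade-off named in the abstract: too few processes delay requests, while too many waste memory.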
Subjects
Algorithms , Computer Storage Devices , Electronic Data Processing/methods , Internet , Data Mining , Time Factors
ABSTRACT
Electrocardiograms (ECGs) provide essential data for diagnosing arrhythmias, which can potentially cause serious health complications. Early detection through continuous monitoring is crucial for timely intervention. The Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia dataset employed for arrhythmia analysis research comprises imbalanced data. It is necessary to create a robust model independent of data imbalances to classify arrhythmias accurately. To mitigate the pronounced class imbalance in the MIT-BIH arrhythmia dataset, this study employs advanced augmentation techniques, specifically variational autoencoder (VAE) and conditional diffusion, to augment the dataset. Furthermore, accurately segmenting the continuous heartbeat dataset into individual heartbeats is crucial for confidently detecting arrhythmias. This research compared a model that employed annotation-based segmentation, utilizing R-peak labels, with a model that utilized an automated segmentation method based on a deep learning model to segment heartbeats. In our experiments, the proposed model, utilizing MobileNetV2 along with annotation-based segmentation and conditional diffusion augmentation to address minority classes, demonstrated a notable 1.23% improvement in the F1 score and 1.73% in the precision, compared to the model classifying arrhythmia classes with the original imbalanced dataset. This research presents a model that accurately classifies a wide range of arrhythmias, including minority classes, moving beyond the previously limited arrhythmia classification models. It can serve as a basis for better data utilization and model performance improvement in arrhythmia diagnosis and medical service research. These achievements enhance the applicability in the medical field and contribute to improving the quality of healthcare services by providing more sophisticated and reliable diagnostic tools.
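To make the notion of annotation-based segmentation concrete, the following minimal sketch cuts a continuous signal into fixed-length windows centred on annotated R-peak sample indices. The window sizes (`before`, `after`), the function name and the choice to skip beats too close to the record edges are illustrative assumptions; the abstract does not give the window lengths or edge handling used in the study.

```python
def segment_beats(signal, r_peaks, before=90, after=110):
    """Cut one window per annotated R peak out of a 1-D signal.

    signal  -- sequence of samples (e.g. one ECG lead)
    r_peaks -- sample indices of annotated R peaks
    before  -- samples kept before each R peak (assumed value)
    after   -- samples kept from the R peak onwards (assumed value)

    Beats whose window would run past either end of the record are skipped,
    so every returned segment has exactly before + after samples.
    """
    beats = []
    for r in r_peaks:
        start, end = r - before, r + after
        if start < 0 or end > len(signal):
            continue
        beats.append(signal[start:end])
    return beats


if __name__ == "__main__":
    # Synthetic stand-in for an ECG record: sample value equals its index.
    record = list(range(1000))
    segments = segment_beats(record, r_peaks=[50, 300, 950])
    print(len(segments), len(segments[0]))
```

Fixed-length windows are convenient because each segment can be fed to a classifier with a fixed input size without padding.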
ABSTRACT
Criteria Based Content Analysis (CBCA) is a forensic tool that analyzes victim statements. It involves the categorization of victims' statements into 19 distinct criteria classifications, playing a crucial role in evaluating the authenticity of testimonies by discerning whether they are rooted in genuine experiences or fabricated accounts. The exclusion of subjective opinions becomes imperative to assess statements through this forensic tool objectively. This study proposes developing an objective classification model for CBCA-based statement analysis using natural language processing techniques. Nevertheless, achieving optimal classification performance proves challenging due to imbalances in data distribution among the various criterion classifications. To enhance the accuracy and reliability of the classification model, this research employs data augmentation techniques and dual contrastive learning methods for fine-tuning the RoBERTa language model. Furthermore, model-based optimization techniques are also applied to identify augmented hyper-parameters and maximize the model's classification performance. The study's findings, including an 8.5% improvement in macro F1 score compared to human classification results, a 24% improvement in macro F1 score, and a 13% improvement in accuracy compared to previous human classification results, suggest that the proposed model is highly effective in reducing the influence of human subjectivity in statement analysis. The proposed model has significant implications for legal proceedings and criminal investigations, as it can provide a more objective and reliable method for evaluating the credibility of victim statements. Reducing human subjectivity in the statement analysis process can increase the accuracy of verdicts and help ensure that justice is served.
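Because the abstract reports macro F1 under class imbalance, a short sketch of how that metric is computed may help. It averages the per-class F1 scores with equal weight, so a rarely occurring criterion counts as much as a frequent one. The function name and the toy labels are hypothetical; this is a plain-Python illustration of the standard definition, not the evaluation code used in the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # A classifier that always predicts the majority class "a".
    truth = ["a", "a", "a", "b"]
    preds = ["a", "a", "a", "a"]
    accuracy = sum(t == p for t, p in zip(truth, preds)) / len(truth)
    print(accuracy, macro_f1(truth, preds))
```

In the toy example the accuracy is 0.75 while the macro F1 is 3/7, which shows why macro F1 is the more demanding measure when some classes are rare.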
ABSTRACT
BACKGROUND: Research on the acceptance of cosmetic surgery has focused on relatively affluent Western samples, to the exclusion of non-Western samples and any potential cross-cultural differences. While rates of cosmetic surgery in South Korea have risen sharply in the past decade, mirroring rates in other East Asian nations, little is known about attitudes toward cosmetic surgery in the Korean population. OBJECTIVES: To examine the factor structure and correlates of a Korean adaptation of the previously published Acceptance of Cosmetic Surgery Scale (ACSS). METHODS: South Korean university students (N = 267) completed the ACSS as well as Korean translations of measures of actual vs. ideal body weight discrepancy, body appreciation, sociocultural attitudes toward appearance, and demographics. RESULTS: The Korean ACSS reduced to a two-factor solution, mirroring results among other non-Western samples, although a one-factor solution was deemed more plausible. Compared to men, women had significantly higher total scores, suggesting that they were more accepting of cosmetic surgery. A multiple regression showed that, after controlling for the effects of participant sex, the only significant predictor of acceptance of cosmetic surgery was general body appreciation, suggesting that some may view cosmetic surgery as a means of enhancing their body image. CONCLUSIONS: The results reveal important global information for plastic surgeons, not only on the treatment of non-Western patients but also on the South Korean market, in which the cosmetic surgery industry remains unregulated. Given the popularity and acceptance of cosmetic surgery in South Korea, there is an urgent need for regulatory intervention to ensure patient safety and satisfaction.
Subjects
Asian People/psychology , Health Knowledge, Attitudes, Practice/ethnology , Patient Acceptance of Health Care/ethnology , Plastic Surgery Procedures/psychology , Students/psychology , Surgery, Plastic/psychology , Universities , Adolescent , Adult , Body Image , Chi-Square Distribution , Cultural Characteristics , Factor Analysis, Statistical , Female , Health Behavior/ethnology , Humans , Male , Regression Analysis , Republic of Korea/epidemiology , Self Concept , Social Behavior , Surveys and Questionnaires , Young Adult
ABSTRACT
Protein function prediction is a crucial part of genome annotation. Prediction methods have recently witnessed rapid development, owing to the emergence of high-throughput sequencing technologies. Among the available databases for identifying protein function terms, Gene Ontology (GO) is an important resource that describes the functional properties of proteins. Researchers are employing various approaches to efficiently predict GO terms. Meanwhile, deep learning, a fast-evolving data-driven discipline, exhibits impressive potential for assigning GO terms to amino acid sequences. Herein, we review the currently available computational GO annotation methods for proteins, ranging from conventional to deep learning approaches. Further, we selected some suitable predictors from among the reviewed tools and conducted a mini comparison of their performance using a worldwide challenge dataset. Finally, we discuss the remaining major challenges in the field and emphasize future directions for protein function prediction with GO.
ABSTRACT
The massively parallel nature of next-generation sequencing technologies has contributed to the generation of massive sequence data in the last two decades. Deciphering the meaning of each generated sequence requires multiple analysis tools, at all stages of analysis, from the reads stage all the way up to the whole-genome level. Homology-based approaches based on related reference sequences are usually the preferred option for gene and transcript prediction in newly sequenced genomes, resulting in the popularity of a variety of BLAST and BLAST-based tools. For organelle genomes, a single-reference-based gene finding tool that uses grouping parameters for BLAST results has been implemented in the Genome Search Plotter (GSP). However, this tool does not accept multiple and user-customized reference sequences required for a broad homology search. Here, we present multiple Reference-based Gene Search and Plot (ReGSP), a simple and convenient web tool that accepts multiple reference sequences for homology-based gene search. The tool incorporates cPlot, a novel dot plot tool, for illustrating nucleotide sequence similarity between the query and the reference sequences. ReGSP has an easy-to-use web interface and is freely accessible at https://ds.mju.ac.kr/regsp.
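For readers unfamiliar with dot plots, the sketch below shows the general technique in its simplest form: every position where a short word (k-mer) of the query also occurs in the reference is marked as a dot, so runs of similarity appear as diagonals. This is a generic word-match dot plot on the forward strand only; the word length `k`, the function names and the text rendering are assumptions for illustration, and the abstract does not describe how cPlot itself computes or draws its plots.

```python
def dot_plot(query, reference, k=3):
    """Return (i, j) pairs where query[i:i+k] equals reference[j:j+k]."""
    # Index every k-mer of the reference by its start positions.
    index = {}
    for j in range(len(reference) - k + 1):
        index.setdefault(reference[j:j + k], []).append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def render(hits, query_len, ref_len, k=3):
    """Draw the hits as text: rows are query positions, columns reference."""
    rows = max(0, query_len - k + 1)
    cols = max(0, ref_len - k + 1)
    grid = [["."] * cols for _ in range(rows)]
    for i, j in hits:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    query, reference = "ACGTAC", "TTACGT"
    hits = dot_plot(query, reference)
    print(hits)
    print(render(hits, len(query), len(reference)))
```

Indexing the reference k-mers first avoids comparing every query position with every reference position, which matters once sequences reach genome scale.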
ABSTRACT
BACKGROUND: Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many of the previously reported function annotation methods are of limited utility for fungal protein annotation. They are often trained on only one species, are not available for high-volume data processing, or require the use of data derived from experiments such as microarray analysis. To meet the increasing need for high-throughput, automated annotation of fungal genomes, we have developed a tool for annotating fungal protein sequences with terms from the Gene Ontology. RESULTS: We describe a classifier called PoGO (Prediction of Gene Ontology terms) that uses statistical pattern recognition methods to assign Gene Ontology (GO) terms to proteins from filamentous fungi. PoGO is organized as a meta-classifier in which each evidence source (sequence similarity, protein domains, protein structure and biochemical properties) is used to train independent base-level classifiers. The outputs of the base classifiers are used to train a meta-classifier, which provides the final assignment of GO terms. An independent classifier is trained for each GO term, making the system amenable to updating without having to re-train the whole system. The resulting system is robust. It provides better accuracy and can assign GO terms to a higher percentage of unannotated protein sequences than other methods that we tested. CONCLUSIONS: Our annotation system overcomes many of the shortcomings that we found in other methods. We also provide a web server where users can submit protein sequences to be annotated.
Subjects
Fungal Proteins/chemistry , Fungal Proteins/genetics , Genome, Fungal , Genomics/methods , Software , Databases, Protein , Protein Conformation , Sequence Alignment , Sequence Analysis, Protein , Structure-Activity Relationship
ABSTRACT
Next-Generation Sequencing (NGS) has made it easier to obtain genome-wide sequence data and it has shifted the research focus into genome annotation. The challenging tasks involved in annotation rely on the currently available tools and techniques to decode the information contained in nucleotide sequences. This information will improve our understanding of general aspects of life and evolution and improve our ability to diagnose genetic disorders. Here, we present a summary of both structural and functional annotations, as well as the associated comparative annotation tools and pipelines. We highlight visualization tools that immensely aid the annotation process and the contributions of the scientific community to the annotation. Further, we discuss quality-control practices and the need for re-annotation, and highlight the future of annotation.
ABSTRACT
Standing water and sediments remaining on flood-affected materials were the breeding ground for many microorganisms in flooded homes following Hurricane Katrina. The purpose of this laboratory study was to examine the aerosolization of culturable and total fungi, (1→3)-β-D-glucan, and endotoxin from eight flood-affected floor and bedding materials collected in New Orleans homes following Hurricane Katrina. Aerosolization was examined using the Fungal Spore Source Strength Tester (FSSST) connected to a BioSampler. Dust samples were collected by vacuuming. A two-stage cyclone sampler was used for size-selective analysis of aerosolized glucan and endotoxin. On average, levels of culturable fungi ranged from undetectable (lower limit = 8.3 × 10⁴) to 2.6 × 10⁵ CFU/m²; total fungi ranged from 2.07 × 10⁵ to 1.6 × 10⁶ spores/m²; (1→3)-β-D-glucan and endotoxin were 2.0 × 10³ to 2.9 × 10⁴ ng/m² and 7.0 × 10² to 9.3 × 10⁴ EU/m², respectively. The results showed that 5-15 min sampling is sufficient for detecting aerosolizable biocontaminants with the FSSST. Smaller particle size fractions (<1.0 and <1.8 µm) have levels of glucan and endotoxin comparable to larger (>1.8 µm) fractions, which raises additional exposure concerns. Vacuuming was found to overestimate inhalation exposure risks by a factor of approximately 10² for (1→3)-β-D-glucan and by 10³ to 10⁴ for endotoxin as detected by the FSSST. The information generated from this study is important with respect to restoration and rejuvenation of the flood-affected areas in New Orleans. We believe the findings will be significant during similar disasters in other regions of the world, including major coastal floods from tsunamis.
Subjects
Air Pollutants/analysis , Endotoxins/isolation & purification , Floods , Fungi/isolation & purification , Housing/standards , beta-Glucans/isolation & purification , Aerosols , Air Microbiology/standards , Disasters , Environmental Monitoring/instrumentation , Environmental Monitoring/methods , Floors and Floorcoverings/standards , New Orleans , Proteoglycans , Water Pollutants/analysis
ABSTRACT
Transmucosal drug delivery (TMD) systems using mucoadhesive polymers have recently attracted interest because of their rapid onset of action, high blood levels, and avoidance of both the first-pass effect and exposure of the drug to the gastrointestinal tract. A novel mucoadhesive polymer complex composed of chitosan and poly(acrylic acid) (PAA) was prepared for the TMD system by template polymerization of acrylic acid in the presence of chitosan. Triamcinolone acetonide (TAA) was loaded into the chitosan/PAA polymer complex film. TAA was evenly dispersed in the chitosan/PAA polymer complex film without interacting with the polymer complex. The release behavior of TAA from the mucoadhesive polymer film depended on time, pH, drug loading content, and chitosan/PAA ratio. Analysis of drug release from the mucoadhesive film suggested that TAA is released from the chitosan/PAA polymer complex film through a non-Fickian diffusion mechanism.
Subjects
Acrylic Resins/chemistry , Biocompatible Materials/pharmacokinetics , Chitin/chemistry , Triamcinolone Acetonide/pharmacokinetics , Chitin/analogs & derivatives , Chitosan , Chromatography, High Pressure Liquid , Hydrogen-Ion Concentration , Plastics/chemistry , Polymers/chemistry , Spectroscopy, Fourier Transform Infrared , Time Factors , Tissue Adhesions , X-Ray Diffraction
Subjects
Attitude , Cosmetics , Prejudice , Students/psychology , Universities , Adolescent , Adult , Female , Humans , Male
ABSTRACT
Identifying genomic regions that descended from a common ancestor helps us study gene function and genome evolution. In distantly related genomes, clusters of homologous gene pairs are used in function prediction, operon detection, and similar tasks. Many computational methods have been proposed for defining gene clusters in order to identify gene families and operons. However, most of those algorithms are applicable only to small data sets. We developed an efficient gene clustering algorithm that can be applied to hundreds of genomes at the same time. This approach allows large-scale study of the evolutionary relationships of gene clusters and of operon formation and destruction. An analysis of the proposed algorithms shows that more biological insight can be obtained by analyzing gene clusters across hundreds of genomes, which can help us understand operon occurrences, gene orientations and gene rearrangements.
ABSTRACT
This study reports results from the first International Body Project (IBP-I), which surveyed 7,434 individuals in 10 major world regions about body weight ideals and body dissatisfaction. Participants completed the female Contour Drawing Figure Rating Scale (CDFRS) and self-reported their exposure to Western and local media. Results indicated there were significant cross-regional differences in the ideal female figure and body dissatisfaction, but effect sizes were small across high-socioeconomic-status (SES) sites. Within cultures, heavier bodies were preferred in low-SES sites compared to high-SES sites in Malaysia and South Africa (ds = 1.94-2.49) but not in Austria. Participant age, body mass index (BMI), and Western media exposure predicted body weight ideals. BMI and Western media exposure predicted body dissatisfaction among women. Our results show that body dissatisfaction and desire for thinness are commonplace in high-SES settings across world regions, highlighting the need for international attention to this problem.