1.
PLoS Comput Biol ; 17(11): e1009534, 2021 11.
Article in English | MEDLINE | ID: mdl-34762646

ABSTRACT

Computational biology has gained traction as an independent scientific discipline in recent years in South America. However, there is still a growing need for bioscientists, from different backgrounds and at different levels, to acquire programming skills, which could reduce the time from data to insights and bridge communication between life scientists and computer scientists. Python is a programming language extensively used in bioinformatics and data science, which is particularly suitable for beginners. Here, we describe the conception, organization, and implementation of the Brazilian Python Workshop for Biological Data. This workshop has been organized by graduate and undergraduate students and supported, mostly in administrative matters, by experienced faculty members since 2017. The workshop was conceived for teaching bioscientists, mainly students in Brazil, how to program in a biological context. The goal of this article was to share our experience with the 2020 edition of the workshop in its virtual format due to the Coronavirus Disease 2019 (COVID-19) pandemic and to compare and contrast this year's experience with the previous in-person editions. We described a hands-on and live coding workshop model for teaching introductory Python programming. We also highlighted the adaptations made from in-person to online format in 2020, the participants' assessment of learning progression, and general workshop management. Lastly, we provided a summary and reflections from our personal experiences from the workshops of the last 4 years. Our takeaways included the benefits of learning from learners' feedback (LLF), which allowed us to improve the workshop in real time, in the short term, and likely in the long term. We concluded that the Brazilian Python Workshop for Biological Data is a highly effective workshop model for teaching a programming language that allows bioscientists to go beyond an initial exploration of programming skills for data analysis in the medium to long term.


Subjects
Computational Biology/education , Curriculum , Programming Languages , Brazil , COVID-19 , Distance Education , Humans , Pandemics , Physical Distancing
2.
PLoS Comput Biol ; 14(4): e1006097, 2018 04.
Article in English | MEDLINE | ID: mdl-29684010

ABSTRACT

Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-Learner, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework towards LTR retrotransposons, a particular type of TE characterized by having long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: RepeatMasker, Censor and LtrDigest. In contrast to these methods, TE-Learner is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance, while being able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-Learner's predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.


Subjects
Machine Learning , Retroelements , Terminal Repeat Sequences , Animals , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Computational Biology , Conserved Sequence , Plant DNA/genetics , Decision Trees , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Molecular Evolution , Insect Genome , Plant Genome , Software
3.
BMC Bioinformatics ; 17(1): 373, 2016 Sep 15.
Article in English | MEDLINE | ID: mdl-27627880

ABSTRACT

BACKGROUND: Hierarchical Multi-Label Classification is a classification task where the classes to be predicted are hierarchically organized. Each instance can be assigned to classes belonging to more than one path in the hierarchy. This scenario is typically found in protein function prediction, considering that each protein may perform many functions, which can be further specialized into sub-functions. We present a new hierarchical multi-label classification method based on multiple neural networks for the task of protein function prediction. A set of neural networks is incrementally trained, each being responsible for the prediction of the classes belonging to a given level. RESULTS: The method proposed here is an extension of our previous work. Here we use the neural network output of a level to complement the feature vectors used as input to train the neural network in the next level. We experimentally compare this novel method with several other reduction strategies, showing that it obtains the best predictive performance. Empirical results also show that the proposed method achieves better or comparable predictive performance when compared with state-of-the-art methods for hierarchical multi-label classification in the context of protein function prediction. CONCLUSIONS: The experiments showed that using the output in one level as input to the next level contributed to better classification results. We believe the method was able to learn the relationships between the protein functions during training, and this information was useful for classification. We also identified in which functional classes our method performed better.
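
The level-by-level chaining described in this abstract (the output of one level's network complementing the feature vector fed to the next level) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-hidden-layer architecture, the hyperparameters, the plain gradient-descent training, and the function names `train_mlp`, `train_levels`, and `predict_levels`. Any post-processing that enforces consistency with the class hierarchy is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, Y, hidden=8, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer network with one sigmoid output per class.

    Hypothetical stand-in for a per-level network; sizes and learning
    rate are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        P = sigmoid(H @ W2 + b2)            # per-class probabilities
        dZ2 = (P - Y) / len(X)              # cross-entropy gradient at the output
        dH = (dZ2 @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)

def train_levels(X, Y_levels):
    """Train one network per hierarchy level, top level first.

    From the second level on, the input is the original feature matrix
    concatenated with the previous level's predicted outputs.
    """
    models, inputs = [], X
    for Y in Y_levels:
        params = train_mlp(inputs, Y)
        models.append(params)
        inputs = np.hstack([X, predict_mlp(params, inputs)])
    return models

def predict_levels(models, X):
    """Return one matrix of class probabilities per hierarchy level."""
    outputs, inputs = [], X
    for params in models:
        P = predict_mlp(params, inputs)
        outputs.append(P)
        inputs = np.hstack([X, P])
    return outputs
```

With `Y_levels` given as a list of binary label matrices (one per hierarchy level, same row order as `X`), `predict_levels(train_levels(X, Y_levels), X)` returns one probability matrix per level.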


Subjects
Neural Networks (Computer) , Proteins/physiology , Proteins/classification , Proteins/metabolism
4.
Bioinformatics ; 31(11): 1836-8, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25638811

ABSTRACT

Profile hidden Markov models (profile HMMs) are known to efficiently predict whether an amino acid (AA) sequence belongs to a specific protein family. Profile HMMs can also be used to search for protein domains in genome sequences. In this case, HMMs are typically learned from AA sequences and then used to search on the six-frame translation of nucleotide (NT) sequences. However, this approach demands additional processing of the original data and search results. Here, we propose an alternative and more direct method which converts an AA alignment into an NT one, after which an NT-based HMM is trained to be applied directly on a genome.
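
The abstract does not say how the AA alignment is converted into an NT one. One common way to do such a conversion, shown here only as a hedged sketch, is to thread each sequence's codons onto its aligned protein row, so that every residue becomes its three-nucleotide codon and every gap becomes three gap characters. This assumes the ungapped coding sequence of each aligned protein is available; the function name `aa_alignment_to_nt` and the dictionary-based input format are hypothetical.

```python
def aa_alignment_to_nt(aligned_aa, coding_nt, gap="-"):
    """Convert an aligned set of proteins into a codon-level NT alignment.

    aligned_aa: dict mapping sequence name -> aligned AA string (with gaps)
    coding_nt:  dict mapping sequence name -> ungapped coding sequence,
                exactly three nucleotides per residue (assumed available)
    """
    nt_alignment = {}
    for name, aa_row in aligned_aa.items():
        cds = coding_nt[name]
        n_residues = sum(1 for c in aa_row if c != gap)
        if len(cds) != 3 * n_residues:
            raise ValueError(
                f"{name}: expected {3 * n_residues} nucleotides, got {len(cds)}"
            )
        # Consume one codon per residue, in order; gaps expand to three gap symbols.
        codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
        nt_alignment[name] = "".join(
            gap * 3 if c == gap else next(codons) for c in aa_row
        )
    return nt_alignment


# Toy example with two made-up sequences.
aligned_aa = {"seq1": "MK-L", "seq2": "M-RL"}
coding_nt = {"seq1": "ATGAAACTG", "seq2": "ATGCGTCTT"}
nt_alignment = aa_alignment_to_nt(aligned_aa, coding_nt)
print(nt_alignment["seq1"])  # ATGAAA---CTG
print(nt_alignment["seq2"])  # ATG---CGTCTT
```

The resulting alignment keeps the column structure of the protein alignment at codon resolution, which is the form an NT-based profile HMM would be trained on.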


Subjects
Genomics/methods , Sequence Alignment/methods , Protein Sequence Analysis/methods , Animals , Bacteria/enzymology , Bacteria/genetics , Fungal Proteins/chemistry , Fungal Proteins/genetics , Markov Chains , Phosphoric Monoester Hydrolases/chemistry , Phosphoric Monoester Hydrolases/genetics , Protein Tertiary Structure , Ribonuclease H/chemistry
5.
Gene ; 850: 146917, 2023 Jan 20.
Article in English | MEDLINE | ID: mdl-36174905

ABSTRACT

Among bioluminescent beetles of the Elateroidea superfamily, Phengodidae is the third largest family, with 244 bioluminescent species distributed only in the Americas, but is still the least studied from the phylogenetic and evolutionary points of view. The railroad worm Phrixothrix hirtus is an essential biological model and symbolic species due to its bicolor bioluminescence, being the only organism that produces true red light among bioluminescent terrestrial species. Here, we performed partial genome assembly of P. hirtus, combining short and long reads generated with Illumina sequencing, providing the first source of genomic information and a framework for comparative analyses of the bioluminescent system in Elateroidea. This is the largest genome described in the Elateroidea superfamily, with an estimated size of ∼3.4 Gb, displaying 32% GC content and 67% transposable elements. Comparative genomic analyses showed positive selection of genes and gene family expansion events of growth and morphogenesis gene products, which could be associated with the atypical anatomical development and morphogenesis found in paedomorphic females and underdeveloped males. We also observed gene family expansion among distinct odorant-binding receptors, which could be associated with the pheromone communication system typical of these beetles, and retrotransposable elements. Common genes putatively regulating bioluminescence production and control, including two luciferase genes with 7 exons and 6 introns corresponding to the green-emitting lateral-lantern and red-emitting head-lantern luciferases, and genes potentially involved in luciferin biosynthesis were found, indicating that there are no clear differences in the presence or absence of gene families associated with bioluminescence in Elateroidea.


Subjects
Beetles , Railroads , Animals , Female , Phylogeny , DNA Transposable Elements , Odorants , Beetles/genetics , Beetles/metabolism , Luciferases/metabolism , Morphogenesis , Pheromones
6.
Article in English | MEDLINE | ID: mdl-30991174

ABSTRACT

Bioluminescence, the emission of visible light in a living organism, is an intriguing phenomenon observed in different species and environments. In terrestrial organisms, bioluminescence is observed mainly in beetles of the Elateroidea superfamily (Coleoptera). Several phylogenetic studies have used different strategies to propose a scenario for the origin and evolution of bioluminescence within this group; however, some of them showed incongruences, mainly about the relationship of the bioluminescent families. In order to increase the number of molecular markers available for Elateroidea species and to propose a more accurate phylogeny, with a highly supported topology, we employed the Next-Generation Sequencing (NGS) methodology to perform the RNA-Seq analysis of luminescent (Elateridae, Phengodidae, Rhagophthalmidae, and Lampyridae) and non-luminescent (Cantharidae) species of Neotropical beetles. We used the RNA-Seq data to construct a calibrated phylogeny of the Elateroidea superfamily using a large number of nuclear molecular markers. The results indicate Lampyridae and Phengodidae/Rhagophthalmidae as sister groups, suggesting that bioluminescence evolved later in Elateridae than in the other families (Lampyridae, Phengodidae, and Rhagophthalmidae), and indicating the Upper Cretaceous as the period for the main diversification of Elateroidea bioluminescent species.


Subjects
Beetles/genetics , Phylogeny , Animals , Biological Evolution , Genomics , Luminescence , RNA-Seq , Transcriptome