Results 1 - 20 of 20
1.
Bioinformatics ; 40(Supplement_1): i11-i19, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940154

ABSTRACT

MOTIVATION: Wikipedia is a vital open educational resource in computational biology. The quality of computational biology coverage in English-language Wikipedia has improved steadily in recent years. However, there is an increasingly large 'knowledge gap' between computational biology resources in English-language Wikipedia, and Wikipedias in non-English languages. Reducing this knowledge gap by providing educational resources in non-English languages would reduce language barriers which disadvantage non-native English speaking learners across multiple dimensions in computational biology. RESULTS: Here, we provide a comprehensive assessment of computational biology coverage in Spanish-language Wikipedia, the second most accessed Wikipedia worldwide. Using Spanish-language Wikipedia as a case study, we generate quantitative and qualitative data before and after a targeted educational event, specifically, a Spanish-focused student editing competition. Our data demonstrates how such events and activities can narrow the knowledge gap between English and non-English educational resources, by improving existing articles and creating new articles. Finally, based on our analysis, we suggest ways to prioritize future initiatives to improve open educational resources in other languages. AVAILABILITY AND IMPLEMENTATION: Scripts for data analysis are available at: https://github.com/ISCBWikiTeam/spanish.


Subjects
Computational Biology , Computational Biology/methods , Humans , Language , Internet
2.
Bioinformatics ; 35(14): i127-i135, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510667

ABSTRACT

MOTIVATION: Sequence alignment is a central operation in bioinformatics pipelines and, despite many improvements, remains a computationally challenging problem. Locality-sensitive hashing (LSH) is one method used to estimate the likelihood that two sequences have a proper alignment. Using an LSH, it is possible to separate, with high probability and relatively low computation, the pairs of sequences that do not have high-quality alignment from those that may. Therefore, an LSH reduces the overall computational requirement while not introducing many false negatives (i.e. omitting to report a valid alignment). However, current LSH methods treat sequences as a bag of k-mers and do not take into account the relative ordering of k-mers in sequences. In addition, due to the lack of a practical LSH method for edit distance, in practice, LSH methods for Jaccard similarity or Hamming similarity are used as a proxy. RESULTS: We present an LSH method, called Order Min Hash (OMH), for the edit distance. This method is a refinement of the minHash LSH used to approximate the Jaccard similarity, in that OMH is sensitive not only to the k-mer contents of the sequences but also to the relative order of the k-mers in the sequences. We present theoretical guarantees of the OMH as a gapped LSH. AVAILABILITY AND IMPLEMENTATION: The code to generate the results is available at http://github.com/Kingsford-Group/omhismb2019. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Sequence Alignment , Software
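
The abstract above notes that minHash-style LSH treats a sequence as a bag of k-mers. A minimal Python sketch of plain minHash (not the Order Min Hash method itself, and not the authors' code) may make that limitation concrete. The function names, the seeded BLAKE2b hashing, and the toy sequences are all assumptions chosen for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of k-mers of seq (order and multiplicity are discarded)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Seeded 64-bit hash of a k-mer; one seed stands in for one hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k, num_hashes):
    """For each seed, keep the smallest hash value over all k-mers of seq."""
    ks = kmers(seq, k)
    return [min(_hash(x, seed) for x in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions that agree; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(a, b, k):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    # Two different sequences with identical 2-mer sets: plain minHash
    # cannot tell them apart, which is the gap the abstract describes.
    a, b = "ACGTAC", "GTACGT"
    print(exact_jaccard(a, b, 2))
    print(estimate_jaccard(minhash_sketch(a, 2, 16), minhash_sketch(b, 2, 16)))
```

Because both toy sequences share the same set of 2-mers, their sketches are identical even though the sequences differ, so any method that also needs to reflect k-mer order has to record more than the minimum hash per seed.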
3.
Bioinformatics ; 34(13): i13-i22, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949995

ABSTRACT

Motivation: The minimizers technique is a method to sample k-mers that is used in many bioinformatics software tools to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density is highly dependent on the choice of the order on the k-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density and thereby making existing and future bioinformatics tools even more efficient. Results: From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare k-mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three types of schemes.


Subjects
Algorithms , Computational Biology/methods
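
To illustrate the terms used in the abstract above, here is a minimal Python sketch of a minimizers scheme and its density on a single sequence. It assumes the common textbook formulation (in every window of w consecutive k-mers, select the smallest k-mer under a chosen order, leftmost on ties) with lexicographic order as the default; the function names are hypothetical and this is not the asymptotically optimal order constructed in the paper.

```python
def minimizer_positions(seq, k, w, order=None):
    """Start positions of the k-mers selected by a minimizers scheme.

    order maps a k-mer to a sortable key; the default is lexicographic order.
    Ties inside a window are broken in favor of the leftmost position.
    """
    key = order if order is not None else (lambda kmer: kmer)
    n_kmers = len(seq) - k + 1
    selected = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (key(seq[i:i + k]), i))
        selected.add(best)
    return sorted(selected)


def density(seq, k, w, order=None):
    """Fraction of the k-mers of seq that the scheme selects."""
    return len(minimizer_positions(seq, k, w, order)) / (len(seq) - k + 1)


if __name__ == "__main__":
    seq = "ACGTACGT"
    print(minimizer_positions(seq, k=2, w=3))  # [0, 1, 4]
    print(density(seq, k=2, w=3))              # 3 of 7 k-mers selected
```

Passing a different `order` function changes which k-mers are selected and therefore the density, which is the dependence on the k-mer order that the abstract refers to.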
4.
PLoS Comput Biol ; 14(1): e1005802, 2018 01.
Article in English | MEDLINE | ID: mdl-29346365

ABSTRACT

Education and training are two essential ingredients for a successful career. On one hand, universities provide students a curriculum for specializing in one's field of study, and on the other, internships complement coursework and provide invaluable training experience for a fruitful career. Consequently, undergraduates and graduates are encouraged to undertake an internship during the course of their degree. The opportunity to explore one's research interests in the early stages of their education is important for students because it improves their skill set and gives their career a boost. In the long term, this helps to close the gap between skills and employability among students across the globe and balance the research capacity in the field of computational biology. However, training opportunities are often scarce for computational biology students, particularly for those who reside in less-privileged regions. Aimed at helping students develop research and academic skills in computational biology and alleviating the divide across countries, the Student Council of the International Society for Computational Biology introduced its Internship Program in 2009. The Internship Program is committed to providing access to computational biology training, especially for students from developing regions, and improving competencies in the field. Here, we present how the Internship Program works and the impact of the internship opportunities so far, along with the challenges associated with this program.


Subjects
Computational Biology/education , Internship and Residency , Algorithms , Australia , Curriculum , Developing Countries , Europe , Geography , Humans , Program Development , Students , Universities
5.
BMC Bioinformatics ; 19(Suppl 12): 347, 2018 Oct 09.
Article in English | MEDLINE | ID: mdl-30301451

ABSTRACT

This article describes the motivation, origin and evolution of the student symposia series organised by the ISCB Student Council. The meeting series started thirteen years ago in Madrid and has spread to four continents. The article concludes with the highlights of the most recent edition of annual Student Council Symposium held in conjunction with the 25th Conference on Intelligent Systems for Molecular Biology and the 16th European Conference on Computational Biology, in Prague, in July 2017.


Subjects
Computational Biology , Congresses as Topic , Students , Fellowships and Scholarships , Humans , Peer Review, Research , Publications , Research Support as Topic/economics
6.
BMC Bioinformatics ; 16 Suppl 2: A1-10, 2015.
Article in English | MEDLINE | ID: mdl-25708534

ABSTRACT

This report summarizes the scientific content and activities of the annual symposium organized by the Student Council of the International Society for Computational Biology (ISCB), held in conjunction with the Intelligent Systems for Molecular Biology (ISMB) conference in Boston, USA, on July 11th, 2014.


Subjects
Computational Biology , Drug Resistance, Multiple , High-Throughput Nucleotide Sequencing , Microsatellite Repeats/genetics , Peer Review, Research , Publishing , RNA, Messenger/metabolism , Sequence Analysis, DNA
7.
J Comput Biol ; 31(7): 597-615, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38980804

ABSTRACT

Most sequence sketching methods work by selecting specific k-mers from sequences so that the similarity between two sequences can be estimated using only the sketches. Because estimating sequence similarity is much faster using sketches than using sequence alignment, sketching methods are used to reduce the computational requirements of computational biology software. Applications using sketches often rely on properties of the k-mer selection procedure to ensure that using a sketch does not degrade the quality of the results compared with using sequence alignment. Two important examples of such properties are locality and window guarantees, the latter of which ensures that no long region of the sequence goes unrepresented in the sketch. A sketching method with a window guarantee, implicitly or explicitly, corresponds to a decycling set of the de Bruijn graph, which is a set of unavoidable k-mers. Any long enough sequence, by definition, must contain a k-mer from any decycling set (hence, the unavoidable property). Conversely, a decycling set also defines a sketching method by choosing the k-mers from the set as representatives. Although current methods use one of a small number of sketching method families, the space of decycling sets is much larger and largely unexplored. Finding decycling sets with desirable characteristics (e.g., small remaining path length) is a promising approach to discovering new sketching methods with improved performance (e.g., with small window guarantee). The Minimum Decycling Sets (MDSs) are of particular interest because of their minimum size. Only two algorithms, by Mykkeltveit and Champarnaud, are previously known to generate two particular MDSs, although there are typically a vast number of alternative MDSs. We provide a simple method to enumerate MDSs. This method allows one to explore the space of MDSs and to find MDSs optimized for desirable properties. 
We give evidence that the Mykkeltveit sets are close to optimal regarding one particular property, the remaining path length. A number of conjectures and computational and theoretical evidence to support them are presented. Code available at https://github.com/Kingsford-Group/mdsscope.


Subjects
Algorithms , Computational Biology , Software , Computational Biology/methods , Sequence Alignment/methods , Humans , Sequence Analysis, DNA/methods
8.
ArXiv ; 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37986724

ABSTRACT

Most sequence sketching methods work by selecting specific k-mers from sequences so that the similarity between two sequences can be estimated using only the sketches. Because estimating sequence similarity is much faster using sketches than using sequence alignment, sketching methods are used to reduce the computational requirements of computational biology software packages. Applications using sketches often rely on properties of the k-mer selection procedure to ensure that using a sketch does not degrade the quality of the results compared with using sequence alignment. Two important examples of such properties are locality and window guarantees, the latter of which ensures that no long region of the sequence goes unrepresented in the sketch. A sketching method with a window guarantee, implicitly or explicitly, corresponds to a Decycling Set, an unavoidable set of k-mers. Any long enough sequence, by definition, must contain a k-mer from any decycling set (hence, it is unavoidable). Conversely, a decycling set also defines a sketching method by choosing the k-mers from the set as representatives. Although current methods use one of a small number of sketching method families, the space of decycling sets is much larger, and largely unexplored. Finding decycling sets with desirable characteristics (e.g., small remaining path length) is a promising approach to discovering new sketching methods with improved performance (e.g., with small window guarantee). The Minimum Decycling Sets (MDSs) are of particular interest because of their minimum size. Only two algorithms, by Mykkeltveit and Champarnaud, are previously known to generate two particular MDSs, although there are typically a vast number of alternative MDSs. We provide a simple method to enumerate MDSs. This method allows one to explore the space of MDSs and to find MDSs optimized for desirable properties.
We give evidence that the Mykkeltveit sets are close to optimal regarding one particular property, the remaining path length. A number of conjectures and computational and theoretical evidence to support them are presented. Code available at https://github.com/Kingsford-Group/mdsscope.

9.
J Comput Biol ; 27(8): 1181-1189, 2020 08.
Article in English | MEDLINE | ID: mdl-32315544

ABSTRACT

Computational tools used for genomic analyses are becoming more accurate but also increasingly sophisticated and complex. This introduces a new problem in that these pieces of software have a large number of tunable parameters that often have a large influence on the results that are reported. We quantify the impact of parameter choice on transcript assembly and take some first steps toward generating a truly automated genomic analysis pipeline by developing a method for automatically choosing input-specific parameter values for reference-based transcript assembly using the Scallop tool. By choosing parameter values for each input, the area under the receiver operating characteristic curve (AUC) when comparing assembled transcripts to a reference transcriptome is increased by an average of 28.9% over using only the default parameter choices on 1595 RNA-Seq samples in the Sequence Read Archive. This approach is general, and when applied to StringTie, it increases the AUC by an average of 13.1% on a set of 65 RNA-Seq experiments from ENCODE. Parameter advisors for both Scallop and StringTie are available on GitHub.


Subjects
Computational Biology/trends , Genome/genetics , Sequence Analysis, RNA/methods , Software , Algorithms , Genomics , Molecular Sequence Annotation , RNA/genetics , Transcriptome/genetics
10.
F1000Res ; 8, 2019.
Article in English | MEDLINE | ID: mdl-30647915

ABSTRACT

The Student Council of the International Society for Computational Biology (ISCB-SC) is a student-focused organization for researchers from all early career levels of training (undergraduates, masters, PhDs and postdocs) that organizes bioinformatics and computational biology activities across the globe. Among its activities, the ISCB-SC organizes several symposia in different continents, many times, with the help of the Regional Student Groups (RSGs) that are based on each region. In this editorial we highlight various key moments and learned lessons from the 14th Student Council Symposium (SCS, Chicago, USA), the 5th European Student Council Symposium (ESCS, Athens, Greece) and the 3rd Latin American Student Council Symposium (LA-SCS, Viña del Mar, Chile).


Subjects
Computational Biology , Leadership , Students , Chile , Humans , Research Personnel
11.
F1000Res ; 8, 2019.
Article in English | MEDLINE | ID: mdl-31508204

ABSTRACT

Regional Student Groups (RSGs) of the International Society for Computational Biology Student Council (ISCB-SC) have been instrumental to connect computational biologists globally and to create more awareness about bioinformatics education. This article highlights the initiatives carried out by the RSGs both nationally and internationally to strengthen the present and future of the bioinformatics community. Moreover, we discuss the future directions the organization will take and the challenges to advance further in the ISCB-SC main mission: "Nurture the new generation of computational biologists".


Subjects
Computational Biology , Students , Humans , Interprofessional Relations
12.
J Comput Biol ; 25(7): 780-793, 2018 07.
Article in English | MEDLINE | ID: mdl-29889553

ABSTRACT

While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein's entire length. We present a new approach, called adaptive local realignment, that in contrast automatically adapts to the diversity of mutation rates along protein sequences. This builds upon a recent technique known as parameter advising, which finds global parameter settings for an aligner, to now adaptively find local settings. Our approach in essence identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully-chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment has been implemented within the Opal aligner using the Facet accuracy estimator.


Subjects
Computational Biology , Proteins/genetics , Software , Algorithms , Amino Acid Sequence/genetics , Sequence Alignment
13.
IEEE/ACM Trans Comput Biol Bioinform ; 14(5): 1028-1041, 2017.
Article in English | MEDLINE | ID: mdl-28991725

ABSTRACT

While the multiple sequence alignment output by an aligner strongly depends on the parameter values used for the alignment scoring function (such as the choice of gap penalties and substitution scores), most users rely on the single default parameter setting provided by the aligner. A different parameter setting, however, might yield a much higher-quality alignment for the specific set of input sequences. The problem of picking a good choice of parameter values for specific input sequences is called parameter advising. A parameter advisor has two ingredients: (i) a set of parameter choices to select from, and (ii) an estimator that provides an estimate of the accuracy of the alignment computed by the aligner using a parameter choice. The parameter advisor picks the parameter choice from the set whose resulting alignment has highest estimated accuracy. In this paper, we consider for the first time the problem of learning the optimal set of parameter choices for a parameter advisor that uses a given accuracy estimator. The optimal set is one that maximizes the expected true accuracy of the resulting parameter advisor, averaged over a collection of training data. While we prove that learning an optimal set for an advisor is NP-complete, we show there is a natural approximation algorithm for this problem, and prove a tight bound on its approximation ratio. Experiments with an implementation of this approximation algorithm on biological benchmarks, using various accuracy estimators from the literature, show it finds sets for advisors that are surprisingly close to optimal. Furthermore, the resulting parameter advisors are significantly more accurate in practice than simply aligning with a single default parameter choice.


Subjects
Algorithms , Machine Learning , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Computational Biology
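
The two ingredients of a parameter advisor described in the abstract above (a set of parameter choices and an accuracy estimator) can be sketched in a few lines of Python. This is only an illustration of the selection step under stated assumptions: `advise`, `run_tool`, and `estimate_accuracy` are hypothetical names, and the toy tool and estimator in the usage example are stand-ins, not an aligner or any published estimator.

```python
def advise(inputs, parameter_choices, run_tool, estimate_accuracy):
    """Return (params, result, score) for the choice with highest estimated accuracy.

    run_tool(inputs, params) produces a result (e.g. an alignment) and
    estimate_accuracy(result) scores it without access to a reference.
    Ties keep the earliest choice in parameter_choices.
    """
    best = None
    for params in parameter_choices:
        result = run_tool(inputs, params)
        score = estimate_accuracy(result)
        if best is None or score > best[2]:
            best = (params, result, score)
    return best


if __name__ == "__main__":
    # Toy stand-ins: the "tool" multiplies its input by the parameter and
    # the "estimator" prefers results close to 10. Both are hypothetical.
    toy_tool = lambda x, p: x * p
    toy_estimator = lambda r: -abs(r - 10)
    params, result, score = advise(2, [1, 3, 5, 7], toy_tool, toy_estimator)
    print(params, result, score)  # 5 10 0
```

The advisor runs the tool once per choice, so the size of the parameter set directly bounds the extra cost, which is one reason the choice of that set matters.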
14.
Algorithms Mol Biol ; 12: 11, 2017.
Article in English | MEDLINE | ID: mdl-28435440

ABSTRACT

BACKGROUND: In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. When computing an alignment in practice, however, a reference alignment is not known, so the coreness of its columns can only be predicted. RESULTS: We develop for the first time a predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and hence better estimate the alignment's accuracy. Our approach to predicting coreness is similar to nearest-neighbor classification from machine learning, except we transform nearest-neighbor distances into a coreness prediction via a regression function, and we learn an appropriate distance function through a new optimization formulation that solves a large-scale linear programming problem. We apply our coreness predictor to parameter advising, the task of choosing parameter values for an aligner's scoring function to obtain a more accurate alignment of a specific set of sequences. We show that for this task, our predictor strongly outperforms other column-confidence estimators from the literature, and affords a substantial boost in alignment accuracy.

15.
F1000Res ; 6, 2017.
Article in English | MEDLINE | ID: mdl-29333232

ABSTRACT

Student Council Symposiums (SCSs) have been found to be very useful for students and young researchers. This is especially true given that the events are held directly before large international conferences, giving attendees a chance to gain exposure and have a warm up to the social nuances involved in attending such a meeting. This was the second SCS held in Africa in conjunction with the International Society for Computational Biology (ISCB) and the African Society for Bioinformatics and Computational Biology's (ASBCB) biennial meeting. This symposium was organised by students within the society inside Africa and was held on the 10th of October 2017 in Entebbe, Uganda.

16.
Genome Announc ; 5(30), 2017 Jul 27.
Article in English | MEDLINE | ID: mdl-28751397

ABSTRACT

Ophidiomyces ophiodiicola, which belongs to the order Onygenales, is an emerging fungal pathogen of snakes in the United States. This study reports the 21.9-Mb genome sequence of an isolate of this reptilian pathogen obtained from a black racer snake in Pennsylvania.

17.
PeerJ ; 4: e2359, 2016.
Article in English | MEDLINE | ID: mdl-27635331

ABSTRACT

We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy-to-use, high-throughput tool to describe the nearest neighbors to a node of interest in a phylogenetic tree as well as the support value for the relationship. The application is a command line utility that can be embedded into a phylogenetic analysis pipeline or can be used as a subroutine within another C++ program. As a test case, we applied this new tool to the published phylome of Salinibacter ruber, a species of halophilic Bacteroidetes, identifying 13 unique sister relationships to S. ruber across the 4,589 gene phylogenies. S. ruber grouped with bacteria, most often other Bacteroidetes, in the majority of phylogenies, but 91 phylogenies showed a branch-supported sister association between S. ruber and Archaea, an evolutionarily intriguing relationship indicative of horizontal gene transfer. This test case demonstrates how SICLE makes it possible to summarize the phylogenetic information produced by automated phylogenetic pipelines to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. SICLE is available for free for noncommercial use at http://eebweb.arizona.edu/sicle/.

18.
J Comput Biol ; 20(4): 259-79, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23489379

ABSTRACT

We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure nonlocal properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples; in addition, for parameter advising, we (d) determine the optimal parameter set of a given cardinality, which specifies the best parameter values from which to choose. Our estimator, which we call Facet (for "feature-based accuracy estimator"), yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice, and for parameter advising significantly outperforms the best prior approaches to assessing alignment quality.


Subjects
Proteins/chemistry , Sequence Alignment/methods , Amino Acid Sequence , Databases, Protein , Protein Structure, Secondary
19.
FEBS Lett ; 585(15): 2467-76, 2011 Aug 04.
Article in English | MEDLINE | ID: mdl-21723283

ABSTRACT

To identify epigenetically regulated miRNAs in melanoma, we treated a stage 3 melanoma cell line WM1552C, with 5AzadC and/or 4-PBA. Several hypermethylated miRNAs were detected, one of which, miR-375, was highly methylated and was studied further. Minimal CpG island methylation was observed in melanocytes, keratinocytes, normal skin, and nevus but hypermethylation was observed in patient tissue samples from primary, regional, distant, and nodular metastatic melanoma. Ectopic expression of miR-375 inhibited melanoma cell proliferation, invasion, and cell motility, and induced cell shape changes, strongly suggesting that miR-375 may have an important function in the development and progression of human melanomas.


Subjects
Epigenesis, Genetic/physiology , Melanoma/pathology , MicroRNAs/physiology , Cell Movement , Cell Proliferation , Cell Shape , DNA Methylation , Humans , Melanoma/genetics , MicroRNAs/analysis , Neoplasm Invasiveness , Tumor Cells, Cultured
20.
PLoS One ; 6(9): e24922, 2011.
Article in English | MEDLINE | ID: mdl-21949788

ABSTRACT

Invasive melanoma is the most lethal form of skin cancer. The treatment of melanoma-derived cell lines with 5-aza-2'-deoxycytidine (5-Aza-dC) markedly increases the expression of several miRNAs, suggesting that the miRNA-encoding genes might be epigenetically regulated, either directly or indirectly, by DNA methylation. We have identified a group of epigenetically regulated miRNA genes in melanoma cells, and have confirmed that the upstream CpG island sequences of several such miRNA genes are hypermethylated in cell lines derived from different stages of melanoma, but not in melanocytes and keratinocytes. We used direct DNA bisulfite and immunoprecipitated DNA (Methyl-DIP) to identify changes in CpG island methylation in distinct melanoma patient samples classified as primary in situ, regional metastatic, and distant metastatic. Two melanoma cell lines (WM1552C and A375 derived from stage 3 and stage 4 human melanoma, respectively) were engineered to ectopically express one of the epigenetically modified miRNAs, miR-34b. Expression of miR-34b reduced cell invasion and motility rates of both WM1552C and A375, suggesting that the enhanced cell invasiveness and motility observed in metastatic melanoma cells may be related to their reduced expression of miR-34b. Total RNA isolated from control or miR-34b-expressing WM1552C cells was subjected to deep sequencing to identify gene networks around miR-34b. We identified network modules that are potentially regulated by miR-34b, and which suggest a mechanism for the role of miR-34b in regulating normal cell motility and cytokinesis.


Subjects
Cell Movement , Epigenomics , Gene Expression Regulation, Neoplastic , Melanoma/genetics , Melanoma/secondary , MicroRNAs/genetics , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Blotting, Northern , Cell Adhesion , Cell Line, Tumor , CpG Islands , DNA Methylation , Gene Expression Profiling , Gene Silencing , Humans , Neoplasm Invasiveness , Oligonucleotide Array Sequence Analysis , Promoter Regions, Genetic/genetics , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Skin Neoplasms/genetics , Skin Neoplasms/metabolism , Skin Neoplasms/pathology , Wound Healing