Results 1 - 20 of 49
1.
Int J Cancer ; 155(5): 934-945, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38709956

ABSTRACT

We analyzed variations in the epidermal growth factor receptor (EGFR) gene and 5'-upstream region to identify potential molecular predictors of treatment response in primary epithelial ovarian cancer. Tumor tissues collected during debulking surgery from the prospective multicenter OVCAD study were investigated. Copy number variations in the human endogenous retrovirus sequence human endogenous retrovirus K9 (HERVK9) and EGFR Exons 7 and 9, as well as repeat length and loss of heterozygosity of polymorphic CA-SSR I and relative EGFR mRNA expression were determined quantitatively. At least one EGFR variation was observed in 94% of the patients. Among the 30 combinations of variations discovered, enhanced platinum sensitivity (n = 151) was found dominantly with HERVK9 haploidy and Exon 7 tetraploidy, overrepresented among patients with survival ≥120 months (24/29, p = .0212). EGFR overexpression (≥80 percentile) was significantly less likely in the responders (17% vs. 32%, p = .044). Multivariate Cox regression analysis, including age, FIGO stage, and grade, indicated that the patients' subgroup was prognostically significant for CA-SSR I repeat length <18 CA for both alleles (HR 0.276, 95% confidence interval 0.109-0.655, p = .001). Although EGFR variations occur in ovarian cancer, the mRNA levels remain low compared to other EGFR-mutated cancers. Notably, the inherited length of the CA-SSR I repeat, HERVK9 haploidy, and Exon 7 tetraploidy conferred three times higher odds ratio to survive for more than 10 years under therapy. This may add value in guiding therapies if determined during follow-up in circulating tumor cells or circulating tumor DNA and offers HERVK9 as a potential therapeutic target.


Subjects
Chromosomes, Human, Pair 7 , DNA Copy Number Variations , ErbB Receptors , Ovarian Neoplasms , Humans , Female , ErbB Receptors/genetics , Ovarian Neoplasms/genetics , Ovarian Neoplasms/mortality , Ovarian Neoplasms/pathology , Ovarian Neoplasms/drug therapy , Middle Aged , Chromosomes, Human, Pair 7/genetics , Prospective Studies , Aged , Carcinoma, Ovarian Epithelial/genetics , Carcinoma, Ovarian Epithelial/mortality , Carcinoma, Ovarian Epithelial/pathology , Adult , Retroelements/genetics , Phenotype , Drug Resistance, Neoplasm/genetics , Endogenous Retroviruses/genetics , Loss of Heterozygosity
2.
Nucleic Acids Res ; 47(1): 341-361, 2019 01 10.
Article in English | MEDLINE | ID: mdl-30357366

ABSTRACT

The RNA-binding protein TDP-43 is heavily implicated in neurodegenerative disease. Numerous patient mutations in TARDBP, the gene encoding TDP-43, combined with data from animal and cell-based models, imply that altered RNA regulation by TDP-43 causes Amyotrophic Lateral Sclerosis and Frontotemporal Dementia. However, underlying mechanisms remain unresolved. Increased cytoplasmic TDP-43 levels in diseased neurons suggest a possible role in this cellular compartment. Here, we examined the impact on translation of overexpressing human TDP-43 and the TDP-43(A315T) patient mutant protein in motor neuron-like cells and primary cultures of cortical neurons. In motor neuron-like cells, TDP-43 associates with ribosomes without significantly affecting global translation. However, ribosome profiling and additional assays revealed enhanced translation and direct binding of Camta1, Mig12, and Dennd4a mRNAs. Overexpressing either wild-type TDP-43 or TDP-43(A315T) stimulated translation of Camta1 and Mig12 mRNAs via their 5'UTRs and increased CAMTA1 and MIG12 protein levels. In contrast, translational enhancement of Dennd4a mRNA required a specific 3'UTR region and was specifically observed with the TDP-43(A315T) patient mutant allele. Our data reveal that TDP-43 can function as an mRNA-specific translational enhancer. Moreover, since CAMTA1 and DENND4A are linked to neurodegeneration, they suggest that this function could contribute to disease.


Subjects
Calcium-Binding Proteins/genetics , DNA-Binding Proteins/genetics , Neurodegenerative Diseases/genetics , Trans-Activators/genetics , Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/pathology , Animals , Cytoplasm/genetics , Cytoplasm/metabolism , Frontotemporal Dementia/genetics , Frontotemporal Dementia/pathology , Gene Expression Regulation/genetics , Humans , Mice , Microtubule-Associated Proteins/genetics , Motor Neurons/metabolism , Motor Neurons/pathology , Mutation , Neurodegenerative Diseases/pathology , Primary Cell Culture , RNA, Messenger/genetics , Ribosomes/genetics
3.
Bioinformatics ; 35(16): 2853-2855, 2019 08 15.
Article in English | MEDLINE | ID: mdl-30596893

ABSTRACT

SUMMARY: The graphical fragment assembly (GFA) formats are emerging standard formats for the representation of sequence graphs. Although GFA 1 was primarily targeting assembly graphs, the newer GFA 2 format introduces several features, which makes it suitable for representing other kinds of information, such as scaffolding graphs, variation graphs, alignment graphs and colored metagenomic graphs. Here, we present GfaViz, an interactive graphical tool for the visualization of sequence graphs in GFA format. The software supports all new features of GFA 2 and introduces conventions for their visualization. The user can choose between two different layouts and multiple styles for representing single elements or groups. All customizations can be stored in custom tags of the GFA format itself, without requiring external configuration files. Stylesheets are supported for storing standard configuration options for groups of files. The visualizations can be exported to raster and vector graphics formats. A command line interface allows for batch generation of images. AVAILABILITY AND IMPLEMENTATION: GfaViz is available at https://github.com/ggonnella/gfaviz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Metagenome , Sequence Analysis
4.
Bioinformatics ; 33(19): 3094-3095, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28645150

ABSTRACT

SUMMARY: GFA 1 and GFA 2 are recently defined formats for representing sequence graphs, such as assembly, variation or splicing graphs. The formats are adopted by several software tools. Here, we present GfaPy, a software package for creating, parsing and editing GFA graphs using the programming language Python. GfaPy supports GFA 1 and GFA 2, using the same interface and allows for interconversion between both formats. The software package provides a simple interface for custom record types, which is an important new feature of GFA 2 (compared to GFA 1). This enables new applications of the format. AVAILABILITY AND IMPLEMENTATION: GfaPy is available open source at https://github.com/ggonnella/gfapy and installable via pip. CONTACT: gonnella@zbh.uni-hamburg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
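
To make the kind of data handled by such tools concrete, the following is a minimal sketch of reading segment (S) and link (L) lines from GFA 1 text. It is plain Python and does not use or reproduce the GfaPy interface; the function name `parse_gfa1` and the example graph are hypothetical, and only a small subset of the format is covered.

```python
def parse_gfa1(text):
    """Parse a tiny subset of GFA 1: segment (S) and link (L) lines only.

    Hypothetical helper for illustration; header, path and containment
    lines as well as optional tags are ignored.
    """
    segments = {}  # segment name -> sequence
    links = []     # (from, from_orient, to, to_orient, overlap)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if fields[0] == "S":
            # S <name> <sequence> [tags...]
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap CIGAR> [tags...]
            links.append(tuple(fields[1:6]))
    return segments, links


# Made-up two-segment graph with one link.
EXAMPLE = "H\tVN:Z:1.0\nS\ts1\tACGT\nS\ts2\tGTCA\nL\ts1\t+\ts2\t+\t2M\n"
segments, links = parse_gfa1(EXAMPLE)
print(segments)
print(links)
```

A full-featured library would additionally validate field types, resolve references between lines, and handle GFA 2 record types, none of which this sketch attempts.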


Subjects
Sequence Analysis/methods , Software , Computer Graphics , Programming Languages
5.
Appl Environ Microbiol ; 80(15): 4585-98, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24837379

ABSTRACT

The active venting Sisters Peak (SP) chimney on the Mid-Atlantic Ridge holds the current temperature record for the hottest ever measured hydrothermal fluids (400°C, accompanied by sudden temperature bursts reaching 464°C). Given the unprecedented temperature regime, we investigated the biome of this chimney with a focus on special microbial adaptations for thermal tolerance. The SP metagenome reveals considerable differences in the taxonomic composition from those of other hydrothermal vent and subsurface samples; these could be better explained by temperature than by other available abiotic parameters. The most common species to which SP genes were assigned were thermophilic Aciduliprofundum sp. strain MAR08-339 (11.8%), Hippea maritima (3.8%), Caldisericum exile (1.5%), and Caminibacter mediatlanticus (1.4%) as well as to the mesophilic Niastella koreensis (2.8%). A statistical analysis of associations between taxonomic and functional gene assignments revealed specific overrepresented functional categories: for Aciduliprofundum, protein biosynthesis, nucleotide metabolism, and energy metabolism genes; for Hippea and Caminibacter, cell motility and/or DNA replication and repair system genes; and for Niastella, cell wall and membrane biogenesis genes. Cultured representatives of these organisms inhabit different thermal niches; i.e., Aciduliprofundum has an optimal growth temperature of 70°C, Hippea and Caminibacter have optimal growth temperatures around 55°C, and Niastella grows between 10 and 37°C. Therefore, we posit that the different enrichment profiles of functional categories reflect distinct microbial strategies to deal with the different impacts of the local sudden temperature bursts in disparate regions of the chimney.


Subjects
Bacteria/isolation & purification , Seawater/microbiology , Bacteria/classification , Bacteria/genetics , Bacteria/growth & development , Hot Temperature , Molecular Sequence Data , Phylogeny , Seawater/chemistry
6.
J Pathol ; 231(1): 130-41, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23794398

ABSTRACT

Deletion of 3p13 has been reported from about 20% of prostate cancers. The clinical significance of this alteration and the tumour suppressor gene(s) driving the deletion remain to be identified. We have mapped the 3p13 deletion locus using SNP array analysis and performed fluorescence in situ hybridization (FISH) analysis to search for associations between 3p13 deletion, prostate cancer phenotype and patient prognosis in a tissue microarray containing more than 3200 prostate cancers. SNP array analysis of 72 prostate cancers revealed a small deletion at 3p13 in 14 (19%) of the tumours, including the putative tumour suppressors FOXP1, RYBP and SHQ1. FISH analysis using FOXP1-specific probes revealed deletions in 16.5% and translocations in 1.2% of 1828 interpretable cancers. 3p13 deletions were linked to adverse features of prostate cancer, including advanced stage (p < 0.0001), high Gleason grade (p = 0.0125), and early PSA recurrence (p = 0.0015). In addition, 3p13 deletions were linked to ERG(+) cancers and to PTEN deletions (p < 0.0001 each). A subset analysis of ERG(+) tumours revealed that 3p13 deletions occurred independently from PTEN deletions (p = 0.3126), identifying tumours with 3p13 deletion as a distinct molecular subset of ERG(+) cancers. mRNA expression analysis confirmed that all 3p13 genes were down regulated by the deletion. Ectopic over-expression of FOXP1, RYBP and SHQ1 resulted in decreased colony-formation capabilities, corroborating a tumour suppressor function for all three genes. In summary, our data show that deletion of 3p13 defines a distinct and aggressive molecular subset of ERG(+) prostate cancers, which is possibly driven by inactivation of multiple tumour suppressors.


Subjects
Adenocarcinoma/genetics , Chromosome Deletion , Chromosomes, Human, Pair 3/genetics , Genes, Tumor Suppressor , Prostatic Neoplasms/genetics , Adenocarcinoma/metabolism , Adenocarcinoma/mortality , Adenocarcinoma/pathology , Cell Line, Tumor , Forkhead Transcription Factors/genetics , Forkhead Transcription Factors/metabolism , Gene Expression Profiling , Gene Knockdown Techniques , Germany/epidemiology , Humans , Kaplan-Meier Estimate , Male , Neoplasm Recurrence, Local , Oligonucleotide Array Sequence Analysis , Oncogene Proteins, Fusion/metabolism , Polymorphism, Single Nucleotide , Prostate/metabolism , Prostate/pathology , Prostatectomy , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/mortality , Prostatic Neoplasms/pathology , Repressor Proteins/genetics , Repressor Proteins/metabolism , Tissue Array Analysis
7.
BMC Bioinformatics ; 14: 226, 2013 Jul 17.
Article in English | MEDLINE | ID: mdl-23865810

ABSTRACT

BACKGROUND: It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. RESULTS: We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to factor 45. Our best new index-based algorithm achieves a speedup of factor 560. CONCLUSIONS: The presented methods achieve considerable speedups compared to the best previous method. 
This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide with RaligNAtor a robust and well documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator.


Subjects
Algorithms , Base Sequence , Sequence Analysis, RNA , Base Pairing , Base Sequence/genetics , Computational Biology/methods , Computer Simulation , Databases, Factual , RNA/chemistry , RNA/genetics , Sequence Alignment , Sequence Analysis, RNA/methods , Software
8.
Environ Microbiol ; 15(5): 1551-60, 2013 May.
Article in English | MEDLINE | ID: mdl-23171403

ABSTRACT

We present data on the co-registered geochemistry (in situ mass spectrometry) and microbiology (pyrosequencing of 16S rRNA genes; V1, V2, V3 regions) in five fluid samples from Irina II in the Logatchev hydrothermal field. Two samples were collected over 24 min from the same spot and further three samples were from spatially distinct locations (20 cm, 3 m and the overlaying plume). Four low-temperature hydrothermal fluids from the Irina II are composed of the same core bacterial community, namely specific Gammaproteobacteria and Epsilonproteobacteria, which, however, differs in the relative abundance. The microbial composition of the fifth sample (plume) is considerably different. Although a significant correlation between sulfide enrichment and proportions of Sulfurovum (Epsilonproteobacteria) was found, no other significant linkages between abiotic factors, i.e. temperature, hydrogen, methane, sulfide and oxygen, and bacterial lineages were evident. Intriguingly, bacterial community compositions of some time series samples from the same spot were significantly more similar to a sample collected 20 cm away than to each other. Although this finding is based on three single samples only, it provides first hints that single hydrothermal fluid samples collected on a small spatial scale may also reflect unrecognized temporal variability. However, further studies are required to support this hypothesis.


Subjects
Biodiversity , Hydrothermal Vents/chemistry , Hydrothermal Vents/microbiology , Seawater/chemistry , Seawater/microbiology , Hydrogen-Ion Concentration , Magnesium/analysis , Oxygen/analysis , Proteobacteria/genetics , Proteobacteria/isolation & purification , RNA, Ribosomal, 16S/genetics , Temperature , Time Factors
9.
Am J Pathol ; 181(2): 401-12, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22705054

ABSTRACT

The phosphatase and tensin homolog deleted on chromosome 10 (PTEN) gene is often altered in prostate cancer. To determine the prevalence and clinical significance of the different mechanisms of PTEN inactivation, we analyzed PTEN deletions in TMAs containing 4699 hormone-naïve and 57 hormone-refractory prostate cancers using fluorescence in situ hybridization analysis. PTEN mutations and methylation were analyzed in subsets of 149 and 34 tumors, respectively. PTEN deletions were present in 20.2% (458/2266) of prostate cancers, including 8.1% heterozygous and 12.1% homozygous deletions, and were linked to advanced tumor stage (P < 0.0001), high Gleason grade (P < 0.0001), presence of lymph node metastasis (P = 0.0002), hormone-refractory disease (P < 0.0001), presence of ERG gene fusion (P < 0.0001), and nuclear p53 accumulation (P < 0.0001). PTEN deletions were also associated with early prostate-specific antigen recurrence in univariate (P < 0.0001) and multivariate (P = 0.0158) analyses. The prognostic impact of PTEN deletion was seen in both ERG fusion-positive and ERG fusion-negative tumors. PTEN mutations were found in 4 (12.9%) of 31 cancers with heterozygous PTEN deletions but in only 1 (2%) of 59 cancers without PTEN deletion (P = 0.027). Aberrant PTEN promoter methylation was not detected in 34 tumors. The results of this study demonstrate that biallelic PTEN inactivation, by either homozygous deletion or deletion of one allele and mutation of the other, occurs in most PTEN-defective cancers and characterizes a particularly aggressive subset of metastatic and hormone-refractory prostate cancers.


Subjects
Gene Deletion , Oncogene Proteins, Fusion/metabolism , PTEN Phosphohydrolase/genetics , Prostate-Specific Antigen/metabolism , Prostatic Neoplasms/enzymology , Prostatic Neoplasms/pathology , Trans-Activators/metabolism , Aged , Biomarkers, Tumor/metabolism , Chromosomes, Human, Pair 10/genetics , DNA Methylation/genetics , DNA Mutational Analysis , Disease Progression , Epigenesis, Genetic , Genome, Human/genetics , Humans , Immunohistochemistry , Male , Middle Aged , Multivariate Analysis , PTEN Phosphohydrolase/metabolism , Phenotype , Polymorphism, Single Nucleotide/genetics , Promoter Regions, Genetic/genetics , Proportional Hazards Models , Recurrence , Transcriptional Regulator ERG , Tumor Suppressor Protein p53/metabolism
10.
BMC Bioinformatics ; 13: 82, 2012 May 06.
Article in English | MEDLINE | ID: mdl-22559072

ABSTRACT

BACKGROUND: Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, not necessitating a division of the reads into k-mers, but requiring fast algorithms for the computation of suffix-prefix matches among all pairs of reads. RESULTS: Here we present efficient methods for the construction of a string graph from a set of sequencing reads. Our approach employs suffix sorting and scanning methods to compute suffix-prefix matches. Transitive edges are recognized and eliminated early in the process and the graph is efficiently constructed including irreducible edges only. CONCLUSIONS: Our suffix-prefix match determination and string graph construction algorithms have been implemented in the software package Readjoiner. Comparison with existing string graph-based assemblers shows that Readjoiner is faster and more space efficient. Readjoiner is available at http://www.zbh.uni-hamburg.de/readjoiner.
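
The two steps named in the abstract, computing suffix-prefix matches and discarding transitive edges, can be illustrated with a deliberately naive sketch. This is not the Readjoiner algorithm: it compares all read pairs directly instead of using suffix sorting, removes transitive edges only after the full graph is built, and tests transitivity purely on graph connectivity. All function names and the three example reads are hypothetical.

```python
def longest_overlap(a, b, min_len):
    """Length of the longest proper suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len characters exists.
    """
    for k in range(min(len(a), len(b)) - 1, max(min_len, 1) - 1, -1):
        if a[len(a) - k:] == b[:k]:
            return k
    return 0


def overlap_graph(reads, min_len):
    """All-pairs suffix-prefix overlaps (quadratic; for illustration only)."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = longest_overlap(a, b, min_len)
                if k:
                    edges[(i, j)] = k
    return edges


def remove_transitive(edges):
    """Drop edge (i, k) whenever edges (i, j) and (j, k) also exist.

    Simplified connectivity-based test; a real string graph construction
    also checks that the overlaps are mutually consistent.
    """
    succ = {}
    for (i, j) in edges:
        succ.setdefault(i, set()).add(j)
    reduced = {}
    for (i, k), olen in edges.items():
        transitive = any(k in succ.get(j, ()) for j in succ[i] if j != k)
        if not transitive:
            reduced[(i, k)] = olen
    return reduced


# Three made-up reads tiling the sequence ACGTACGGTT.
reads = ["ACGTAC", "GTACGG", "ACGGTT"]
edges = overlap_graph(reads, min_len=2)
irreducible = remove_transitive(edges)
print(edges)        # {(0, 1): 4, (0, 2): 2, (1, 2): 4}
print(irreducible)  # {(0, 1): 4, (1, 2): 4}
```

The edge from read 0 to read 2 is implied by the path through read 1, so only the two irreducible edges remain.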


Subjects
Software , Algorithms , Computer Simulation , Genome, Human/genetics , Humans , Models, Genetic , Sequence Analysis, DNA/methods
11.
BMC Bioinformatics ; 12: 214, 2011 May 27.
Article in English | MEDLINE | ID: mdl-21619640

ABSTRACT

BACKGROUND: The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires to match sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. RESULTS: We present a novel method and readily applicable software for time efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows to exploit base pairing information for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. CONCLUSIONS: The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. 
RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator.
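
The inside-out matching order described above (loop first, then the pairing bases on its boundaries) can be sketched as follows. This is only an illustration under simplifying assumptions: the loop is located with a plain string search rather than an affix array, the pattern is a single exact loop motif with a fixed stem length, and G-U wobble pairs are allowed. The function name and the example sequence are hypothetical and not taken from Structator.

```python
# Watson-Crick pairs plus G-U wobble pairs (assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, loop, stem_len):
    """Return start positions of hairpins matched inside out.

    A hit is an occurrence of `loop` whose flanking bases form exactly
    `stem_len` consecutive base pairs when extended outward from the loop.
    """
    hits = []
    start = seq.find(loop)
    while start != -1:
        left, right = start - 1, start + len(loop)
        paired = 0
        # Extend the stem outward one base pair at a time.
        while (paired < stem_len and left >= 0 and right < len(seq)
               and (seq[left], seq[right]) in PAIRS):
            paired += 1
            left -= 1
            right += 1
        if paired == stem_len:
            hits.append(left + 1)  # leftmost position of the stem
        start = seq.find(loop, start + 1)
    return hits


# Made-up sequence: AA + GGC (stem) + GAAA (loop) + GCC (stem) + UU.
seq = "AAGGCGAAAGCCUU"
print(find_stem_loops(seq, "GAAA", 3))  # [2]
```

Matching the loop first means the expensive pairing check runs only at positions where the most specific part of the pattern already fits, which is the intuition behind the search space reduction mentioned in the abstract.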


Subjects
Algorithms , RNA/chemistry , Sequence Analysis, RNA/methods , Software , Base Sequence , Nucleic Acid Conformation , RNA/genetics
12.
Nucleic Acids Res ; 37(21): 7002-13, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19786494

ABSTRACT

Long terminal repeat (LTR) retrotransposons and endogenous retroviruses (ERVs) are transposable elements in eukaryotic genomes well suited for computational identification. De novo identification tools determine the position of potential LTR retrotransposon or ERV insertions in genomic sequences. For further analysis, it is desirable to obtain an annotation of the internal structure of such candidates. This article presents LTRdigest, a novel software tool for automated annotation of internal features of putative LTR retrotransposons. It uses local alignment and hidden Markov model-based algorithms to detect retrotransposon-associated protein domains as well as primer binding sites and polypurine tracts. As an example, we used LTRdigest results to identify 88 (near) full-length ERVs in the chromosome 4 sequence of Mus musculus, separating them from truncated insertions and other repeats. Furthermore, we propose a work flow for the use of LTRdigest in de novo LTR retrotransposon classification and perform an exemplary de novo analysis on the Drosophila melanogaster genome as a proof of concept. Using a new method solely based on the annotations generated by LTRdigest, 518 potential LTR retrotransposons were automatically assigned to 62 candidate groups. Representative sequences from 41 of these 62 groups were matched to reference sequences with >80% global sequence similarity.


Subjects
Retroelements , Software , Terminal Repeat Sequences , Animals , Chromosomes, Mammalian , Classification/methods , Drosophila melanogaster/genetics , Endogenous Retroviruses/genetics , Genome, Insect , Genomics , Mice
13.
Genes Chromosomes Cancer ; 49(1): 1-8, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19787783

ABSTRACT

Recently, amplification of PPFIA1, encoding a member of the liprin family located about 600 kb telomeric to CCND1 on chromosome band 11q13, was described in squamous cell carcinoma of head and neck. Because 11q13 amplification is frequent in breast cancer, and PPFIA1 has been suggested to contribute to mammary gland development, we hypothesized that PPFIA1 might also be involved in the 11q13 amplicon in breast cancer and contribute to breast cancer development. A tissue microarray containing more than 2000 human breast cancers was analyzed for gene copy numbers of PPFIA1 and CCND1 by means of fluorescence in situ hybridization. PPFIA1 amplification was found in 248/1583 (15.4%) of breast cancers. Coamplification with CCND1 was found in all (248/248, 100%) PPFIA1-amplified cancers. CCND1 amplification without PPFIA1 coamplification was found in additional 117 (4.7%) tumors. Amplification of both PPFIA1 and CCND1 were significantly associated with high-grade phenotype (P = 0.0002) but were unrelated to tumor stage (P = 0.7066) or nodal stage (P = 0.5807). No difference in patient prognosis was found between 248 CCND1/PPFIA1 coamplified tumors and 117 tumors with CCND1 amplification alone (P = 0.6419). These data show that PPFIA1 amplification occurs frequently in breast cancer. The higher incidence of CCND1 amplification when compared with PPFIA1, the lack of prognostic relevance of coamplifications, and the fact that PPFIA1 amplification was found exclusively in CCND1-amplified cancers suggest that PPFIA1 gene copy number changes represent concurrent events of CCND1 amplification rather than specific biological incidents.


Subjects
Adaptor Proteins, Signal Transducing/genetics , Breast Neoplasms/genetics , Cyclin D1/genetics , Gene Amplification , Adult , Aged , Aged, 80 and over , Breast Neoplasms/pathology , Chromosomes, Human, Pair 11 , Female , Gene Dosage , Humans , Incidence , Middle Aged , Phenotype , Prognosis , Tissue Array Analysis
14.
Algorithms Mol Biol ; 16(1): 20, 2021 Aug 23.
Article in English | MEDLINE | ID: mdl-34425870

ABSTRACT

BACKGROUND: Repetitive elements contribute a large part of eukaryotic genomes. For example, about 40 to 50% of human, mouse and rat genomes are repetitive. So identifying and classifying repeats is an important step in genome annotation. This annotation step is traditionally performed using alignment based methods, either in a de novo approach or by aligning the genome sequence to a species specific set of repetitive sequences. Recently, Li (Bioinformatics 35:4408-4410, 2019) developed a novel software tool dna-brnn to annotate repetitive sequences using a recurrent neural network trained on sample annotations of repetitive elements. RESULTS: We have developed the methods of dna-brnn further and engineered a new software tool DeepGRP. This combines the basic concepts of Li (Bioinformatics 35:4408-4410, 2019) with current techniques developed for neural machine translation, the attention mechanism, for the task of nucleotide-level annotation of repetitive elements. An evaluation on the human genome shows a 20% improvement of the Matthews correlation coefficient for the predictions delivered by DeepGRP, when compared to dna-brnn. DeepGRP predicts two additional classes of repeats (compared to dna-brnn) and is able to transfer repeat annotations, using RepeatMasker-based training data to a different species (mouse). Additionally, we could show that DeepGRP predicts repeats annotated in the Dfam database, but not annotated by RepeatMasker. DeepGRP is highly scalable due to its implementation in the TensorFlow framework. For example, the GPU-accelerated version of DeepGRP is approx. 1.8 times faster than dna-brnn, approx. 8.6 times faster than RepeatMasker and over 100 times faster than HMMER searching for models of the Dfam database. CONCLUSIONS: By incorporating methods from neural machine translation, DeepGRP achieves a consistent improvement of the quality of the predictions compared to dna-brnn. 
Improved running times are obtained by employing TensorFlow as implementation framework and the use of GPUs. By incorporating two additional classes of repeats, DeepGRP provides more complete annotations, which were evaluated against three state-of-the-art tools for repeat annotation.
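
The Matthews correlation coefficient used in the evaluation above can be computed from per-nucleotide labels as in the following sketch. The label vectors are made up for illustration and are unrelated to the reported results; the function names are hypothetical, and returning 0.0 for a zero denominator is a common convention assumed here.

```python
import math


def confusion_counts(truth, pred):
    """Count TP, TN, FP, FN for two equal-length 0/1 label sequences."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# 1 = nucleotide annotated as repeat, 0 = non-repeat (made-up labels).
truth = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(truth, pred)
print(counts)        # (3, 3, 1, 1)
print(mcc(*counts))  # 0.5
```

Unlike plain accuracy, the coefficient stays informative when repeat and non-repeat positions are unevenly represented, which is one reason it is a natural choice for nucleotide-level annotation.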

15.
BMC Genomics ; 11: 335, 2010 May 27.
Article in English | MEDLINE | ID: mdl-20507619

ABSTRACT

BACKGROUND: The Mongolian gerbils are a good model to mimic the Helicobacter pylori-associated pathogenesis of the human stomach. In the current study the gerbil-adapted strain B8 was completely sequenced, annotated and compared to previous genomes, including the 73 supercontigs of the parental strain B128. RESULTS: The complete genome of H. pylori B8 was manually curated gene by gene, to assign as much function as possible. It consists of a circular chromosome of 1,673,997 bp and of a small plasmid of 6,032 bp carrying nine putative genes. The chromosome contains 1,711 coding sequences, 293 of which are strain-specific, coding mainly for hypothetical proteins, and a large plasticity zone containing a putative type-IV-secretion system and coding sequences with unknown function. The cag-pathogenicity island is rearranged such that the cagA-gene is located 13,730 bp downstream of the inverted gene cluster cagB-cag1. Directly adjacent to the cagA-gene, there are four hypothetical genes and one variable gene with a different codon usage compared to the rest of the H. pylori B8-genome. This indicates that these coding sequences might be acquired via horizontal gene transfer. The genome comparison of strain B8 to its parental strain B128 delivers 425 unique B8-proteins. Due to the fact that strain B128 was not fully sequenced and only automatically annotated, only 12 of these proteins are definitive singletons that might have been acquired during the gerbil-adaptation process of strain B128. CONCLUSION: Our sequence data and its analysis provide new insight into the high genetic diversity of H. pylori-strains. We have shown that the gerbil-adapted strain B8 has the potential to build, possibly by a high rate of mutation and recombination, a dynamic pool of genetic variants (e.g. fragmented genes and repetitive regions) required for the adaptation-processes. We hypothesize that these variants are essential for the colonization and persistence of strain B8 in the gerbil stomach during inflammation.


Subjects
Adaptation, Physiological , Genomics/methods , Gerbillinae/microbiology , Helicobacter pylori/genetics , Helicobacter pylori/physiology , Sequence Analysis, DNA/methods , Animals , Antigens, Bacterial/genetics , Bacterial Proteins/genetics , Codon/genetics , Genetic Variation , Genome, Bacterial/genetics , Humans , Plasmids/genetics , Proteome/genetics , Species Specificity , Stomach/microbiology
16.
Bioinformatics ; 25(24): 3251-8, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-19828575

ABSTRACT

MOTIVATION: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive. RESULTS: We propose a new method for efficient protein family classification and for speeding up database searches with pHMMs as is necessary for large-scale analysis scenarios. We employ simpler models of protein families called position-specific scoring matrices family models (PSSM-FMs). For fast database search, we combine full-text indexing, efficient exact p-value computation of PSSM match scores and fast fragment chaining. The resulting method is well suited to prefilter the set of sequences to be searched for subsequent database searches with pHMMs. We achieved a classification performance only marginally inferior to hmmsearch, yet, results could be obtained in a fraction of runtime with a speedup of >64-fold. In experiments addressing the method's ability to prefilter the sequence space for subsequent database searches with pHMMs, our method reduces the number of sequences to be searched with hmmsearch to only 0.80% of all sequences. The filter is very fast and leads to a total speedup of factor 43 over the unfiltered search, while retaining >99.5% of the original results. In a lossless filter setup for hmmsearch on UniProtKB/Swiss-Prot, we observed a speedup of factor 92. AVAILABILITY: The presented algorithms are implemented in the program PoSSuMsearch2, available for download at http://bibiserv.techfak.uni-bielefeld.de/possumsearch2/. CONTACT: beckstette@zbh.uni-hamburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
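
The prefiltering idea described above can be illustrated with a minimal sketch. It is not the PoSSuMsearch2 implementation: it scans each sequence linearly instead of using a full-text index, uses a fixed score threshold instead of an exact p-value, and assumes a uniform background distribution. The function names `build_pssm`, `scan` and `prefilter` are hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, gap-free sequences.

    Assumption: a uniform background distribution over the alphabet.
    """
    background = 1.0 / len(ALPHABET)
    pssm = []
    for pos in range(len(aligned[0])):
        counts = {a: pseudocount for a in ALPHABET}
        for seq in aligned:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {a: math.log2((counts[a] / total) / background) for a in ALPHABET}
        )
    return pssm


def scan(pssm, sequence, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    m = len(pssm)
    hits = []
    for start in range(len(sequence) - m + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(m))
        if score >= threshold:
            hits.append((start, score))
    return hits


def prefilter(pssm, sequences, threshold):
    """Keep only the names of sequences with at least one PSSM hit."""
    return [name for name, seq in sequences.items() if scan(pssm, seq, threshold)]


if __name__ == "__main__":
    pssm = build_pssm(["HCKL", "HCRL", "HCKM"])
    database = {"a": "AAAHCKLAAA", "b": "GGGGGGGGGG"}
    print(prefilter(pssm, database, threshold=5.0))
```

Only the sequences that survive `prefilter` would then be passed to the slower profile HMM search, which is where the reported speedup comes from.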


Subjects
Computational Biology/methods , Databases, Protein , Markov Chains , Position-Specific Scoring Matrices , Proteins/classification , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Algorithms , Pattern Recognition, Automated/methods , Software
17.
Bioinformatics ; 25(4): 533-4, 2009 Feb 15.
Article in English | MEDLINE | ID: mdl-19106120

ABSTRACT

SUMMARY: To analyse the vast amount of genome annotation data available today, a visual representation of genomic features in a given sequence range is required. We developed a C library which provides layout and drawing capabilities for annotation features. It supports several common input and output formats and can easily be integrated into custom C applications. To exemplify the use of AnnotationSketch in other languages, we provide bindings to the scripting languages Ruby, Python and Lua. AVAILABILITY: The software is available under an open-source license as part of GenomeTools (http://genometools.org/annotationsketch.html).


Subjects
Genome , Software , Computer Graphics , Databases, Factual , Gene Expression Profiling/methods , Programming Languages , User-Computer Interface
18.
BMC Cancer ; 10: 78, 2010 Mar 03.
Article in English | MEDLINE | ID: mdl-20199686

ABSTRACT

BACKGROUND: Increased transcription of oncogenes like the epidermal growth factor receptor (EGFR) is frequently caused by amplification of the whole gene or at least of regulatory sequences. The aim of this study was to pinpoint mechanistic parameters occurring during egfr copy number gains that lead to stable EGFR overexpression and high sensitivity to extracellular signalling. A deeper understanding of those marker events might improve early diagnosis of cancer in suspect lesions, early detection of cancer progression and the prediction of egfr-targeted therapies. METHODS: The basal-like/stemness type breast cancer cell line subpopulation MDA-MB-468 CD44high/CD24-/low, carrying high egfr amplifications, was chosen as a model system in this study. Subclones of the heterogeneous cell line expressing low and high EGF receptor densities were isolated by cell sorting. Genomic profiling of these subclones was carried out by means of SNP array profiling, qPCR and FISH. Cell cycle analysis was performed using the BrdU quenching technique. RESULTS: Low and high EGFR expressing MDA-MB-468 CD44+/CD24-/low subpopulations separated by cell sorting showed intermediate and high copy numbers of egfr, respectively. However, during cell culture, an increase solely in egfr gene copy numbers occurred in the intermediate subpopulation. This shift was based on the formation of new cells which regained egfr gene copies. Two-parametric cell cycle analysis most likely excluded clonal effects, mediated through a growth advantage of cells bearing higher egfr gene copy numbers, as the driving force. Subsequently, the detection of a fragile site distal to the egfr gene, sustaining uncapped telomere-less chromosomal ends, the ladder-like structure of the intrachromosomal egfr amplification and a broader range of egfr copy numbers support the assumption that dynamic chromosomal rearrangements, such as breakage-fusion-bridge cycles, rather than proliferation drive the gain of egfr copies.
CONCLUSION: Progressive genome modulation in the CD44+/CD24-/low subpopulation of the breast cancer cell line MDA-MB-468 leads to different coexisting subclones. In isolated low-copy cells, asymmetric chromosomal segregation leads to new cells that have regained solely egfr gene copies. Furthermore, egfr regain resulted in enhanced signal transduction of the MAP-kinase and PI3-kinase pathways. We show here for the first time a dynamic copy number regain in basal-like/stemness cell type breast cancer subpopulations which might explain genetic heterogeneity. Moreover, this process might also be involved in adaptive growth factor receptor intracellular signaling which supports survival and migration during cancer development and progression.


Subjects
Breast Neoplasms/metabolism , CD24 Antigen/biosynthesis , ErbB Receptors/genetics , Hyaluronan Receptors/biosynthesis , Cell Cycle , Cell Line, Tumor , Female , Flow Cytometry/methods , Gene Dosage , Gene Expression Profiling , Genetic Variation , Humans , Kinetics , Polymorphism, Single Nucleotide , Signal Transduction
19.
PLoS Comput Biol ; 5(9): e1000502, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19750212

ABSTRACT

With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very different error models apply to the novel high-throughput sequencing methods. While the most frequent errors in Illumina reads are mismatches, reads produced by 454's GS FLX predominantly contain insertions and deletions (indels). Even though 454 sequencers are able to produce longer reads, the method is frequently applied to small RNA (miRNA and siRNA) sequencing. Fast and accurate matching, in particular of short reads with diverse errors, is therefore a pressing practical problem. We introduce a matching model for short reads that can, besides mismatches, also cope with indels. It addresses different error models. For example, it can handle the problem of leading and trailing contaminations caused by primers and poly-A tails in transcriptomics, or the length-dependent increase of error rates. In these contexts, it thus simplifies the tedious and error-prone trimming step. For efficient searches, our method utilizes index structures in the form of enhanced suffix arrays. In a comparison with current methods for short read mapping, the presented approach shows significantly increased performance not only for 454 reads, but also for Illumina reads. Our approach is implemented in the software segemehl, available at http://www.bioinf.uni-leipzig.de/Software/segemehl/.
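
As a rough illustration of index-based read lookup, the following sketch finds exact occurrences of a read in a reference using a plain suffix array and binary search. It is a simplification under stated assumptions, not segemehl's algorithm: it handles neither mismatches nor indels, it builds the array naively by sorting suffixes, and it omits the additional tables of an enhanced suffix array. The names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction by sorting suffixes; fine for a short demo only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of exact matches of pattern in text."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


if __name__ == "__main__":
    reference = "ACGTACGTTACG"
    sa = build_suffix_array(reference)
    print(find_occurrences(reference, sa, "ACG"))
```

Each lookup costs a logarithmic number of comparisons in the reference length once the index is built. Tolerating mismatches and indels, as the abstract describes, requires a more elaborate traversal of the index than this exact search.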


Subjects
Computational Biology/methods , DNA Mutational Analysis/methods , Mutation , Algorithms , Base Sequence , Sequence Alignment
20.
Front Microbiol ; 10: 2296, 2019.
Article in English | MEDLINE | ID: mdl-31649639

ABSTRACT

The microbial community composition and its functionality were assessed for hydrothermal fluids and volcanic ash sediments from Haungaroa and hydrothermal fluids from the Brothers volcano in the Kermadec island arc (New Zealand). The Haungaroa volcanic ash sediments were dominated by epsilonproteobacterial Sulfurovum sp. Ratios of electron donor consumption to CO2 fixation from respective sediment incubations indicated that sulfide oxidation appeared to fuel autotrophic CO2 fixation, coinciding with thermodynamic estimates predicting sulfide oxidation as the major energy source in the environment. Transcript analyses with the sulfide-supplemented sediment slurries demonstrated that Sulfurovum prevailed in the experiments as well. Hence, our sediment incubations appeared to simulate environmental conditions well, suggesting that sulfide oxidation catalyzed by Sulfurovum members drives biomass synthesis in the volcanic ash sediments. For the Haungaroa fluids, no inorganic electron donor and responsible microorganisms could be identified that clearly stimulated autotrophic CO2 fixation. In the Brothers hydrothermal fluids, Sulfurimonas (49%) and Hydrogenovibrio/Thiomicrospira (15%) species prevailed. Respective fluid incubations exhibited the highest autotrophic CO2 fixation if supplemented with iron(II) or hydrogen. Likewise, catabolic energy calculations predicted primarily iron(II) but also hydrogen oxidation as major energy sources in the natural fluids. According to transcript analyses with material from the incubation experiments, Thiomicrospira/Hydrogenovibrio species dominated, outcompeting Sulfurimonas. Given that experimental conditions likely only simulated environmental conditions that cause Thiomicrospira/Hydrogenovibrio but not Sulfurimonas to thrive, it remains unclear which environmental parameters determine Sulfurimonas' dominance in the Brothers natural hydrothermal fluids.
