ABSTRACT
BACKGROUND: Next-generation sequencing (NGS) has become a valuable tool in clinical practice, mainly because of its efficiency and cost-effectiveness. It is widely used in the genetic diagnosis of inherited diseases and, in clinical oncology, may enhance the discovery of new susceptibility genes and enable individualized care of cancer patients. In this context, we used a pan-cancer panel to investigate germline variants in Brazilian patients who met clinical criteria for hereditary cancer syndromes or had a family history of them. METHODS: Seventy-one individuals with a diagnosis or family history of hereditary cancer syndromes were tested with a custom pan-cancer panel comprising 16 high- and moderate-penetrance genes previously associated with hereditary cancer syndromes (APC, BRCA1, BRCA2, CDH1, CDKN2A, CHEK2, MSH2, MSH6, MUTYH, PTEN, RB1, RET, TP53, VHL, XPA and XPC). All pathogenic variants were validated by Sanger sequencing. RESULTS: We identified eight pathogenic variants in 12 of 71 individuals (16.9%). Among the mutation-positive subjects, 50% had been diagnosed with breast cancer and carried mutations in BRCA1, CDH1 and MUTYH. Notably, 33.3% had been diagnosed with polyposis or had affected family members and harbored pathogenic mutations in APC and MUTYH. The remaining individuals (16.7%) were gastric cancer patients with pathogenic variants in CDH1 and MSH2. Overall, 54 individuals (76.05%) presented at least one variant of uncertain significance (VUS), for a total of 81 VUS. Of these, seven were predicted to have disease-causing potential. CONCLUSION: Analysis of all these genes in a single NGS panel identified not only pathogenic variants related to hereditary cancer syndromes but also VUS that need further clinical and molecular investigation. The results had a significant impact on patients and their relatives, since they allowed genetic counselling and personalized management decisions.
Subjects
Genetic Predisposition to Disease/genetics , Germ-Line Mutation/genetics , High-Throughput Nucleotide Sequencing/methods , Neoplastic Syndromes, Hereditary/genetics , Brazil , Female , Humans , Male
ABSTRACT
BACKGROUND: Clustering methods are essential for partitioning biological samples and are useful for reducing the information complexity of large datasets. Tools in this context usually rely on greedy algorithms, which solve some data mining difficulties but can degrade biologically relevant information during clustering. The lack of standardized metrics and consistent reference datasets also raises questions about the clustering efficiency of some methods. Benchmarks are needed to explore the full potential of clustering methods, among which alignment-free methods stand out, and a good choice of dataset is essential to them. RESULTS: Here we present a new approach to data mining in large protein sequence datasets, the Rapid Alignment Free Tool for Sequences Similarity Search to Groups (RAFTS3G), a clustering method that aims to lose less biological information while generating groups. The strategy in our algorithm is optimized to be more stringent, which is reflected in increased accuracy and sensitivity of the clusters generated across a wide range of similarity. Compared with three main methods, RAFTS3G is the better choice when the user wants more reliable results, even without knowing the ideal clustering threshold. CONCLUSION: In general, RAFTS3G can group large datasets of up to millions of biological sequences, which makes it an efficient option for clustering. Compared with other "gold-standard" methods for clustering large biological data, RAFTS3G maintains the balance between reducing redundancy in biological information and creating consistent groups. We apply the binary search concept to grouped sequences, which maintains the sensitivity/accuracy relation and reduces the time needed to generate data with RAFTS3G.
Subjects
Proteins/chemistry , Software , Algorithms , Cluster Analysis , Data Mining , Databases, Protein
ABSTRACT
Accurate reconstruction of ancestral states is a critical evolutionary analysis when studying ancient proteins and comparing biochemical properties between parental or extinct species and their extant relatives. It relies on multiple sequence alignment (MSA) which may introduce biases, and it remains unknown how MSA methodological approaches impact ancestral sequence reconstruction (ASR). Here, we investigate how MSA methodology modulates ASR using a simulation study of various evolutionary scenarios. We evaluate the accuracy of ancestral protein sequence reconstruction for simulated data and compare reconstruction outcomes using different alignment methods. Our results reveal biases introduced not only by aligner algorithms and assumptions, but also tree topology and the rate of insertions and deletions. Under many conditions we find no substantial differences between the MSAs. However, increasing the difficulty for the aligners can significantly impact ASR. The MAFFT consistency aligners and PRANK variants exhibit the best performance, whereas FSA displays limited performance. We also discover a bias towards reconstructed sequences longer than the true ancestors, deriving from a preference for inferring insertions, in almost all MSA methodological approaches. In addition, we find measures of MSA quality generally correlate highly with reconstruction accuracy. Thus, we show MSA methodological differences can affect the quality of reconstructions and propose MSA methods should be selected with care to accurately determine ancestral states with confidence.
Subjects
Genetic Techniques , Sequence Alignment
ABSTRACT
BACKGROUND: The development of large-scale technologies for quantitative transcriptomics has enabled comprehensive analysis of gene expression profiles across complete genomes. RNA-Seq allows gene expression levels to be measured far more precisely and globally than previous methods. Studies using this technology are altering our view of the extent and complexity of eukaryotic transcriptomes. In this respect, multiple efforts have been made to determine and analyse the gene expression patterns of human cell types under different conditions, in either normal or pathological states. However, until recently, little had been reported about the evolutionary marks present in human protein-coding genes, particularly from the combined perspective of gene expression and protein evolution. RESULTS: We present a combined analysis of human protein-coding gene expression profiling and time-scale ancestry mapping that places the genes in taxonomy clades and reveals eight major evolutionary steps ("hallmarks") that include clusters of functionally coherent proteins. The human expressed genes are analysed using an RNA-Seq dataset of 116 samples from 32 tissues. The evolutionary analysis of the human proteins combines information from: (i) a database of orthologous proteins (OMA), (ii) the taxonomy mapping of genes to lineage clades (from NCBI Taxonomy) and (iii) the evolutionary time-scale mapping provided by TimeTree (Timescale of Life). The human protein-coding genes are also placed in a relational context based on the construction of a robust gene coexpression network, which reveals tighter links between age-related protein-coding genes and finds functionally coherent gene modules. CONCLUSIONS: Understanding the relational landscape of the human protein-coding genes is essential for interpreting the functional elements and modules of our active genome. Moreover, decoding the evolutionary history of human genes can provide valuable information about their origin and function.
Subjects
Evolution, Molecular , Proteome , Proteomics , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation , Gene Regulatory Networks , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Open Reading Frames , Organ Specificity/genetics , Proteomics/methods , Transcriptome
ABSTRACT
The aberrant expression of microRNAs is known to play a crucial role in carcinogenesis. Here, we evaluated the miRNA expression profile of sigmoid colon cancer (SCC) compared with adjacent-to-tumor (ADJ) and healthy sigmoid colon (SCH) tissues obtained from colon biopsies of Brazilian patients. Each pair of groups was compared separately, with p-values < 0.05 and |Log2(Fold-Change)| > 2 considered significant. We found 20 differentially expressed miRNAs (DEmiRNAs) across all comparisons, two of which were shared between SCC vs. ADJ and SCC vs. SCH. We used miRTarBase and miRTargetLink to identify target genes of the differentially expressed miRNAs, and the DAVID and REACTOME databases for gene enrichment analysis. We also used the TCGA and GTEx databases to build miRNA-gene regulatory networks and to check the reproducibility of our results. In addition to miRNAs previously known to be associated with colorectal cancer, we identified three potential novel biomarkers. We showed that the three types of colon tissue could be clearly distinguished using a panel composed of the 20 DEmiRNAs. Additionally, we found enriched pathways related to the carcinogenic process in which miRNAs could be involved, indicating that adjacent-to-tumor tissues may already be altered and cannot be considered healthy. Overall, we expect that these findings may help in the search for biomarkers to prevent cancer progression or, at least, allow its early detection; however, more studies are needed to confirm our results.
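The significance rule stated in this abstract (p-value < 0.05 together with an absolute log2 fold-change > 2) can be illustrated with a minimal Python sketch. The miRNA names and numbers below are invented toy values, not data from the study, and the function names are hypothetical. The abstract does not say whether the p-values were adjusted for multiple testing, so the sketch simply takes whatever p-value it is given.

```python
# Thresholds taken from the abstract; everything else here is illustrative.
P_CUTOFF = 0.05
LOG2FC_CUTOFF = 2.0


def is_differentially_expressed(p_value, log2_fold_change):
    """Apply both cutoffs: strict inequality on p and on |log2FC|."""
    return p_value < P_CUTOFF and abs(log2_fold_change) > LOG2FC_CUTOFF


def select_de_mirnas(results):
    """Return names of entries passing both cutoffs.

    `results` is an iterable of (name, p_value, log2_fold_change) tuples
    for one pairwise comparison (for example SCC vs. ADJ).
    """
    return [
        name
        for name, p_value, log2fc in results
        if is_differentially_expressed(p_value, log2fc)
    ]


# Toy input with made-up names and values (hypothetical, not from the study).
toy_results = [
    ("miR-A", 0.001, 3.1),   # passes both cutoffs
    ("miR-B", 0.20, 4.0),    # large change, but p too high
    ("miR-C", 0.01, -2.5),   # strongly down-regulated, passes
    ("miR-D", 0.03, 1.2),    # significant p, change too small
    ("miR-E", 0.05, 2.0),    # sits exactly on both cutoffs, excluded
]

print(select_de_mirnas(toy_results))  # ['miR-A', 'miR-C']
```

Because both inequalities are strict, a value lying exactly on a cutoff is excluded, which matches the "<" and ">" signs used in the text.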
ABSTRACT
The molecular mechanisms behind aneurysmal subarachnoid haemorrhage (aSAH) are still poorly understood. Expression patterns of miRNAs may help elucidate post-transcriptional gene regulation in aSAH. Here, we evaluate the global miRNA expression profile (miRnome) of patients with aSAH to identify potential biomarkers. We collected 33 peripheral blood samples: 27 from patients with cerebral aneurysm, taken 7 to 10 days after the haemorrhage, when the risk of cerebral vasospasm usually peaks, and six from controls. We then performed small RNA sequencing on an Illumina next-generation sequencing (NGS) platform. Differential expression analysis identified eight differentially expressed miRNAs, three up-regulated and five down-regulated. miR-486-5p was the most abundantly expressed and is associated with poor neurological status on admission. In silico miRNA gene target prediction showed 148 genes associated with at least two differentially expressed miRNAs. Among these were THBS1 and VEGFA, known to be related to thrombospondin and vascular endothelial growth factor. Moreover, the MYC gene was found to be regulated by four miRNAs, suggesting an important role in aneurysmal subarachnoid haemorrhage. Additionally, 15 novel miRNAs were predicted to be expressed only in aSAH, suggesting possible involvement in aneurysm pathogenesis. These findings may help identify novel biomarkers of clinical interest.
Subjects
Gene Expression Profiling , Gene Regulatory Networks , Intracranial Aneurysm/genetics , MicroRNAs/genetics , Subarachnoid Hemorrhage/genetics , Female , Gene Expression Regulation , Humans , Intracranial Aneurysm/pathology , Male , Middle Aged , Subarachnoid Hemorrhage/pathology
ABSTRACT
The Pirarucu (Arapaima gigas) is one of the world's largest freshwater fishes and a member of the superorder Osteoglossomorpha (bonytongues), one of the oldest lineages of ray-finned fishes. This species is an obligate air-breather found in the Amazon River basin and has attractive potential for aquaculture. Its phylogenetic position among bony fishes makes the Pirarucu a relevant subject for evolutionary studies of early teleost diversification. Here, we present, for the first time, a draft version of the A. gigas genome, providing useful information for further functional and evolutionary studies. The A. gigas genome was assembled from 103 Gb of raw reads sequenced on an Illumina platform. The final draft genome assembly was ∼661 Mb, with a contig N50 of 51.23 kb and a scaffold N50 of 668 kb. Repeat sequences accounted for 21.69% of the whole genome, and a total of 24,655 protein-coding genes were predicted from the genome assembly, with an average of nine exons per gene. Phylogenomic analysis based on 24 fish species supported the hypothesis that Osteoglossomorpha and Elopomorpha (eels, tarpons, and bonefishes) are sister groups, together forming a sister lineage to Clupeocephala (remaining teleosts). Divergence time estimates suggested that the Osteoglossomorpha and Elopomorpha lineages emerged independently within a period of ∼30 Myr in the Jurassic. The draft genome of A. gigas provides a valuable genetic resource for further evolutionary studies and may also offer valuable data for economic applications.
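The contig and scaffold N50 values quoted in this abstract are standard assembly contiguity statistics: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that definition follows. The function name `n50` and the toy lengths are illustrative assumptions and are unrelated to the A. gigas assembly or to the software the authors used.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical). Total = 300, so half = 150.
# 80 + 70 = 150 reaches the halfway point, hence N50 = 70.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70
```

A higher N50 means that more of the assembly sits in long sequences, which is why the scaffold N50 (668 kb) exceeds the contig N50 (51.23 kb): scaffolding joins contigs into longer units.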