Abstract
BACKGROUND: Oral cavity squamous cell carcinoma (OCSCC) is the most common head and neck malignancy. Although the survival rate of patients with advanced-stage disease remains approximately 20% to 60%, when detected at an early stage, the survival rate approaches 80%, posing a pressing need for a well validated profiling method to assess patients who have a high risk of developing OCSCC. Tumor DNA detection in saliva may provide a robust biomarker platform that overcomes the limitations of current diagnostic tests. However, there is no routine saliva-based screening method for patients with OCSCC. METHODS: The authors designed a custom next-generation sequencing panel with unique molecular identifiers that covers coding regions of 7 frequently mutated genes in OCSCC and applied it on DNA extracted from 121 treatment-naive OCSCC tumors and matched preoperative saliva specimens. RESULTS: By using stringent variant-calling criteria, mutations were detected in 106 tumors, consistent with a predicted detection rate ≥88%. Moreover, mutations identified in primary malignancies were also detected in 93% of saliva samples. To ensure that variants are not errors resulting in false-positive calls, a multistep analytical validation of this approach was performed: 1) re-sequencing of 46 saliva samples confirmed 88% of somatic variants; 2) no functionally relevant mutations were detected in saliva samples from 11 healthy individuals without a history of tobacco or alcohol; and 3) using a panel of 7 synthetic loci across 8 sequencing runs, it was confirmed that the platform developed is reproducible and provides sensitivity on par with droplet digital polymerase chain reaction. CONCLUSIONS: The current data highlight the feasibility of somatic mutation identification in driver genes in saliva collected at the time of OCSCC diagnosis.
Subjects
Squamous Cell Carcinoma, Neoplasm DNA, Mouth Neoplasms, Saliva, Tumor Biomarkers, Squamous Cell Carcinoma/diagnosis, Squamous Cell Carcinoma/genetics, Neoplasm DNA/genetics, Neoplasm DNA/isolation & purification, Humans, Mouth Neoplasms/diagnosis, Mouth Neoplasms/genetics, Mutation
Abstract
BACKGROUND: Multi-gene panel sequencing using next-generation sequencing (NGS) methods is a key tool for genomic medicine. However, with an estimated 140 000 genomic tests available, current system inefficiencies result in high genetic-testing costs. Reduced testing costs are needed to expand the availability of genomic medicine. One solution to improve efficiency and lower costs is to calculate the most cost-effective set of panels for a typical pattern of test requests. METHODS: We compiled rare diseases, associated genes, point prevalence, and test-order frequencies from a representative laboratory. We then modeled the costs of the relevant steps in the NGS process in detail. Using a simulated annealing-based optimization procedure, we determined panel sets that were more cost-optimal than whole exome sequencing (WES) or clinical exome sequencing (CES). Finally, we repeated this methodology to cost-optimize pharmacogenomics (PGx) testing. RESULTS: For rare disease testing, we show that an optimal choice of 4-6 panels, uniquely covering genes that comprise 95% of the total prevalence of monogenic diseases, saves $257-304 per sample compared with WES, and $66-135 per sample compared with CES. For PGx, we show that the optimal multipanel solution saves $6-7 (27%-40%) over a single panel covering all relevant gene-drug associations. CONCLUSIONS: Laboratories can reduce costs using the proposed method to obtain and run a cost-optimal set of panels for specific test requests. In addition, payers can use this method to inform reimbursement policy.
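The optimization step can be illustrated with a minimal simulated-annealing sketch. The abstract does not describe the cost model, so everything below is an assumption made for illustration: each request targets a single gene, each non-empty panel carries a fixed overhead (`panel_overhead`), and each request pays a per-gene sequencing cost (`per_gene`) for every gene on the panel it is routed to. The function names `total_cost` and `anneal` are hypothetical and are not taken from the paper.

```python
import math
import random

def total_cost(assign, freq, n_panels, per_gene=1.0, panel_overhead=50.0):
    """Hypothetical cost model (an assumption, not the paper's):
    fixed overhead per non-empty panel, plus, for every request,
    the cost of sequencing all genes on the panel it is routed to."""
    sizes = [0] * n_panels
    for p in assign:
        sizes[p] += 1
    overhead = panel_overhead * sum(1 for s in sizes if s > 0)
    sequencing = sum(f * sizes[p] * per_gene for f, p in zip(freq, assign))
    return overhead + sequencing

def anneal(freq, n_panels, steps=20000, t0=100.0, seed=0):
    """Assign each gene to one of n_panels, minimizing total_cost."""
    rng = random.Random(seed)
    assign = [0] * len(freq)              # start: one panel holding every gene
    cost = total_cost(assign, freq, n_panels)
    best, best_cost = assign[:], cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9   # linear cooling schedule
        g = rng.randrange(len(freq))            # propose moving one gene
        old = assign[g]
        assign[g] = rng.randrange(n_panels)
        new_cost = total_cost(assign, freq, n_panels)
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = assign[:], cost
        else:
            assign[g] = old                     # reject: undo the move
    return best, best_cost

# Made-up order frequencies for six genes (illustrative only).
freq = [100, 80, 5, 3, 2, 1]
panels, cost = anneal(freq, n_panels=3)
print(panels, cost)
```

Under this toy model, frequently ordered genes tend to end up on small panels and rarely ordered genes are pooled. That is the intuition behind a multipanel solution, although a realistic model would also need multi-gene diseases, batching, and the laboratory-step costs the authors modeled.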
Subjects
Pharmacogenetics, Rare Diseases, Genetic Testing/methods, High-Throughput Nucleotide Sequencing/methods, Humans, Rare Diseases/genetics, Exome Sequencing
Abstract
PURPOSE: Breast and/or ovarian cancers are among the most common cancers in women across the world. In the Indian population, the healthcare burden of breast and/or ovarian cancers has been steadily rising, thus stressing the need for early detection, surveillance, and disease management measures. However, the burden attributable to inherited mutations is not well characterized. METHODS: We sequenced 1010 unrelated patients and families from across India with an indication of breast and/or ovarian cancers, using the TruSight Cancer panel, which includes 14 genes strongly associated with risk of hereditary breast and/or ovarian cancers. Genetic variations were identified using the StrandNGS software and interpreted using the StrandOmics platform. RESULTS: We detected mutations in 304 (30.1%) cases, of which 56 mutations were novel. A majority (84.9%) of the mutations were detected in the BRCA1/2 genes as compared to non-BRCA genes (15.1%). When the cases were stratified on the basis of age at diagnosis and family history of cancer, a high detection rate of 75% for hereditary variants was observed in patients whose age at diagnosis was below 40 years and who had first-degree family member(s) affected by breast and/or ovarian cancers. Our findings indicate that in the Indian population, there is a high prevalence of mutations in the high-risk breast cancer genes: BRCA1, BRCA2, TP53, and PALB2. CONCLUSION: In India, socioeconomic inequality limiting access to treatment is a major factor contributing to the increased cancer burden; therefore, incorporation of a cost-effective and comprehensive multi-gene test will be helpful in ensuring widespread implementation of genetic screening in clinical practice for hereditary breast and/or ovarian cancers.
Subjects
BRCA1 Protein/genetics, BRCA2 Protein/genetics, Breast Neoplasms/genetics, Fanconi Anemia Complementation Group N Protein/genetics, Tumor Suppressor Protein p53/genetics, Adult, Aged, Breast/pathology, Breast Neoplasms/diagnosis, Breast Neoplasms/epidemiology, Breast Neoplasms/pathology, Early Detection of Cancer, Female, Genetic Predisposition to Disease, Germ-Line Mutation, Humans, India/epidemiology, Mass Screening, Middle Aged, Ovarian Neoplasms/diagnosis, Ovarian Neoplasms/epidemiology, Ovarian Neoplasms/genetics, Ovarian Neoplasms/pathology
Abstract
MicroRNAs are important negative regulators of protein-coding gene expression and have been studied intensively over the past years. Several measurement platforms have been developed to determine relative miRNA abundance in biological samples using different technologies such as small RNA sequencing, reverse transcription-quantitative PCR (RT-qPCR) and (microarray) hybridization. In this study, we systematically compared 12 commercially available platforms for analysis of microRNA expression. We measured an identical set of 20 standardized positive and negative control samples, including human universal reference RNA, human brain RNA and titrations thereof, human serum samples and synthetic spikes from microRNA family members with varying homology. We developed robust quality metrics to objectively assess platform performance in terms of reproducibility, sensitivity, accuracy, specificity and concordance of differential expression. The results indicate that each method has its strengths and weaknesses, which help to guide informed selection of a quantitative microRNA gene expression platform for particular study goals.
Subjects
MicroRNAs/genetics, Quality Control, Reproducibility of Results
Abstract
Breast and/or ovarian cancers (BOC) are among the most frequently diagnosed forms of hereditary cancer and a leading cause of death in India. This emphasizes the need for a cost-effective method for early detection of these cancers. We sequenced 141 unrelated patients and families with BOC using the TruSight Cancer panel, which includes 13 genes strongly associated with risk of inherited BOC. Multi-gene sequencing was done on the Illumina MiSeq platform. Genetic variations were identified using the Strand NGS software and interpreted using the StrandOmics platform. We detected pathogenic mutations in 51 (36.2%) cases, of which 19 were novel mutations. When we considered familial breast cancer cases only, the detection rate increased to 52%. When cases were stratified by age at diagnosis into three categories, ⩽40 years, 40-50 years and >50 years, the detection rates were higher in the first two categories (44.4% and 53.4%, respectively) than in the third category (26.9%). Our study suggests that next-generation sequencing-based multi-gene panels increase the sensitivity of mutation detection and help in identifying patients with a high risk of developing cancer as compared with sequential tests of individual genes.
Subjects
Breast Neoplasms/epidemiology, Breast Neoplasms/genetics, Hereditary Breast and Ovarian Cancer Syndrome/epidemiology, Hereditary Breast and Ovarian Cancer Syndrome/genetics, Mutation, Ovarian Neoplasms/epidemiology, Ovarian Neoplasms/genetics, Adult, Age of Onset, Aged, Breast Neoplasms/diagnosis, DNA Copy Number Variations, Female, Gene Deletion, Gene Duplication, BRCA1 Genes, BRCA2 Genes, Genetic Testing/methods, Hereditary Breast and Ovarian Cancer Syndrome/diagnosis, High-Throughput Nucleotide Sequencing, Humans, India/epidemiology, Middle Aged, Mutation Rate, Ovarian Neoplasms/diagnosis, Prevalence, Young Adult
Abstract
PURPOSE: Retinoblastoma (Rb) is the most common primary intraocular cancer of childhood and one of the major causes of blindness in children. India has the highest number of patients with Rb in the world. Mutations in the RB1 gene are the primary cause of Rb, and heterogeneous mutations are distributed throughout the entire length of the gene. Therefore, genetic testing requires screening of the entire gene, which by conventional sequencing is time consuming and expensive. METHODS: In this study, we screened the RB1 gene in the DNA isolated from blood or saliva samples of 50 unrelated patients with Rb using the TruSight Cancer panel. Next-generation sequencing (NGS) was done on the Illumina MiSeq platform. Genetic variations were identified using the Strand NGS software and interpreted using the StrandOmics platform. RESULTS: We were able to detect germline pathogenic mutations in 66% (33/50) of the cases, 12 of which were novel. We were able to detect all types of mutations, including missense, nonsense, splice site, indel, and structural variants. When we considered bilateral Rb cases only, the mutation detection rate increased to 100% (22/22). In unilateral Rb cases, the mutation detection rate was 30% (6/20). CONCLUSIONS: Our study suggests that NGS-based approaches increase the sensitivity of mutation detection in the RB1 gene, making it fast and cost-effective compared to the conventional tests performed in a reflex-testing mode.
Subjects
High-Throughput Nucleotide Sequencing, Mutation, Retinal Neoplasms/genetics, Retinoblastoma Binding Proteins/genetics, Retinoblastoma/genetics, Ubiquitin-Protein Ligases/genetics, Adult, Asian People/genetics, Child, Preschool Child, Nonsense Codon, Cohort Studies, DNA Mutational Analysis, Exons/genetics, Female, Retinoblastoma Genes, Genetic Testing/methods, Germ-Line Mutation, Humans, India, Infant, Male, Middle Aged, Polymerase Chain Reaction, Young Adult
Abstract
We describe methods for rapid sequencing of the entire human mitochondrial genome (mtgenome), which involve long-range PCR for specific amplification of the mtgenome, pyrosequencing, quantitative mapping of sequence reads to identify sequence variants and heteroplasmy, as well as de novo sequence assembly. These methods have been used to study 40 publicly available HapMap samples of European (CEU) and African (YRI) ancestry to demonstrate a sequencing error rate <5.63×10^-4, nucleotide diversity of 1.6×10^-3 for CEU and 3.7×10^-3 for YRI, patterns of sequence variation consistent with earlier studies, but a higher rate of heteroplasmy varying between 10% and 50%. These results demonstrate that next-generation sequencing technologies allow interrogation of the mitochondrial genome in greater depth than previously possible, which may be of value in biology and medicine.
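For readers unfamiliar with the nucleotide diversity statistic quoted above, the following minimal sketch computes it in its textbook form: the mean number of pairwise differences per site over a set of aligned sequences. It assumes equal-length, gap-free sequences and is not the authors' pipeline, which the abstract does not describe. The function name `nucleotide_diversity` and the toy sequences are illustrative only.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean pairwise differences per site over aligned, equal-length sequences.

    Simplifying assumption: no gaps or ambiguity codes; every mismatch counts.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions across every pair of sequences
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)

# Toy alignment (not real mtDNA): 2 differences over 3 pairs x 4 sites.
print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```

On real data the same calculation would run over full mitochondrial genome alignments, with explicit handling of gaps and heteroplasmic positions.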
Subjects
Mitochondrial DNA/genetics, Mitochondrial Genome/genetics, Genomics/methods, DNA Sequence Analysis/methods, Black People/genetics, Genetic Databases, Genetic Variation, HapMap Project, Humans, Polymerase Chain Reaction, Sequence Alignment, White People/genetics
Abstract
Ultrasound-guided fine needle aspiration cytology (FNAC) is the preferred method of identifying malignancy in palpable thyroid nodules using the Bethesda reporting system. However, in around 30-40% of FNACs (Bethesda categories III, IV, and V), the results are indeterminate and surgery is required to confirm malignancy. Out of those who undergo surgery, only 10-40% of patients in these categories are found to have malignancies, thus proving surgery to be unnecessary for some patients or to be incomplete in others. While molecular testing on thyroid FNAC material is part of the American Thyroid Association (ATA) guidelines in evaluating thyroid nodules, it is currently unavailable in India due to cost constraints. In this study, we prospectively collected FNAC samples from sixty-nine patients who presented with palpable thyroid nodules. We designed a cost-effective next-generation sequencing (NGS) test to query multiple variants in the DNA and RNA isolated from the fine needle aspirate. The identification of oncogenic variants was considered to be indicative of malignancy, and confirmed by surgical histopathology. The panel showed an overall sensitivity of 81.25% and a specificity of 100%, while in the case of Bethesda categories III, IV, and V, the sensitivity was higher (87.5%) and the specificity was established at 100%. The panel could thereby serve as a rule-in test for the diagnosis of thyroid cancer and therefore help identify patients who require surgery, especially in the indeterminate Bethesda categories III, IV, and V.
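The rule-in interpretation follows from the definitions of the metrics: with 100% specificity there are no false positives, so every positive call is a true malignancy. The sketch below makes this explicit. The counts are illustrative assumptions, chosen only because 13/16 equals the reported 81.25%; the abstract does not give the underlying confusion matrix, and `diagnostic_metrics` is a hypothetical helper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = tp / (tp + fn)          # share of malignant nodules detected
    specificity = tn / (tn + fp)          # share of benign nodules called negative
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, specificity, ppv

# Illustrative counts only (NOT the study's data): 16 malignant, 20 benign nodules.
sens, spec, ppv = diagnostic_metrics(tp=13, fp=0, tn=20, fn=3)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} PPV={ppv:.2%}")
```

A negative result is less informative in this setting: with sensitivity below 100%, some malignant nodules test negative, so the panel cannot rule malignancy out.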
Abstract
BACKGROUND: Dried blood spots (DBS) are a relatively inexpensive source of nucleic acids and are easy to collect, transport, and store in large-scale field surveys, especially in resource-limited settings. However, their performance in whole-genome sequencing (WGS) relative to that of venous blood DNA has not been analyzed for various downstream applications. METHODS: This study compares the WGS performance of DBS paired with venous blood samples collected from 12 subjects. RESULTS: Results of standard quality checks of coverage, base quality, and mapping quality were found to be near identical between DBS and venous blood. Concordance for single-nucleotide variants, insertions and deletions, and copy number variants was high between these two sample types. Additionally, downstream analyses typical of population-based studies were performed, such as mitochondrial heteroplasmy detection, haplotype analysis, mitochondrial copy number changes, and determination of telomere lengths. The absolute mitochondrial copy number values were higher for DBS than for venous blood, though the trend in sample-to-sample variation was similar between DBS and blood. Telomere length estimates in most DBS samples were on par with those from venous blood. CONCLUSION: DBS samples can serve as a robust and feasible alternative to venous blood for studies requiring WGS analysis.
Subjects
Whole Genome Sequencing
Abstract
BACKGROUND: Neurological disorders are a clinically heterogeneous group of disorders and are major causes of disability and death. Several of these disorders are caused by genetic aberrations. A precise and timely confirmatory diagnosis is essential for appropriate therapeutic and management strategies. Due to the complexity of the clinical presentations across various neurological disorders, arriving at an accurate diagnosis remains a challenge. METHODS: We sequenced 1012 unrelated patients from India with suspected neurological disorders, using the TruSight One panel. Genetic variations were identified using the Strand NGS software and interpreted using the StrandOmics platform. RESULTS: We detected mutations in 197 genes in 405 (40%) cases, and 178 mutations were novel. The highest diagnostic rate was observed among patients with muscular dystrophy (64%), followed by leukodystrophy and ataxia (43% each). In our cohort, 26% of the patients who received a definitive diagnosis were primarily referred with complex neurological phenotypes with no suggestive diagnosis. In terms of mutation types, 62.8% were truncating and, in addition, 13.4% were structural variants, which are also likely to cause loss of function. CONCLUSION: In our study, we observed an improved performance of multi-gene panel testing, with an overall diagnostic yield of 40%. Furthermore, we show that NGS (next-generation sequencing)-based testing is comprehensive and can detect all types of variants, including structural variants. It can be considered as a single-platform genetic test for neurological disorders that can provide a swift and definitive diagnosis in a cost-effective manner.
Subjects
Data Analysis, Genetic Predisposition to Disease/genetics, Genetic Testing/methods, High-Throughput Nucleotide Sequencing/methods, Nervous System Diseases/genetics, Child, Preschool Child, Cohort Studies, Female, Genetic Predisposition to Disease/epidemiology, Humans, India/epidemiology, Male, Multifactorial Inheritance/genetics, Mutation/genetics, Nervous System Diseases/diagnosis, Nervous System Diseases/epidemiology
Abstract
Liquid biopsy is increasingly gaining traction as an alternative to invasive solid tumor biopsies for prognosis, treatment decisions, and disease monitoring. Matched tumor-plasma samples were collected from 180 patients across different cancers with >90% of the samples below Stage IIIB. Tumors were profiled using next-generation sequencing (NGS) or quantitative PCR (qPCR), and the mutation status was queried in the matched plasma using digital platforms such as droplet digital PCR (ddPCR) or NGS for concordance. Tumor-plasma concordance of 82% and 32% was observed in advanced (Stage IIB and above) and early (Stage I to Stage IIA) stage samples, respectively. Interestingly, the overall survival outcomes correlated with presurgical/at-biopsy ctDNA levels. Baseline ctDNA stratified patients into three categories: (a) high ctDNA, which correlated with poor survival outcome, (b) undetectable ctDNA, with good outcome, and (c) low ctDNA, whose outcome was ambiguous. ctDNA could be a powerful tool for therapy decisions and patient management in a large number of cancers across a variety of stages.
Subjects
Circulating Tumor DNA, Neoplasms/genetics, Neoplasms/pathology, Adult, Aged, Aged 80 and over, Female, Humans, Kaplan-Meier Estimate, Liquid Biopsy, Male, Middle Aged, Mutation, Prognosis, Proportional Hazards Models, Young Adult
Abstract
A large database of homologous sequence alignments with good estimates of evolutionary distances can be a valuable resource for molecular evolutionary studies and phylogenetic research in particular. We recently created a database containing 159,921 transcripts from human, mouse, rat, zebrafish and fugu species. Approximately 1,000 homology groups were identified with the help of Ensembl homology evidence. At the macro-level, the database allows us to answer queries of the form: 1. What is the average k-distance between 5' untranslated regions of human and mouse? 2. List the 10 groups with the highest K(a)/K(s) ratio between mouse and rat. 3. List all identical proteins between human and rat. Researchers interested in specific proteins can use a simple web interface to retrieve the homology groups of interest, examine all pairwise distances between members of the group and study the conservation of exon-intron gene structures using a graphical interface. The database is available at http://warta.bio.psu.edu/DED/.
Subjects
Genetic Databases, Molecular Evolution, Phylogeny, Sequence Homology, 5' Untranslated Regions, Animals, Humans, Mice, Proteins/classification, Proteins/genetics, Messenger RNA/chemistry, Messenger RNA/classification, Rats, Takifugu/genetics, User-Computer Interface, Zebrafish/genetics
Abstract
Comprehensive genetic profiling of tumors using next-generation sequencing (NGS) is gaining acceptance for guiding treatment decisions in cancer care. We designed a cancer profiling test combining both deep sequencing and immunohistochemistry (IHC) of relevant cancer targets to aid therapy choices in both standard-of-care (SOC) and advanced-stage treatments for solid tumors. The SOC report is provided in a short turnaround time for four tumors, namely lung, breast, colon, and melanoma, followed by an investigational report. For other tumor types, an investigational report is provided. The NGS assay reports single-nucleotide variants (SNVs), copy number variations (CNVs), and translocations in 152 cancer-related genes. The tissue-specific IHC tests include routine and less common markers associated with drugs used in SOC settings. We describe the standardization, validation, and clinical utility of the StrandAdvantage test (SA test) using more than 250 solid tumor formalin-fixed paraffin-embedded (FFPE) samples and control cell line samples. The NGS test showed high reproducibility and accuracy of >99%. The test provided relevant clinical information for SOC treatment as well as more information related to investigational options and clinical trials for >95% of advanced-stage patients. In conclusion, the SA test comprising a robust and accurate NGS assay combined with clinically relevant IHC tests can detect somatic changes of clinical significance for strategic cancer management in all the stages.
Subjects
Neoplasm DNA/genetics, Neoplasm DNA/metabolism, High-Throughput Nucleotide Sequencing/methods, Immunohistochemistry/methods, Neoplasms/therapy, DNA Sequence Analysis/methods, Tumor Cell Line, DNA Copy Number Variations, Genetic Association Studies, Genetic Predisposition to Disease, Humans, Neoplasms/genetics, Neoplasms/metabolism, Single Nucleotide Polymorphism, Reproducibility of Results, Standard of Care, Genetic Translocation
Abstract
One of the most common activities in bioinformatics is the search for similar sequences. These searches are usually carried out with the help of programs from the NCBI BLAST family. As the majority of searches are routinely performed with default parameters, a question that should be addressed is how reliable the results obtained using the default parameter values are, i.e. what fraction of potential matches have been retrieved by these searches. Our primary focus is on the initial hit parameter, also known as the seed or word, used by the NCBI BLASTn, MegaBLAST and other similar programs in searches for similar nucleotide sequences. We show that the use of default values for the initial hit parameter can have a big negative impact on the proportion of potentially similar sequences that are retrieved. We also show how the hit probability of different seeds varies with the minimum length and similarity of sequences desired to be retrieved and describe methods that help in determining appropriate seeds. The experimental results described in this paper illustrate situations in which these methods are most applicable and also show the relationship between the various BLAST parameters.
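The dependence of hit probability on seed length, alignment length, and similarity can be sketched with a simple model. The code below assumes an ungapped alignment in which each position matches independently with probability `identity`, and computes the chance that at least one run of `word` consecutive matches occurs. That run is what a contiguous exact-match seed needs to trigger a hit. This independence model and the function name `hit_probability` are assumptions for illustration; the abstract does not specify the authors' method.

```python
def hit_probability(length, identity, word):
    """P(at least one run of `word` consecutive matches) among `length`
    positions that each match independently with probability `identity`."""
    # state[k]: probability of no hit so far, with a trailing run of k matches
    state = [0.0] * word
    state[0] = 1.0
    hit = 0.0
    for _ in range(length):
        nxt = [0.0] * word
        for k, pr in enumerate(state):
            nxt[0] += pr * (1 - identity)     # a mismatch resets the run
            if k + 1 == word:
                hit += pr * identity          # run reaches the word size: a hit
            else:
                nxt[k + 1] += pr * identity   # run grows by one
        state = nxt
    return hit

# Word sizes 11 and 28 are used here as example seed lengths.
for w in (11, 28):
    print(w, round(hit_probability(length=100, identity=0.8, word=w), 4))
```

Under this model, longer seeds can only lower the hit probability for a given alignment length and identity, which shows how a large default word size can miss moderately similar sequences.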
Subjects
Computational Biology/methods, Conserved Sequence, Genomics/methods, Nucleotides/genetics, Software, Algorithms, Base Sequence, Genome, Probability, Sensitivity and Specificity, Sequence Alignment/methods
Abstract
The chimpanzee is our closest living relative. The morphological differences between the two species are so large that there is no problem in distinguishing between them. However, the nucleotide difference between the two species is surprisingly small. The early genome comparison by DNA hybridization techniques suggested a nucleotide difference of 1-2%. Recently, direct nucleotide sequencing confirmed this estimate. These findings generated the common belief that the human is extremely close to the chimpanzee at the genetic level. However, if one looks at proteins, which are mainly responsible for phenotypic differences, the picture is quite different, and about 80% of proteins are different between the two species. Still, the number of proteins responsible for the phenotypic differences may be smaller since not all genes are directly responsible for phenotypic characters.
Subjects
Proteins/genetics, Animals, Molecular Evolution, Humans, Nucleic Acid Hybridization, Pan troglodytes, Species Specificity
Abstract
AIM: We release the Janssen Toxicogenomics database. This rat liver gene-expression database was generated using Codelink microarrays, and has been used over the past years within Janssen to derive signatures for multiple end points and to classify proprietary compounds. MATERIALS & METHODS: The release consists of gene-expression responses to 124 compounds, selected to give a broad coverage of liver-active compounds. A selection of the compounds were also analyzed on Affymetrix microarrays. RESULTS: The release includes results of an in-house reannotation pipeline to Entrez gene annotations, to classify probes into different confidence classes. High confidence unambiguously annotated probes were used to create gene-level data which served as starting point for cross-platform comparisons. Connectivity map-based similarity methods show excellent agreement between Codelink and Affymetrix runs of the same samples. We also compared our dataset with the Japanese Toxicogenomics Project and observed reasonable agreement, especially for compounds with stronger gene signatures. We describe an R-package containing the gene-level data and show how it can be used for expression-based similarity searches. CONCLUSION: Comparing the same biological samples run on the Affymetrix and the Codelink platform, good correspondence is observed using connectivity mapping approaches. As expected, this correspondence is smaller when the data are compared with an independent dataset such as TG-GATE. We hope that this collection of gene-expression profiles will be incorporated in toxicogenomics pipelines of users.
Subjects
Factual Databases, Liver/metabolism, Toxicogenetics, Animals, Data Mining, Humans, Liver/drug effects, Oligonucleotide Array Sequence Analysis/methods, Rats, Transcriptome
Abstract
Classification of proteins into families is one of the main goals of functional analysis. Proteins are usually assigned to a family on the basis of the presence of family-specific patterns, domains, or structural elements. Whereas proteins belonging to the same family are generally similar to each other, the extent of similarity varies widely across families. Some families are characterized by short, well-defined motifs, whereas others contain longer, less-specific motifs. We present a simple method for visualizing such differences. We applied our method to the Arabidopsis thaliana families listed at The Arabidopsis Information Resource (TAIR) Web site and for 76% of the nontrivial families (families with more than one member), our method identifies simple similarity measures that are necessary and sufficient to cluster members of the family together. Our visualization method can be used as part of an annotation pipeline to identify potentially incorrectly defined families. We also describe how our method can be extended to identify novel families and to assign unclassified proteins into known families.
Subjects
Arabidopsis Proteins/classification, Amino Acid Sequence Homology, Arabidopsis Proteins/chemistry, Cluster Analysis, Computational Biology/statistics & numerical data, Computer Graphics/statistics & numerical data, Protein Databases, Software
Abstract
It is believed that 3.2 billion bp of the human genome harbor approximately 35000 protein-coding genes. On average, one could expect one gene per 300000 nucleotides (nt). Although the distribution of the genes in the human genome is not random, it is rather surprising that a large number of genes overlap in the mammalian genomes. Thousands of overlapping genes were recently identified in the human and mouse genomes. However, the origin and evolution of overlapping genes are still unknown. We identified 1316 pairs of overlapping genes in humans and mice and studied their evolutionary patterns. It appears that these genes do not demonstrate greater than usual conservation. Studies of the gene structure and overlap pattern showed that only a small fraction of analyzed genes preserved exactly the same pattern in both organisms.