ABSTRACT
Here, we describe single-tube long fragment read (stLFR), a technology that enables sequencing of data from long DNA molecules using economical second-generation sequencing technology. It is based on adding the same barcode sequence to subfragments of the original long DNA molecule (DNA cobarcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically nonredundant cobarcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique cobarcoding of more than 8 million 20- to 300-kb genomic DNA fragments. Analysis of the human genome NA12878 with stLFR demonstrated high-quality variant calling and phase block lengths up to N50 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries, and their construction did not significantly add to the time or cost of whole-genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
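As an illustration of the cobarcoding idea described in this abstract, the following minimal Python sketch groups reads that share a barcode and map close together on the same chromosome into inferred long fragments. The read tuple layout, the function name `infer_fragments`, and the 50 kb `max_gap` threshold are assumptions made for this example; they are not taken from the stLFR analysis pipeline.

```python
from collections import defaultdict


def infer_fragments(reads, max_gap=50_000):
    """Group (barcode, chrom, pos) reads into inferred long fragments.

    Reads with the same barcode on the same chromosome are merged into one
    fragment as long as consecutive positions are at most `max_gap` apart.
    Returns a list of (barcode, chrom, start, end, read_count) tuples.
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    fragments = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current fragment and start a new one.
                fragments.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        fragments.append((barcode, chrom, start, prev, count))
    return fragments


if __name__ == "__main__":
    toy_reads = [
        ("BC1", "chr1", 1_000),
        ("BC1", "chr1", 20_000),
        ("BC1", "chr1", 500_000),  # same barcode, but far away
        ("BC2", "chr1", 1_500),
    ]
    for fragment in infer_fragments(toy_reads):
        print(fragment)
```

Splitting on a distance threshold is one simple way to handle the rare case in which two different long molecules receive the same barcode; real pipelines may use more elaborate criteria.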
Subjects
High-Throughput Nucleotide Sequencing/methods, Whole Genome Sequencing/methods, Cost-Benefit Analysis, Diploidy, Gene Library, Human Genome, Genomics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/economics, Humans, Whole Genome Sequencing/economics
ABSTRACT
MOTIVATION: Achieving a near complete understanding of how the genome of an individual affects the phenotypes of that individual requires deciphering the order of variations along homologous chromosomes in species with diploid genomes. However, true diploid assembly of long-range haplotypes remains challenging. RESULTS: To address this, we have developed Haplotype-resolved Assembly for Synthetic long reads using a Trio-binning strategy, or HAST, which uses parental information to classify reads into maternal or paternal. Once sorted, these reads are used to independently de novo assemble the parent-specific haplotypes. We applied HAST to cobarcoded second-generation sequencing data from an Asian individual, resulting in a haplotype assembly covering 94.7% of the reference genome with a scaffold N50 longer than 11 Mb. The high haplotyping precision (~99.7%) and recall (~95.9%) represent a substantial improvement over the commonly used tool for assembling cobarcoded reads (Supernova), and are comparable to a trio-binning-based third generation long-read-based assembly method (TrioCanu) but with a significantly higher single-base accuracy [up to 99.99997% (Q65)]. This makes HAST a superior tool for accurate haplotyping and future haplotype-based studies. AVAILABILITY AND IMPLEMENTATION: The code of the analysis is available at https://github.com/BGI-Qingdao/HAST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
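The general trio-binning idea mentioned in this abstract (using parental information to sort reads into maternal and paternal bins) can be sketched as follows. This is a toy illustration under stated assumptions, not the HAST implementation: the function names, the tiny k-mer size, and the simple majority rule are all hypothetical choices made for the example.

```python
def kmers(seq, k):
    """Return the set of all k-mers contained in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_seq, paternal_seq, k):
    """Return (maternal-only, paternal-only) k-mer sets."""
    m, p = kmers(maternal_seq, k), kmers(paternal_seq, k)
    return m - p, p - m


def bin_read(read, maternal_only, paternal_only, k):
    """Assign a read to a parent by counting parent-specific k-mers."""
    read_kmers = kmers(read, k)
    m_hits = len(read_kmers & maternal_only)
    p_hits = len(read_kmers & paternal_only)
    if m_hits > p_hits:
        return "maternal"
    if p_hits > m_hits:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


if __name__ == "__main__":
    k = 4  # unrealistically small, chosen only to keep the example readable
    mat_only, pat_only = parent_specific_kmers("ACGTACGTAA", "ACGTTCGTAA", k)
    for read in ("GTACGT", "GTTCGT", "ACGT"):
        print(read, bin_read(read, mat_only, pat_only, k))
```

With cobarcoded data, the same counting could in principle be applied to all reads sharing a barcode rather than to individual reads, which would give each classification decision more informative k-mers to work with.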
ABSTRACT
To fully understand human genetic variation and its functional consequences, the specific distribution of variants between the two chromosomal homologues of genes must be known. The 'phase' of variants can significantly impact gene function and phenotype. To assess patterns of phase at large scale, we have analyzed 18 121 autosomal genes in 1092 statistically phased genomes from the 1000 Genomes Project and 184 experimentally phased genomes from the Personal Genome Project. Here we show that genes with cis-configurations of coding variants are more frequent than genes with trans-configurations in a genome, with global cis/trans ratios of ~60:40. Significant cis-abundance was observed in virtually all genomes in all populations. Moreover, we identified a large group of genes exhibiting cis-configurations of protein-changing variants in excess, so-called 'cis-abundant genes', and a smaller group of 'trans-abundant genes'. These two gene categories were functionally distinguishable, and exhibited strikingly different distributional patterns of protein-changing variants. Underlying these phenomena was a shared set of phase-sensitive genes of importance for adaptation and evolution. This work establishes common patterns of phase as key characteristics of diploid human exomes and provides evidence for their functional significance, highlighting the importance of phase for the interpretation of protein-coding genetic variation and gene function.
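To make the cis/trans distinction concrete, the short sketch below classifies a pair of heterozygous variants from phased genotype strings. It assumes biallelic variants written in the VCF-style `allele|allele` notation (for example `0|1`), where the position before the bar refers to the same homologue for both variants; the function name is hypothetical and this is not the classification procedure used in the study.

```python
def phase_configuration(gt_a, gt_b):
    """Classify two phased heterozygous genotypes as 'cis' or 'trans'.

    Returns None when either genotype is unphased or not heterozygous,
    because no configuration can be derived in that case.
    """
    a, b = gt_a.split("|"), gt_b.split("|")
    if len(a) != 2 or len(b) != 2:
        return None  # unphased (e.g. "0/1") or malformed
    if a[0] == a[1] or b[0] == b[1]:
        return None  # homozygous
    # Same allele order means both alternate alleles sit on one homologue.
    return "cis" if a == b else "trans"


if __name__ == "__main__":
    print(phase_configuration("0|1", "0|1"))
    print(phase_configuration("0|1", "1|0"))
    print(phase_configuration("0/1", "0|1"))
```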
Subjects
Diploidy, Human Genome/genetics, Open Reading Frames/genetics, Quantitative Trait Loci/genetics, Exome/genetics, Genetic Variation, Haplotypes/genetics, Humans, Single Nucleotide Polymorphism/genetics
ABSTRACT
Currently, the methods available for preimplantation genetic diagnosis (PGD) of in vitro fertilized (IVF) embryos do not detect de novo single-nucleotide and short indel mutations, which have been shown to cause a large fraction of genetic diseases. Detection of all these types of mutations requires whole-genome sequencing (WGS). In this study, advanced massively parallel WGS was performed on three 5- to 10-cell biopsies from two blastocyst-stage embryos. Both parents and paternal grandparents were also analyzed to allow for accurate measurements of false-positive and false-negative error rates. Overall, >95% of each genome was called. In the embryos, experimentally derived haplotypes and barcoded read data were used to detect and phase up to 82% of de novo single base mutations with a false-positive rate of about one error per Gb, resulting in fewer than 10 such errors per embryo. This represents a ~100-fold lower error rate than previously published from 10 cells, and it is the first demonstration that advanced WGS can be used to accurately identify these de novo mutations in spite of the thousands of false-positive errors introduced by the extensive DNA amplification required for deep sequencing. Using haplotype information, we also demonstrate how small de novo deletions could be detected. These results suggest that phased WGS using barcoded DNA could be used in the future as part of the PGD process to maximize comprehensiveness in detecting disease-causing mutations and to reduce the incidence of genetic diseases.
Subjects
Mammalian Embryo, In Vitro Fertilization, Human Genome, High-Throughput Nucleotide Sequencing, Point Mutation, Blastocyst/metabolism, Exons, Haplotypes, Heterozygote, Humans, Single Nucleotide Polymorphism, Sequence Deletion
ABSTRACT
PURPOSE: We describe a novel syndrome in seven female patients with extreme developmental delay and neoteny. METHODS: All patients in this study were female, aged 4 to 23 years, were well below the fifth percentile in height and weight, had failed to develop sexually, and lacked the use of language. Karyotype and array chromosome genomic hybridization analysis failed to identify large-scale structural variations. To further understand the underlying cause of disease in these patients, whole-genome sequencing was performed. RESULTS: In five patients, coding de novo mutations (DNMs) were found in five different genes. These genes fell into similar functional categories of transcription regulation and chromatin modification. Comparison to a control population suggested that individuals with neotenic complex syndrome (NCS), a name that we propose herein, could have an excess of rare inherited variants in genes associated with developmental delay and autism, although the difference was not significant. CONCLUSION: We describe an extreme form of developmental delay, with the defining characteristic of neoteny. In most patients we identified coding DNMs in a set of genes intolerant of haploinsufficiency; however, it is not clear whether these contributed to NCS. Rare inherited variants may also be associated with NCS, but more samples need to be analyzed to achieve statistical significance.
Subjects
Multiple Abnormalities/diagnosis, Multiple Abnormalities/genetics, Genetic Association Studies, Genetic Predisposition to Disease, Genetic Testing, Phenotype, Adolescent, Adult, Alleles, Amino Acid Substitution, Child, Preschool Child, Facies, Female, Gene Frequency, Genetic Testing/methods, Genotype, Humans, Male, Syndrome, Whole Genome Sequencing, Young Adult
ABSTRACT
BACKGROUND: Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. METHODS: cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. RESULTS: Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 (CHD8) and LDL receptor-related protein 1 (LRP1), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. CONCLUSIONS: We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures.
Subjects
Amniotic Fluid/metabolism, Fetus/metabolism, Human Genome, Whole Genome Sequencing, Multiple Abnormalities/genetics, Adult, Amniocentesis, Autism Spectrum Disorder/genetics, Cohort Studies, DNA Copy Number Variations, Darier Disease/genetics, Eyebrows/abnormalities, Feasibility Studies, Female, Genetic Predisposition to Disease, Humans, Male, Mutation
ABSTRACT
BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a devastating disease whose complex pathology has been associated with a strong genetic component in the context of both familial and sporadic disease. Herein, we adopted a next-generation sequencing approach to Greek patients suffering from sporadic ALS (together with their healthy counterparts) in order to explore further the genetic basis of sporadic ALS (sALS). RESULTS: Whole-genome sequencing analysis of Greek sALS patients revealed a positive association between FTO and TBC1D1 gene variants and sALS. Further, linkage disequilibrium analyses were suggestive of a specific disease-associated haplotype for FTO gene variants. Genotyping for these variants was performed in Greek, Sardinian, and Turkish sALS patients. A lack of association between FTO and TBC1D1 variants and sALS in patients of Sardinian and Turkish descent may suggest a founder effect in the Greek population. FTO was found to be highly expressed in motor neurons, while in silico analyses predicted an impact on FTO and TBC1D1 mRNA splicing for the genomic variants in question. CONCLUSIONS: To our knowledge, this is the first study to present a possible association between FTO gene variants and the genetic etiology of sALS. In addition, the next-generation sequencing-based genomics approach coupled with the two-step validation strategy described herein has the potential to be applied to other types of human complex genetic disorders in order to identify variants of clinical significance.
Subjects
Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics, Amyotrophic Lateral Sclerosis/genetics, Alpha-Ketoglutarate-Dependent Dioxygenase FTO/metabolism, Case-Control Studies, Computer Simulation, Founder Effect, GTPase-Activating Proteins/genetics, Greece, Haplotypes, Humans, Linkage Disequilibrium, Motor Neurons/pathology, Motor Neurons/physiology, Single Nucleotide Polymorphism
ABSTRACT
Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ~100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.
Subjects
Human Genome, Genomics/methods, DNA Sequence Analysis/methods, Alleles, Cell Line, Female, Gene Silencing, Genetic Variation, Haplotypes, Humans, Mutation, Reproducibility of Results, DNA Sequence Analysis/economics, DNA Sequence Analysis/standards
ABSTRACT
BACKGROUND: Celiac disease is a complex chronic immune-mediated disorder of the small intestine. Today, the pathobiology of the disease is unclear, which complicates differential diagnosis, patient stratification, and decision-making in the clinic. METHODS: Herein, we adopted a next-generation sequencing approach in a celiac disease trio of Greek descent to identify all genomic variants with the potential of celiac disease predisposition. RESULTS: Analysis revealed six genomic variants of prime interest: SLC9A4 c.1919G>A, KIAA1109 c.2933T>C and c.4268_4269delCCinsTA, HoxB6 c.668C>A, HoxD12 c.418G>A, and NCK2 c.745_746delAAinsG, of which NCK2 c.745_746delAAinsG is novel. Data validation in pediatric celiac disease patients of Greek (n = 109) and Serbian (n = 73) descent and their healthy counterparts (n = 111 and n = 32, respectively) indicated that HoxD12 c.418G>A is more prevalent in celiac disease patients in the Serbian population (P < 0.01), while NCK2 c.745_746delAAinsG is less prevalent in celiac disease patients than in healthy individuals of Greek descent (P = 0.03). SLC9A4 c.1919G>A and KIAA1109 c.2933T>C and c.4268_4269delCCinsTA were more abundant in patients; nevertheless, they failed to show statistical significance. CONCLUSIONS: The next-generation sequencing-based family genomics approach described herein may serve as a paradigm towards the identification of novel functional variants with the aim of understanding complex disease pathobiology.
Subjects
Celiac Disease/genetics, Binding Sites, Child, Gene Frequency, Genetic Association Studies, Genetic Predisposition to Disease, High-Throughput Nucleotide Sequencing, Humans, Molecular Models, Mutation, Single Nucleotide Polymorphism, Risk Factors
ABSTRACT
Cancer, like many common disorders, has a complex etiology, often with a strong genetic component and with multiple environmental factors contributing to susceptibility. A considerable number of genomic variants have been previously reported to be causative of, or associated with, an increased risk for various types of cancer. Here, we adopted a next-generation sequencing approach in 11 members of two families of Greek descent to identify all genomic variants with the potential to predispose family members to cancer. Cross-comparison with data from the Human Gene Mutation Database identified a total of 571 variants, from which 47 % were disease-associated polymorphisms, 26 % disease-associated polymorphisms with additional supporting functional evidence, 19 % functional polymorphisms with in vitro/laboratory or in vivo supporting evidence but no known disease association, 4 % putative disease-causing mutations but with some residual doubt as to their pathological significance, and 3 % disease-causing mutations. Subsequent analysis, focused on the latter variant class most likely to be involved in cancer predisposition, revealed two variants of prime interest, namely MSH2 c.2732T>A (p.L911R) and BRCA1 c.2955delC, the first of which is novel. KMT2D c.13895delC and c.1940C>A variants are additionally reported as incidental findings. The next-generation sequencing-based family genomics approach described herein has the potential to be applied to other types of complex genetic disorder in order to identify variants of potential pathological significance.
Subjects
Genetic Predisposition to Disease, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, Neoplasms/genetics, BRCA1 Protein/genetics, DNA-Binding Proteins/genetics, Humans, MutS Homolog 2 Protein/genetics, Mutation, Neoplasm Proteins/genetics, Neoplasms/pathology, Pedigree, Single Nucleotide Polymorphism
ABSTRACT
The systematic characterization of somatic mutations in cancer genomes is essential for understanding the disease and for developing targeted therapeutics. Here we report the identification of 2,576 somatic mutations across approximately 1,800 megabases of DNA representing 1,507 coding genes from 441 tumours comprising breast, lung, ovarian and prostate cancer types and subtypes. We found that mutation rates and the sets of mutated genes varied substantially across tumour types and subtypes. Statistical analysis identified 77 significantly mutated genes including protein kinases, G-protein-coupled receptors such as GRM8, BAI3, AGTRL1 (also called APLNR) and LPHN3, and other druggable targets. Integrated analysis of somatic mutations and copy number alterations identified another 35 significantly altered genes including GNAS, indicating an expanded role for Gα subunits in multiple cancer types. Furthermore, our experimental analyses demonstrate the functional roles of mutant GNAO1 (a Gα subunit) and mutant MAP2K4 (a member of the JNK signalling pathway) in oncogenesis. Our study provides an overview of the mutational spectra across major human cancers and identifies several potential therapeutic targets.
Subjects
Neoplasm Genes/genetics, Mutation/genetics, Neoplasms/genetics, Neoplasms/metabolism, Signal Transduction/genetics, Breast Neoplasms/classification, Breast Neoplasms/genetics, DNA Copy Number Variations/genetics, DNA Mutational Analysis, Female, GTP-Binding Protein alpha Subunits/genetics, Humans, Lung Neoplasms/classification, Lung Neoplasms/genetics, MAP Kinase Kinase 4/genetics, Male, Neoplasms/enzymology, Neoplasms/pathology, Ovarian Neoplasms/classification, Ovarian Neoplasms/genetics, Prostatic Neoplasms/classification, Prostatic Neoplasms/genetics, Protein Kinases/genetics, G-Protein-Coupled Receptors/genetics
ABSTRACT
Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved "open consent" process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.
Subjects
Genetic Databases, Genetic Variation, Human Genome/genetics, Phenotype, Precision Medicine/methods, Software, Cell Line, Data Collection, Humans, Precision Medicine/trends, DNA Sequence Analysis
ABSTRACT
Tanoak (Notholithocarpus densiflorus) is an evergreen tree in the Fagaceae family found in California and southern Oregon. Historically, tanoak acorns were an important food source for Native American tribes, and the bark was used extensively in the leather tanning process. Long considered a disjunct relictual element of the Asian stone oaks (Lithocarpus spp.), phylogenetic analysis has determined that the tanoak is an example of convergent evolution. Tanoaks are deeply divergent from oaks (Quercus) of the Pacific Northwest and comprise a new genus with a single species. These trees are highly susceptible to "sudden oak death" (SOD), a disease caused by the plant pathogen Phytophthora ramorum, which has caused widespread deaths of tanoaks. In this study, we set out to assemble the genome and perform comparative studies among a number of individuals that demonstrated varying levels of susceptibility to SOD. First, we sequenced and de novo assembled a draft reference genome of N. densiflorus using cobarcoded library processing methods and an MGI DNBSEQ-G400 sequencer. To increase the contiguity of the final assembly, we also sequenced Oxford Nanopore long reads to 30× coverage. To our knowledge, the draft genome reported here is one of the more contiguous and complete genomes of a tree species published to date, with a contig N50 of ~1.2 Mb, a scaffold N50 of ~2.1 Mb, and a complete gene score of 95.5% through BUSCO analysis. In addition, we sequenced 11 genetically distinct individuals and mapped these onto the draft reference genome, enabling the discovery of almost 25 million single nucleotide polymorphisms and ~4.4 million small insertions and deletions. Finally, using cobarcoded data, we were able to generate a complete haplotype coverage of all 11 genomes.
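The contig and scaffold N50 values quoted in this abstract follow the standard definition: the length L such that sequences of length L or longer together contain at least half of the total assembled bases. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative and do not come from the tanoak assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # arbitrary units
    print(n50(toy_contigs))
```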
Subjects
Fagaceae, Plant Genome, Fagaceae/genetics, Phylogeny, Molecular Sequence Annotation, Genomics/methods, Single Nucleotide Polymorphism
ABSTRACT
In this chapter, we describe how Long Fragment Read (LFR) technology can be applied to samples consisting of very few cells (5-20) to enable complete genome sequencing and haplotyping with a very low false positive error rate. LFR is a method for processing DNA or cells prior to sequencing on any second-generation DNA sequencing platform (e.g., MGI's DNBSEQ, Illumina sequencers, etc.). First, the LFR process incorporates a low-bias whole genome amplification step allowing accurate sequencing from very low DNA inputs (as low as 32 picograms, the mass contained within 5 diploid human cells). In addition, LFR enables the haplotyping of nearly all genomic variations with N50 contig lengths up to ~1 Mb. Furthermore, if data from this method are analyzed with parental genotype data, it is possible to generate phased variants in uninterrupted contigs spanning entire chromosomes. Importantly, the barcoding process utilized in this method allows for the detection and correction of most amplification, sequencing, and mapping errors, yielding false positive error rates as low as 10⁻⁹. Finally, the cost of this method is modest and enables extremely high-quality whole genome sequence and haplotype data from as few as 5 cells. We know of few other methods that can achieve this.
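The statement that about 32 picograms of DNA corresponds to 5 diploid human cells can be checked with a back-of-the-envelope calculation. The sketch below assumes an average molar mass of roughly 650 g/mol per base pair and a haploid genome of roughly 3.1 billion base pairs; both are common textbook approximations rather than values given in the chapter, so the result is only expected to land in the same range as the quoted figure.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
BP_MOLAR_MASS = 650.0          # g/mol per base pair (assumed average)
HAPLOID_GENOME_BP = 3.1e9      # base pairs (assumed approximation)


def diploid_dna_mass_pg(n_cells):
    """Approximate genomic DNA mass, in picograms, of n diploid human cells."""
    base_pairs = n_cells * 2 * HAPLOID_GENOME_BP
    grams = base_pairs * BP_MOLAR_MASS / AVOGADRO
    return grams * 1e12  # grams to picograms


if __name__ == "__main__":
    print(f"1 cell : {diploid_dna_mass_pg(1):.1f} pg")
    print(f"5 cells: {diploid_dna_mass_pg(5):.1f} pg")
```

Under these assumptions the estimate for 5 cells comes out slightly above 30 pg, consistent with the order of magnitude stated in the abstract.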
Subjects
Human Genome, High-Throughput Nucleotide Sequencing, Humans, Haplotypes/genetics, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, DNA, Technology
ABSTRACT
In this chapter, we describe single-tube long fragment read (stLFR), a simple preparation method for whole-genome sequencing and physical haplotyping based on the DNA co-barcoding strategy. Similar to LFR, stLFR applies the concept of adding the same barcode to subfragments derived from the same long DNA molecule. However, instead of a 384-well plate, stLFR uses the surface of micron-sized magnetic beads to create millions of virtual compartments in a single reaction tube. This is enabled by a split and pool barcoded bead preparation process capable of generating ~500,000 copies of the same unique barcode, from a library of ~3.6 billion unique barcodes, on each bead. The instruments and devices used in the stLFR process are easily accessible in nearly all standard molecular biology laboratories, and the cost of reagents can be as low as 30 dollars per sample. stLFR libraries can be sequenced by standard second-generation sequencing instruments (e.g., MGI or Illumina devices), and the barcode sharing information enables detection and phasing of all variations, including large structural variations. In addition, stLFR data can be used to scaffold contigs and de novo assemble genomes.
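The size of a split-and-pool barcode library grows as the number of barcode segments per round raised to the number of rounds. The sketch below illustrates this, together with a rough estimate of how often two beads would end up with the same barcode if barcodes were assigned uniformly at random. The choice of three rounds with 1,536 segments each is an assumption made here because it reproduces the ~3.6 billion figure quoted above, and the bead count of 50 million is likewise an illustrative input; the function names are hypothetical.

```python
import math


def barcode_space(segments_per_round, rounds):
    """Number of distinct barcodes from a split-and-pool scheme."""
    return segments_per_round ** rounds


def shared_barcode_fraction(n_beads, n_barcodes):
    """Expected fraction of beads whose barcode is also carried by another bead.

    Assumes each bead draws its barcode uniformly and independently, so the
    chance that none of the other n_beads - 1 beads matches is
    (1 - 1/n_barcodes) ** (n_beads - 1).
    """
    log_no_match = (n_beads - 1) * math.log1p(-1.0 / n_barcodes)
    return -math.expm1(log_no_match)


if __name__ == "__main__":
    space = barcode_space(1536, 3)          # assumed scheme, see lead-in
    fraction = shared_barcode_fraction(50_000_000, space)
    print(f"barcode space: {space:,}")
    print(f"beads sharing a barcode: {fraction:.2%}")
```

Under these assumptions only a small percentage of beads would share a barcode with another bead, which is one way to read the description of the barcode library as large enough for practical use in a single tube.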
Subjects
High-Throughput Nucleotide Sequencing, Cost-Benefit Analysis, Haplotypes, Whole Genome Sequencing, High-Throughput Nucleotide Sequencing/methods, Gene Library, DNA Sequence Analysis
ABSTRACT
In this chapter, we describe a simple, low-cost method for making many copies of a single DNA molecule (1-10 kb in length) as a concatemer on a long DNA strand. This can enable applications requiring high-quality contiguous sequence and haplotype data from long single DNA molecules at large scale.
Subjects
DNA, High-Throughput Nucleotide Sequencing, Haplotypes/genetics, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, DNA/genetics
ABSTRACT
Sequencing of hypervariable regions as well as internal transcribed spacer regions of ribosomal RNA genes (rDNA) is broadly used to identify bacteria and fungi, but taxonomic and phylogenetic resolution is hampered by insufficient sequencing length using high throughput, cost-efficient second-generation sequencing. We developed a method to obtain nearly full-length rDNA by assembling single DNA molecules combining DNA co-barcoding with single-tube long fragment read technology and second-generation sequencing. Benchmarking was performed using mock bacterial and fungal communities as well as two forest soil samples. All mock species rDNA were successfully recovered with identities above 99.5% compared to the reference sequences. From the soil samples we obtained good coverage with identification of more than 20,000 unknown species, as well as high abundance correlation between replicates. This approach provides a cost-effective method for obtaining extensive and accurate information on complex environmental microbial communities.
Subjects
Eukaryota, Microbiota, Phylogeny, Eukaryota/genetics, rRNA Genes, DNA Sequence Analysis/methods, Ribosomal RNA/genetics, Bacteria/genetics, Microbiota/genetics, Ribosomal DNA/genetics, Soil
ABSTRACT
It has been shown that bone marrow-derived stem cells can form a major fraction of the tumor endothelium in mouse tumors. To determine the role of such cells in human tumor angiogenesis, we studied six individuals who developed cancers after bone marrow transplantation with donor cells derived from individuals of the opposite sex. By performing fluorescence in situ hybridization (FISH) with sex chromosome-specific probes in conjunction with fluorescent antibody staining, we found that such stem cells indeed contributed to tumor endothelium, but at low levels, averaging only 4.9% of the total. These results illustrate substantial differences between human tumors and many mouse models with respect to angiogenesis and have important implications for the translation of experimental antiangiogenic therapies to the clinic.
Subjects
Bone Marrow Cells/cytology, Bone Marrow Transplantation/adverse effects, Endothelial Cells/cytology, Neoplasms/blood supply, Pathologic Neovascularization, Stem Cells/physiology, Human X Chromosomes, Human Y Chromosomes, Endothelial Cells/physiology, Female, Fluorescent Antibody Technique, Humans, Fluorescence In Situ Hybridization, Male, Neoplasms/pathology, Pathologic Neovascularization/blood
ABSTRACT
Early detection and treatment of visual impairment diseases are critical and integral to combating avoidable blindness. To enable this, artificial intelligence-based disease identification approaches are vital for visual impairment diseases, especially for people living in areas with few ophthalmologists. In this study, we demonstrated the identification of a large variety of visual impairment diseases using a coarse-to-fine approach. We designed a hierarchical deep learning network, which is composed of a family of multi-task and multi-label learning classifiers representing different levels of eye diseases derived from a predefined hierarchical eye disease taxonomy. A multi-level disease-guided loss function was proposed to learn the fine-grained variability of eye disease features. The proposed framework was trained for both ocular surface and retinal images, independently. The training dataset comprised 7,100 clinical images from 1,600 patients with 100 diseases. To show the feasibility of the proposed framework, we demonstrated eye disease identification on the first two levels of the eye disease taxonomy, namely 7 ocular diseases with 4 ocular surface diseases and 3 retinal fundus diseases in level 1 and 17 subclasses with 9 ocular surface diseases and 8 retinal fundus diseases in level 2. The proposed framework is flexible and extensible, and can be trained on more levels given sufficient training data for each disease subtype (e.g., the 17 classes of level 2 include 100 subtype diseases defined as level 3 diseases). The performance of the proposed framework was evaluated against 40 board-certified ophthalmologists on clinical cases with various visual impairment diseases and showed that the proposed framework had high sensitivity and specificity with the area under the receiver operating characteristic curve ranging from 0.743 to 0.989 in identifying all identified major causes of blindness.
Further assessment of 4,670 cases in a tertiary eye center also demonstrated that the proposed framework achieved a high identification accuracy rate for different visual impairment diseases compared with that of human graders in a clinical setting. The proposed hierarchical deep learning framework would improve clinical practice in ophthalmology and broaden the scope of service available, especially for people living in areas with few ophthalmologists.
ABSTRACT
Breast cancers can be divided into subtypes with important implications for prognosis and treatment. We set out to characterize the genetic alterations observed in different breast cancer subtypes and to identify specific candidate genes and pathways associated with subtype biology. mRNA expression levels of estrogen receptor, progesterone receptor, and HER2 were shown to predict marker status determined by immunohistochemistry and to be effective at assigning samples to subtypes. HER2(+) cancers were shown to have the greatest frequency of high-level amplification (independent of the ERBB2 amplicon itself), but triple-negative cancers had the highest overall frequencies of copy gain. Triple-negative cancers also were shown to have more frequent loss of phosphatase and tensin homologue and mutation of RB1, which may contribute to genomic instability. We identified and validated seven regions of copy number alteration associated with different subtypes, and used integrative bioinformatics analysis to identify candidate oncogenes and tumor suppressors, including ERBB2, GRB7, MYST2, PPM1D, CCND1, HDAC2, FOXA1, and RASA1. We tested the candidate oncogene MYST2 and showed that it enhances the anchorage-independent growth of breast cancer cells. The genome-wide and region-specific differences between subtypes suggest the differential activation of oncogenic pathways.