Results 1 - 2 of 2
1.
arXiv; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-37426456

ABSTRACT

Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural-language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled genome data that can then be fine-tuned for downstream tasks such as identifying regulatory elements. Due to the quadratic scaling of attention, previous Transformer-based genomic models have used 512 to 4k tokens as context (<0.001% of the human genome), significantly limiting the modeling of long-range interactions in DNA. In addition, these methods rely on tokenizers or fixed k-mers to aggregate meaningful DNA units, losing the single-nucleotide resolution at which subtle genetic variations can completely alter protein function via single nucleotide polymorphisms (SNPs). Recently, Hyena, a large language model based on implicit convolutions, was shown to match attention in quality while allowing longer context lengths and lower time complexity. Leveraging Hyena's new long-range capabilities, we present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at single-nucleotide resolution, an up to 500x increase over previous dense-attention-based models. HyenaDNA scales sub-quadratically in sequence length (training up to 160x faster than a Transformer), uses single-nucleotide tokens, and has full global context at each layer. We explore what longer context enables, including the first use of in-context learning in genomics. On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) results on 12 of 18 datasets using a model with orders of magnitude fewer parameters and less pretraining data. On the GenomicBenchmarks, HyenaDNA surpasses SotA on 7 of 8 datasets, on average by +10 accuracy points. Code at https://github.com/HazyResearch/hyena-dna.
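The single-nucleotide tokenization the abstract contrasts with k-mer tokenizers can be sketched as character-level mapping: each base becomes its own token, so a single SNP changes exactly one token. This is a minimal illustrative sketch; the vocabulary and function names below are assumptions, not HyenaDNA's actual implementation.

```python
# Sketch of single-nucleotide (character-level) tokenization, as opposed
# to fixed k-mer aggregation. Vocabulary is an illustrative assumption.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}  # N = ambiguous base


def tokenize(seq: str) -> list[int]:
    """Map each nucleotide to its own token id (no k-mer grouping)."""
    return [VOCAB[base] for base in seq.upper()]


def detokenize(ids: list[int]) -> str:
    """Invert the mapping back to a DNA string."""
    inv = {v: k for k, v in VOCAB.items()}
    return "".join(inv[i] for i in ids)


tokens = tokenize("GATTACA")
# A single-base substitution (an SNP) alters exactly one token,
# whereas a k-mer tokenizer would change every overlapping k-mer.
snp_tokens = tokenize("GATCACA")
changed = sum(a != b for a, b in zip(tokens, snp_tokens))
```

Under this scheme the sequence length in tokens equals the sequence length in bases, which is why sub-quadratic scaling matters for million-token contexts.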

2.
ChemElectroChem; 6(21): 5375-5386, 2019 Oct 31.
Article in English | MEDLINE | ID: mdl-31867153

ABSTRACT

Biophotovoltaic systems (BPVs) resemble microbial fuel cells, but utilise oxygenic photosynthetic microorganisms associated with an anode to generate an extracellular electrical current, which is stimulated by illumination. Study and exploitation of BPVs have come a long way over the last few decades, having benefited from several generations of electrode development and improvements in wiring schemes. Power densities of up to 0.5 W m⁻² and the powering of small electrical devices such as a digital clock have been reported. Improvements in standardisation have meant that this biophotoelectrochemical phenomenon can be further exploited to address biological questions relating to the organisms. Here, we aim to provide both biologists and electrochemists with a review of the progress of BPV development, with a focus on biological materials, electrode design and interfacial wiring considerations, and we propose steps for driving the field forward.
