Functional yeast promoter sequence design using temporal convolutional generative language models

bioRxiv – October 22, 2024

Source: medRxiv/bioRxiv/arXiv

Summary

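Precise control of gene expression underlies many biological processes. Gen-DNA-TCN is a deep learning method for designing synthetic yeast promoter sequences: a generative model built on temporal convolutional networks is trained with the help of a pre-trained sequence-to-expression predictor. In a large-scale evaluation, it generated diverse sequences whose transcription factor binding site distributions resemble those of real promoters, which supports its use in synthetic DNA design.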

Abstract

Promoter sequence design is key to accurately controlling gene expression, a process that plays a crucial role in biological systems. Thanks to a recent community effort, we are now able to elucidate the associations between yeast promoter sequences and their corresponding expression levels using advanced deep learning methods. This milestone supports the further development of many downstream biological sequence research tasks, such as synthetic DNA design. In this work, we propose a novel synthetic promoter sequence design method, Gen-DNA-TCN, which exploits a pre-trained sequence-to-expression predictive model to facilitate the training of its generative model, which is based on temporal convolutional neural networks. A large-scale evaluation suggests that Gen-DNA-TCN generates diverse synthetic promoter sequences whose transcription factor binding site distributions are similar to those of real promoter sequences.
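
The abstract does not give architectural details, so the following is only a minimal sketch of the general building block behind temporal convolutional networks: a stack of causal, dilated 1D convolutions over one-hot encoded DNA. It is not the Gen-DNA-TCN implementation. The layer sizes, dilations, random weights, and all function names (`one_hot`, `causal_dilated_conv`, `tcn_forward`, `receptive_field`) are assumptions made for illustration, and the pre-trained expression predictor is left out entirely.

```python
import random

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot vectors."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq]


def causal_dilated_conv(x, weights, dilation):
    """Causal 1D convolution over a sequence of vectors.

    x:       list of T input vectors, each of length c_in
    weights: weights[k][o][i] for kernel tap k, output channel o,
             input channel i; tap k looks back k * dilation positions
    Positions before the start of the sequence act as zero padding,
    so the output at position t never depends on inputs after t.
    """
    c_out = len(weights[0])
    out = []
    for t in range(len(x)):
        y = [0.0] * c_out
        for k, w_k in enumerate(weights):
            src = t - k * dilation
            if src < 0:
                continue  # implicit left zero padding
            for o in range(c_out):
                y[o] += sum(w * v for w, v in zip(w_k[o], x[src]))
        out.append(y)
    return out


def relu(x):
    """Element-wise ReLU over a sequence of vectors."""
    return [[max(0.0, v) for v in row] for row in x]


def receptive_field(kernel_size, dilations):
    """Number of input positions that can influence one output position."""
    return 1 + (kernel_size - 1) * sum(dilations)


def random_weights(rng, kernel_size, c_out, c_in):
    """Hypothetical untrained weights, used only to make the sketch runnable."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(c_in)]
             for _ in range(c_out)]
            for _ in range(kernel_size)]


def tcn_forward(seq, layers, dilations):
    """Run the stacked causal convolutions; the last layer yields 4 scores
    per position, one per nucleotide, which a generative model could
    normalise into a next-base distribution."""
    h = one_hot(seq)
    for idx, (w, d) in enumerate(zip(layers, dilations)):
        h = causal_dilated_conv(h, w, d)
        if idx < len(layers) - 1:
            h = relu(h)  # no non-linearity on the final scores
    return h


# Assumed toy configuration: kernel size 2, dilations doubling per layer.
rng = random.Random(0)
kernel_size = 2
dilations = [1, 2, 4]
channels = [4, 8, 8, 4]
layers = [random_weights(rng, kernel_size, channels[i + 1], channels[i])
          for i in range(len(dilations))]

scores = tcn_forward("ACGT" * 6, layers, dilations)
print(len(scores), "positions,", len(scores[0]), "scores per position")
print("receptive field:", receptive_field(kernel_size, dilations))
```

Because each layer only looks backwards, the scores at a given position depend solely on that base and the ones before it, which is what allows such a model to generate a sequence one nucleotide at a time. Doubling the dilation at each layer widens the context quickly: with the assumed kernel size of 2 and dilations 1, 2, and 4, each output position sees 8 input bases.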