Summary

Food security is a major challenge, not least because crops perform well below their maximum yield potential. A central aim in plant biotechnology is to improve breeding and transgenic approaches to develop crop varieties with higher yield, better quality and greater tolerance to environmental stress. To accomplish this, we must expand our understanding of the key biological processes that give rise to phenotypic variation, and of how these processes act together to drive the plant into growth or, in the case of pathogen infection, defence states. Phenotypes arise from two main cellular processes: (i) expression of the genetic code and (ii) metabolic reactions catalysed by the expressed enzymes. Gene expression is regulated primarily through multiple DNA-protein interactions at specific binding sites that carry conserved DNA sequence motifs. Metabolism in turn interacts with the environment, and this interplay determines which biomolecules are produced and hence the resulting phenotype. Our research objective is to develop and test a framework for predicting plant phenotypes from their genotypes, comprising (i) prediction of gene expression levels from DNA sequence and (ii) prediction of phenotypes from gene expression levels by modelling metabolism.
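To illustrate how a conserved binding motif can be located in a DNA sequence, the following minimal sketch scores a sequence against a position weight matrix. The sliding dot product it computes is the same operation a first-layer convolutional filter performs in a sequence-based DNN. The motif ("TATA"), its +1/-1 scores and the helper names `one_hot` and `scan` are hypothetical and chosen for illustration only; real motif models would use log-odds weights estimated from binding data.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) one-hot matrix; N or other symbols map to all zeros."""
    idx = {b: i for i, b in enumerate(BASES)}
    x = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in idx:
            x[pos, idx[base]] = 1.0
    return x

def scan(seq: str, pwm: np.ndarray) -> np.ndarray:
    """Slide a (k, 4) weight matrix along the sequence and return one score per window.
    This is the operation a first-layer convolutional filter applies to one-hot DNA."""
    x = one_hot(seq)
    k = pwm.shape[0]
    return np.array([np.sum(x[i:i + k] * pwm) for i in range(len(seq) - k + 1)])

# Hypothetical 4-bp motif "TATA": +1 for the matching base, -1 for any other base.
pwm = -np.ones((4, 4))
for pos, base in enumerate("TATA"):
    pwm[pos, BASES.index(base)] = 1.0

scores = scan("GCTATAAGC", pwm)
best = int(np.argmax(scores))
print(best, scores[best])  # 2 4.0
```

In a trained DNN the filter weights are learned from data rather than fixed by hand, which is what allows such models to discover motifs without prior annotation.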

The large amount of available ‘omics’ data contains valuable information for interpreting the regulatory principles that link genotypes to phenotypes. To decipher this information, novel algorithms such as deep learning overcome many of the limitations of classical algorithms and deliver improved performance. Rather than relying on manual data transformation or feature engineering, deep neural networks (DNNs) can learn informative representations of omics data directly, yielding models with remarkable predictive performance. Here, we will exploit the learning capabilities of DNNs to build full-sequence models that combine genome- and population-scale approaches, with the aim of decoding the complete DNA regulatory grammar from whole regulatory regions down to single-nucleotide variants and gaining unprecedented insight into the regulation of gene expression in plants. To build advanced models of metabolism, we will combine knowledge-driven systems biology with data-driven deep learning to design models with an inherently interpretable structure. Knowledge of metabolic reactions will be encoded as a prior in the DNN architecture, enhancing model interpretability and providing insight into the molecular processes and interactions that govern phenotypes.
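One common way to use metabolic knowledge as an architectural prior is a sparsely connected (masked) layer in which each hidden node stands for a reaction and receives input only from the genes known to encode its enzymes. The following is a minimal sketch under that assumption; the gene and reaction names, the gene-protein-reaction map `gpr` and the function `forward` are hypothetical, and the project's actual architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

genes = ["g1", "g2", "g3", "g4"]      # hypothetical gene identifiers
reactions = ["R1", "R2", "R3"]        # hypothetical metabolic reactions
# Hypothetical gene-protein-reaction (GPR) associations: genes encoding enzymes for each reaction.
gpr = {"R1": ["g1"], "R2": ["g2", "g3"], "R3": ["g3", "g4"]}

# Binary mask (n_genes, n_reactions): 1 where prior knowledge links a gene to a reaction.
mask = np.zeros((len(genes), len(reactions)))
for j, r in enumerate(reactions):
    for g in gpr[r]:
        mask[genes.index(g), j] = 1.0

W1 = rng.normal(size=mask.shape)       # gene -> reaction weights (would be trained)
w2 = rng.normal(size=len(reactions))   # reaction -> phenotype weights (would be trained)

def forward(expr: np.ndarray):
    """expr: (n_samples, n_genes) expression matrix.
    Returns (reaction-level activities, predicted phenotype score per sample)."""
    # Multiplying by the mask zeroes every gene -> reaction link absent from the prior,
    # so each hidden node can be read as the activity of one named reaction.
    hidden = np.maximum(0.0, expr @ (W1 * mask))  # ReLU activation
    return hidden, hidden @ w2

expr = rng.normal(size=(5, len(genes)))
hidden, phenotype = forward(expr)

# Perturbing g2 cannot change the R1 node, because R1 is linked only to g1.
perturbed = expr.copy()
perturbed[:, genes.index("g2")] += 10.0
h2, _ = forward(perturbed)
print(np.allclose(hidden[:, 0], h2[:, 0]))  # True
```

Because the mask also zeroes the gradient of the disallowed weights during training, the learned model keeps the structure of the prior, and the hidden activities can be inspected reaction by reaction.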

The proposed project is organised into five interdependent work packages (WPs), with the first and last WP comprising support tasks related to computational resources and project management. In WP2, we will develop gene expression models, and in WP3, we will build knowledge- and data-driven models of metabolism. In WP4, the models will be interpreted and new findings will be validated experimentally.

The proposed multidisciplinary systems approach can improve our understanding of the regulatory principles that orchestrate extensive changes in gene expression and the reprogramming of metabolism in response to environmental stresses and pathogens. Our focus on crop species enables the direct translation of the newfound knowledge to the agronomically important goals of improving crop productivity and resilience under field conditions. The enhanced predictive models will enable the design and testing of novel molecular systems with desired expression levels and phenotypic outcomes, with the potential to greatly increase experimental throughput and reduce development costs in plant biotechnology. By identifying groups of genes that have key effects on phenotypic outcomes, together with ways to control their expression levels, we have a strong prospect of developing breakthrough methods for crop breeding. The project thus supports national and EU directives on food security and is of exceptional scientific and societal importance for Slovenia, the EU and globally.