The project addressed the challenge of food security by developing advanced methods for predicting plant phenotypes from genotypes, with the aim of improving crop yield, quality, and resilience. The approach integrates two key processes that determine phenotype: gene expression regulation and cellular metabolism. Using large-scale molecular data together with machine learning and systems biology approaches, we developed models that predict gene expression directly from DNA sequences and link gene expression to phenotypic traits through metabolic modeling.
In line with the project objectives, the work programme was structured into five interdependent thematic work packages (WPs, Figure 2A) comprising 11 tasks, each corresponding to a milestone:
WP1. Acquisition and setup of computational and data resources
WP2. Development of gene expression models
WP3. Knowledge and data-driven metabolic modeling
WP4. Model interpretation and validation
WP5. Project management and dissemination

In WP1, we established extensive data and computational resources, including integration of genomic, transcriptomic, and phenotypic data for model and agronomically important species such as Arabidopsis and potato. A server hosting updated potato genomic models is available at unitato.nib.si, while other datasets and models are accessible via GitHub and Zenodo.
In WP2, we trained deep neural networks to predict gene expression and related features, such as transcription start sites, directly from DNA sequences around genes. We also developed models capturing global genotypic variation in Arabidopsis, enabling prediction of environmental phenotypes from DNA and supporting optimization of plants for specific conditions, such as high or low temperatures.
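To illustrate what predicting "directly from DNA sequences" usually involves at the input stage, sequence-based neural networks commonly receive a one-hot encoded window of nucleotides. The sketch below shows this generic encoding only. It is not the project's own preprocessing, and the function name `one_hot_encode` and the all-zero treatment of ambiguous bases are assumptions made for illustration.

```python
# Illustrative sketch: generic one-hot encoding of a DNA sequence.
# The function name and the handling of ambiguous bases are assumptions,
# not taken from the project's code.
BASES = "ACGT"

def one_hot_encode(sequence):
    """Return one row of four 0/1 values per nucleotide (order A, C, G, T).

    Symbols outside A, C, G, T (for example N) become an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == b else 0 for b in BASES])
    return rows

# Encode a short example sequence containing one ambiguous base.
encoded = one_hot_encode("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

A matrix of this shape (sequence length by four) is the typical input to convolutional or attention-based sequence models.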
In WP3, we developed models linking gene expression to observable phenotypes such as growth and yield. We constructed a comprehensive potato metabolic model covering both primary (growth-related) and secondary (defense-related) metabolism, demonstrating its ability to analyze trade-offs between growth and defense responses. Moreover, data-driven machine learning models enabled highly accurate prediction of potato yield based on early measurements from the first two months of growth.
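The growth-defense trade-off mentioned above can be pictured with a deliberately tiny resource-allocation example. It assumes a single limiting resource shared between biomass and a defense compound. The function `max_growth`, the cost parameters, and all numbers are hypothetical and chosen only for illustration; they do not come from the potato metabolic model.

```python
# Toy sketch of a growth-defense trade-off under one shared resource.
# All names and numbers are hypothetical illustration values.

def max_growth(uptake, defense_demand, biomass_cost=2.0, defense_cost=5.0):
    """Maximum growth achievable after meeting a fixed defense demand.

    uptake          total resource available per unit time
    defense_demand  required production of the defense compound
    biomass_cost    resource units consumed per unit of growth
    defense_cost    resource units consumed per unit of defense compound
    """
    remaining = uptake - defense_cost * defense_demand
    if remaining < 0:
        raise ValueError("defense demand exceeds available uptake")
    return remaining / biomass_cost

# Scan increasing defense demand at a fixed uptake of 10 units.
tradeoff = [(d, max_growth(10.0, d)) for d in (0.0, 1.0, 2.0)]
for demand, growth in tradeoff:
    print(f"defense demand {demand:.1f} -> max growth {growth:.1f}")
```

In a full metabolic model the same question is asked over thousands of reactions rather than one shared pool, but the qualitative picture is the same: resources committed to defense are unavailable for growth.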
In WP4, we showed that deep neural networks can capture the regulatory grammar of DNA, including sequence motifs and their interactions. Interpretation of the models revealed key defense-related regulatory mechanisms and enabled the identification of important metabolic pathways. We further developed hybrid approaches integrating knowledge of molecular interactions in gene expression with deep learning, improving biological interpretability. Model predictions were successfully validated using multiple independent experimental approaches.
In WP5, project results were disseminated through eight scientific publications and active participation in twelve scientific conferences. Four master’s theses were completed within the project.
Overall, the project provided new scientific insights and technologies, supported by publications in leading journals, and contributed to workforce development. Its results enable faster development of improved crop varieties and represent an important contribution to sustainable agriculture and food security in Slovenia, Europe, and globally.