PhD in Environmental Sciences thesis defence: Dania Tamayo-Vera
Dania Tamayo-Vera will defend her PhD in Environmental Sciences thesis titled "Application Of Evolutionary Algorithms and Machine Learning in Agroclimatic Studies: Enhancing Model Explainability and Performance Through AutoML" on October 22 at 9:00 am in AVC 278.
Abstract:
This thesis examines the integration of machine learning (ML) models in agroclimatic research to address challenges posed by climate change. A systematic review highlights significant gaps in scalability and documentation in conventional ML methods, which hinder replicability and adaptability. To overcome these limitations, the research proposes the use of Automated Machine Learning (AutoML) frameworks to enhance scalability and performance. AutoML is positioned as a transformative tool for agroclimatic research, offering more efficient solutions for sustainable agriculture.
The research introduces a novel metric that optimizes ML models not only for prediction accuracy but also for interpretability by stabilizing SHapley Additive exPlanations (SHAP) values, which indicate the contribution of each feature to predictions. Building on this metric, the thesis introduces the Precise and Interpretable Multi-objective Optimization (PIMA) framework, an AutoML approach that uses the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to minimize Mean Squared Error (MSE) while stabilizing SHAP values. Experiments show that PIMA outperforms traditional AutoML frameworks such as H2O, offering models that provide actionable insights for agricultural stakeholders.
Using historical data and climate projections for Prince Edward Island (PEI), the study predicts a potential yield decline of up to 70% by the end of the century under the SSP5-8.5 scenario, highlighting the urgent need for adaptive farming practices and greenhouse gas reductions. The methodology is transferable to other regions and crops, offering a valuable tool for future agricultural planning.
At the regional level, traditional ML models are applied to predict potato yield at the postal code level using climate, soil, and vegetation index data. SHAP analysis identifies temperature and vegetation index as key predictors, while rainfall and soil retention emerge as important factors influencing yield outcomes in the non-irrigated region of PEI.
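For readers unfamiliar with multi-objective optimization, the trade-off between accuracy and interpretability described in the abstract can be illustrated with a minimal sketch of non-dominated (Pareto) selection, the idea at the core of NSGA-II. This is not the PIMA implementation: the candidate names, the MSE values, and the "SHAP instability" scores below are hypothetical placeholders, and the thesis's actual stability metric is not specified in the abstract.

```python
from typing import List, Tuple

# Each candidate is (name, mse, shap_instability).
# All names and numbers are hypothetical, chosen only for illustration.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    (lower is better) and strictly better on at least one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates = [
    ("model_a", 0.10, 0.40),  # most accurate, least stable explanations
    ("model_b", 0.12, 0.15),  # balanced
    ("model_c", 0.20, 0.10),  # most stable explanations, less accurate
    ("model_d", 0.21, 0.45),  # worse than model_a on both objectives
]

front = pareto_front(candidates)
front_names = [name for name, _, _ in front]
print(front_names)
```

In this toy example, model_d is discarded because model_a beats it on both objectives, while the remaining three models represent different compromises between error and explanation stability. NSGA-II repeats this kind of sorting across generations of candidate models, together with a crowding-distance measure that keeps the surviving solutions spread along the front.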
Everyone is welcome.