Lifecycle of Machine Learning Models
Understanding the Opportunities and Challenges
Javad Roostaei, Katya Bilyk, John Varelas, James MacDonald, Ryan Nagel - Hazen and Sawyer
Last Modified Jan 25, 2023
Machine Learning (ML) applications in the water sector are steadily growing. Engineers in the water industry have embraced data science tools to create powerful ML models on their desktops, but deploying them as real-time programs requires an additional set of skills—specifically knowledge of cloud/edge computing pipelines. For example, ML models need secure access to real-time data while also maintaining seamless interoperability with a range of typically independent, closed systems. However, connecting closed systems such as SCADA, LIMS, or CMMS to the cloud for real-time data analytics and ML is challenging and time-consuming. Utilities need secure ways to share the data they have without jeopardizing the security of these closed systems.
Deploying a model generally involves the following components: training, real-time data connectivity, data storage, data transformation, model execution, scoring, and model retraining. Once a model is trained (either on a desktop or in the cloud), real-time data feeds can be established using tools such as GE Proficy or Ovation. Secure on-premises connectivity tools can then be used to push the data from a utility’s business network to Azure SQL or a similar database.
In one specific application for Raleigh Water, we used the Azure Security Center with multiple firewalls to establish trusted communications and provide a robust security framework. Data stored on an Azure SQL server must then be connected to a program such as Azure's data management tools, which performs all the activities related to preparing the data for the model. A program such as Azure's machine learning tools can then be used for training (if not already completed), model execution, retraining, and scoring. This entire process is known as an automated data pipeline because, once configured, it runs continuously without human intervention.
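The stages described above can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the pipeline deployed for Raleigh Water: an in-memory SQLite database stands in for Azure SQL, the linear model and its coefficients are placeholders, and all function names, table names, and readings are hypothetical.

```python
import sqlite3
from statistics import mean


def store_readings(conn, readings):
    """Data storage: persist (timestamp, flow) readings pushed from the plant."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (ts TEXT PRIMARY KEY, flow_mgd REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO readings VALUES (?, ?)", readings)
    conn.commit()


def transform(conn, window=3):
    """Data transformation: average the most recent `window` non-null readings."""
    rows = conn.execute(
        "SELECT flow_mgd FROM readings WHERE flow_mgd IS NOT NULL "
        "ORDER BY ts DESC LIMIT ?",
        (window,),
    ).fetchall()
    return mean(row[0] for row in rows)


def execute_model(feature, slope=1.1, intercept=0.5):
    """Model execution: a placeholder linear model (coefficients are made up)."""
    return slope * feature + intercept


def score(predicted, observed):
    """Scoring: absolute error between a prediction and the later observation."""
    return abs(predicted - observed)


def run_pipeline(conn, new_readings):
    """One pipeline pass: store new data, transform it, run the model."""
    store_readings(conn, new_readings)
    return execute_model(transform(conn))


# Hypothetical hourly readings; SQLite in memory stands in for Azure SQL.
conn = sqlite3.connect(":memory:")
prediction = run_pipeline(
    conn,
    [
        ("2023-01-01T00:00", 40.0),
        ("2023-01-01T01:00", 42.0),
        ("2023-01-01T02:00", 44.0),
    ],
)
print(f"predicted flow: {prediction:.1f} mgd")
```

In a real deployment, `run_pipeline` would be triggered on a schedule or by the arrival of new data, and the scoring results would feed the decision to retrain.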
The results from the pipeline are displayed in a visualization tool such as a Power BI dashboard, which is tailored to display the specific model predictions and associated data. We have demonstrated the success of cloud-based ML for predicting plant influent flow 72 hours in advance to optimize wet weather management at a 75 mgd water reclamation facility.
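To make the 72-hour horizon concrete, the sketch below shows one common way to frame such a forecast as a supervised learning problem: each training example pairs recent flow readings with the flow observed 72 hours later. The abstract does not describe how the actual model was built, so this framing, the lag count, the function name, and the synthetic series are all assumptions for illustration.

```python
def make_supervised(series, n_lags=3, horizon=72):
    """Pair the last `n_lags` hourly readings with the value `horizon` hours ahead.

    Returns a list of (features, target) tuples. Both `n_lags` and the
    framing itself are illustrative assumptions, not the authors' method.
    """
    pairs = []
    # t is the index of the most recent reading available at forecast time.
    for t in range(n_lags - 1, len(series) - horizon):
        features = series[t - n_lags + 1 : t + 1]
        target = series[t + horizon]
        pairs.append((features, target))
    return pairs


# Synthetic placeholder series: 100 hourly values, not real plant data.
hourly_flow = [float(i) for i in range(100)]
pairs = make_supervised(hourly_flow)
print(len(pairs), "training examples")
```

Any regression model could then be trained on these pairs; in practice, rainfall forecasts or other inputs would likely be added to the features for wet weather prediction.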
In conclusion, effective deployment of ML models in the water sector can empower operators and decision-makers to make more informed, data-driven decisions. This paper will explain the steps involved in securely deploying an ML model to mine real-time data and provide analytics in the water sector.