Model monitoring solution
About the project:
In this project, I led the development of a model monitoring solution that reduced data anomaly detection time from weeks to minutes, a significant gain in operational efficiency.

The solution automated the entire batch scoring process with Airflow, which orchestrated Kubernetes Jobs on Amazon EKS for each stage of the pipeline: feature materialization, model scoring, and monitoring. For comprehensive and reliable data monitoring, I combined the Great Expectations library with a custom API that I developed to validate datasets and to compute the Population Stability Index (PSI) for model scores and the Characteristic Stability Index (CSI) for features on every daily run.

The project also included an automated mechanism for our Airflow DAGs: whenever a new model was deployed to our proprietary registry, a corresponding DAG was created automatically and scheduled to start the next day. Finally, all metrics were logged to a DataDog dashboard, giving us a consistent view of model performance and a basis for data-driven improvements.
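To illustrate the stability metrics, here is a minimal sketch of a PSI calculation: scores are bucketed using bin edges derived from the baseline distribution, and the index sums the divergence between baseline and current bucket frequencies. The same formula applied to a feature column rather than model scores gives CSI. Function and variable names here are illustrative, not the actual API from the project.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline ("expected") and a
    current ("actual") distribution. Bin edges are quantiles of the
    baseline, widened to +/- infinity so no score falls outside a bin."""
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    eps = 1e-6  # avoid log(0) / division by zero for empty buckets
    e_pct = np.histogram(expected, edges)[0] / len(expected) + eps
    a_pct = np.histogram(actual, edges)[0] / len(actual) + eps
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

A common rule of thumb is that PSI below 0.1 indicates no significant shift, while values above 0.25 signal a major distribution change worth investigating.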
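The automatic DAG creation can be sketched as a small mapping from registry entries to DAG specifications; the registry record shape and all names below are hypothetical, since the actual registry was proprietary. The "runs the next day" behaviour falls out of Airflow's scheduling model: a daily DAG's first run covers the interval starting at `start_date` and fires once that interval closes, i.e. the following day.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ModelEntry:
    """Hypothetical record returned by the model registry."""
    name: str
    deployed_on: date

def build_dag_specs(entries):
    """Turn registry entries into (dag_id, schedule, start_date) specs.
    With a daily schedule, Airflow executes the first run the day after
    start_date, matching the 'scheduled for the next day' behaviour."""
    return [(f"monitoring__{m.name}", "@daily", m.deployed_on) for m in entries]

# In the real DAG file, a loop over these specs would instantiate one
# airflow.DAG per model and register it in globals() so the scheduler
# discovers every monitoring pipeline automatically.
```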
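Metric shipping to DataDog is typically done through a local DogStatsD agent, which accepts plain-text datagrams over UDP. The sketch below hand-rolls the documented gauge datagram format (`metric:value|g|#tag:value`) with the standard library only; in practice the official `datadog` client would be used, and the metric and tag names here are assumptions.

```python
import socket

def send_gauge(name, value, tags=(), host="127.0.0.1", port=8125):
    """Emit one gauge in DogStatsD wire format over UDP (fire-and-forget).
    Assumes a DogStatsD agent listening on the default port 8125."""
    payload = f"{name}:{value}|g"
    if tags:
        payload += "|#" + ",".join(tags)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode("utf-8"), (host, port))
    sock.close()
    return payload  # returned for inspection/testing

# Example: report a daily PSI value tagged with the model name.
# send_gauge("monitoring.model.psi", 0.12, ("model:churn_v2",))
```

UDP is deliberate here: metric emission never blocks or fails the scoring job, even if the agent is down.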
Technology used:
Python, Airflow, Kubernetes (Amazon EKS), Great Expectations, DataDog, AWS S3