Introduction

Azure Machine Learning is a cloud service for managing the life cycles of machine learning projects. Machine learning professionals, data scientists, and engineers can use Azure Machine Learning to train and deploy models and manage machine learning operations.

When you monitor an Azure Machine Learning environment, it's important to have visibility into all resources that might affect performance and AI model quality. Monitoring Azure Machine Learning covers the following areas:

  • Azure Machine Learning performance: Compute resources provide the infrastructure for running a machine learning workflow. They can affect Azure Machine Learning runs, experiments, and overall performance. This area is traditionally for operators and administrators.
  • Workflow problems: Throughout the life cycle of machine learning, problems and errors might occur during deployment of new models, during the running of a job, or in other circumstances. Both administrators and machine learning professionals might be interested in this area.
  • Machine learning models: Data drift, model prediction drift, poor data quality, and feature attribution drift can lead to outdated models and cause AI systems to become obsolete. Machine learning professionals and data scientists are the traditional owners of this monitoring.
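To make the idea of data drift in the last area more concrete, the following minimal sketch compares the distribution of one numeric feature in a reference sample (for example, training data) with a production sample. It uses the population stability index (PSI), which is one common drift metric and is chosen here only for illustration; it isn't presented as the method that Azure Machine Learning uses. The function name, parameters, and bin count are assumptions made for this example.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    production: Sequence[float],
    bins: int = 10,
) -> float:
    """Return the PSI between a reference sample and a production sample.

    Hypothetical helper for illustration only; not an Azure Machine Learning API.
    """
    lo, hi = min(reference), max(reference)
    # Equal-width buckets over the reference range; fall back to 1.0
    # when every reference value is identical (zero width).
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range values into the first or last bucket.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bucket_shares(reference)
    prod = bucket_shares(production)
    # PSI sums (production share - reference share) * ln(ratio) per bucket.
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))
```

Identical samples produce a PSI of zero, and the value grows as the production distribution moves away from the reference. What counts as significant drift depends on the model and the feature, so any alerting threshold is a choice for the team that owns the model.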

Azure Monitor is the primary tool for monitoring an Azure Machine Learning environment. It provides built-in capabilities to monitor performance and workflow problems in Azure Machine Learning, and you can extend these capabilities to meet your own needs.

AI model management relies on collecting inference data in production. Analyzing that data is part of monitoring Azure Machine Learning models.