How to take data science from siloed playgrounds to enterprise AI
- Posted on February 15, 2019
- Estimated reading time 3 minutes
Defining the data science lifecycle
Before you start, it’s important to understand the four components of the data science lifecycle.
- Define the business problem. Close collaboration among business, IT and data science stakeholders is required to gain an in-depth understanding of the problem at hand. This not only drives the experiment; it’s also the foundation for the data and model selection process.
- Understand the data. Is your data source on-premises or in the cloud, a database or files? Is your data pipeline streamed or batched, low or high frequency? Is the data science environment on-premises or in the cloud, a database or a data lake; small, medium or big data? As for data wrangling, exploration and cleaning – is your data structured or unstructured, and how will you approach data validation, integration and cleanup?
- Try, train and tune models. The model development lifecycle includes model selection, feature engineering, model training and tuning, data visualization, and export and packaging.
- Deploy, run and manage models. Model runtime operations include analyzing the runtime environment, determining compute requirements, monitoring and performance analysis, and model versioning and deployment.
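To make the middle steps of this lifecycle concrete, here is a minimal sketch of “try, train and tune” followed by export for deployment. It uses scikit-learn and joblib as illustrative tooling (an assumption – the post doesn’t prescribe a specific framework) with a built-in sample dataset standing in for your business data:

```python
# Illustrative sketch only: scikit-learn and joblib are assumed tooling,
# and the iris dataset stands in for real business data.
import io

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Understand the data: load it and hold out a test set for validation.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Try, train and tune: grid-search a regularization parameter (C)
# with cross-validation to select the best model configuration.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=3,
)
search.fit(X_train, y_train)

# Export and packaging: serialize the tuned model so the runtime
# environment can load and version it for deployment.
buffer = io.BytesIO()
joblib.dump(search.best_estimator_, buffer)

test_accuracy = search.best_estimator_.score(X_test, y_test)
```

In a real engagement the serialized artifact would be registered and deployed through a managed service rather than an in-memory buffer, but the shape of the workflow – split, tune, validate, package – is the same.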
Now, let’s enable rapid innovation
Self-service data science requires a combination of infrastructure, services and data; big data is required to achieve the best accuracy in machine learning and deep learning. Avanade leverages Microsoft Azure Data Lake, Azure Machine Learning, Azure Databricks and Azure Cosmos DB, among other Azure Data and AI services, to support data scientists through the end-to-end data science lifecycle. Here’s what Microsoft Azure, along with other technologies like Python Tools for Visual Studio, Jupyter Notebooks and Microsoft Power BI, provides:
- High-scale, highly available storage and compute to manage and provision big data for self-service data science.
- Powerful compute and massively parallel processing to train models.
- Comprehensive self-service data integration, exploration, wrangling, cleansing and visualization tools.
- A full suite of the data science frameworks, toolsets and services required for model development, training, packaging and deployment.
AI comes in many forms
Avanade understands how to use them all to deliver the transformation you expect. We start with a deep understanding of the data, apply machine learning, and surface the results to users in new and unexpected ways.