Big Data exploration: discovering the new world of analytics
- Posted on March 17, 2017
In my last blog post, I explained how to start your journey towards Big Data and Advanced Analytics and briefly mentioned that one of the steps is to “experiment” with these technologies and capabilities through POCs or pilots before defining the final needs and requirements.
Therefore, in this post we will focus on understanding how to run this “experiment” keeping in mind that:
- There is no limit on the use cases: Big Data technologies bring the ability to correlate any data, whatever its velocity, variety, or volume.
- It brings a new way of working: most large organizations are still running waterfall projects and only experimenting with the agile methodology.
- This technology is changing almost every day: What was not possible some months ago is possible today.
To better approach this experiment, let’s think like the European countries of the XV century when they launched expeditions to discover the New World. Strange, isn’t it?
These expeditions and Big Data experiments have two things in common:
- We don’t know upfront what we will get. It’s all about exploration and discovery. The expeditions of the XV century were launched with the objective of exploring and discovering a new world, with no guarantee of results. The same applies to our Big Data experiment. No one can really commit to an exact business value. However, as with the expeditions, we will learn from every experiment and, after some iterations, reach the expected results.
- There is no standard approach: every organization will have to find the approach that fits it best. Although all the European countries had the same objective, their ships took different directions and their approaches differed. The final approach for the Big Data experiment will depend on how mature and ready your organization is, on the skills and resources you have, and on the tools and technologies you are using.
That being said, the main building blocks for the experiment and expeditions are the same.
So, who should be part of the Big Data experiment? Just like the Europeans in the XV century, the minimum crew should be composed of:
- The sailors: they know the sea and are able to navigate through it. Those are the data engineers who can understand, transform and explore the data.
- The navigator: often a former sailor, who defines the best course taking into account different constraints and trying to learn from past experience. This is for us the data scientist who creates the algorithms and navigates through the data to unlock the relevant insights.
- The sponsor representative: sent by the expedition sponsors, their main responsibilities are to confirm the value of the treasures and the directions taken by the crew. Those are the business analysts and representatives, who know the business context and can help the team focus on the right targets.
- The narrator: he is able to describe what happened during the expedition, and what treasures they discovered. This is our data visualization expert who will develop the appropriate visualizations to be used by the end users.
- The ship engineer: this is the data platform expert who can support the crew on infrastructure and platform-related issues.
- The captain: this is the project manager who coordinates all these people and makes sure the job is delivered.
Now, what should the crew do? Here are some common steps that the Europeans in the XV century used to follow:
- Plan: all the stakeholders first need to align on the objectives, the schedule, the crew, the budget, the scope and the constraints.
- Prepare: in this phase the crew will prepare the ship and their trip. In our experiment, the data engineers will collect the raw data, examine it, assess its quality, study the feasibility of machine learning techniques and finally prepare and transform the data.
- Explore & Discover: this is the exciting part where the crew navigates through unknown oceans. It is also exciting for the data scientists and business representatives, as they will work on different algorithms, simulate different scenarios, build, test, evaluate, iterate on and tune models, and finally unlock the right insights.
- Visualize: in this phase the ship comes back with the expected treasures and maps, and the narrator describes their adventures. In the Big Data experiment, it consists of creating the right visualizations that will empower the business to make the right decisions.
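To make the Prepare phase a bit more concrete, here is a minimal sketch of what "assessing data quality" could look like on a small batch of raw records. Everything in it is an assumption made for illustration: the `profile_quality` helper, the field names and the sample records are hypothetical, and a real experiment would run this kind of check with whatever tools your platform provides.

```python
def profile_quality(records, fields):
    """For each field, report the share of missing values and the count of distinct values."""
    total = len(records)
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        # Treat None and empty strings as missing (an assumption for this sketch).
        present = [v for v in values if v is not None and v != ""]
        missing = total - len(present)
        report[field] = {
            "missing_ratio": missing / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return report


# Hypothetical raw data collected by the data engineers.
raw_records = [
    {"customer_id": "C1", "country": "BE", "amount": 120.0},
    {"customer_id": "C2", "country": "", "amount": 75.5},
    {"customer_id": "C3", "country": "FR", "amount": None},
    {"customer_id": "C4", "country": "BE", "amount": 42.0},
]

report = profile_quality(raw_records, ["customer_id", "country", "amount"])
for field, stats in report.items():
    print(f"{field}: {stats['missing_ratio']:.0%} missing, {stats['distinct']} distinct")
```

A quick profile like this helps the crew decide early whether a data source is worth taking on board, before any modelling effort is spent on it.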
Note that the whole process, and every phase of it, can be iterative. Remember that this is about data exploration and discovery. There is no standard approach: every organization needs to try and test until it finds the data and analytics approach that fits it best.