Welcome to the Define module, which corresponds to the second "D" of the D-Wise framework.
In the Discovery module, we discussed the importance of understanding the problem and its context before moving on to solving it. So, by now, you should have written documentation that defines several aspects of a problem that you are motivated to solve. And that's great! We hope you enjoyed that exercise, which you will find yourself repeating every time you take on a new project.
Below is a quick overview of data, modeling, and algorithms. After that, we will dive into more detail about each topic and provide practical tips to help your project succeed.
The raw material for solving data analytics problems is data. And the outcome is also data. So, once we have a fair understanding of the problem, we can begin to define the input and output data. This process typically begins in a very loose manner and gains more structure as we make progress.
The process of defining the data is an extension of the discovery phase. It's always surprising to see how much clarity we gain about the problem after defining its input and output data, especially when clients are also engaged in this exercise.
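To make this concrete, here is a minimal sketch of what "defining the input and output data" could look like once the loose notes start gaining structure. The problem (choosing projects to fund under a budget), the class names, the field names, and the validation rules are all hypothetical and chosen only for illustration; your own problem will have different ones.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Project:
    """One record of INPUT data (hypothetical fields)."""
    name: str
    cost: float   # assumed unit: thousands of dollars
    value: float  # assumed unit: estimated benefit score


@dataclass(frozen=True)
class Selection:
    """The OUTPUT data: which projects get funded, plus totals."""
    funded: Tuple[str, ...]
    total_cost: float
    total_value: float


def validate_inputs(projects: List[Project], budget: float) -> List[str]:
    """Return a list of issues found in the input data (empty if clean)."""
    issues = []
    if budget < 0:
        issues.append("budget must be non-negative")
    names = [p.name for p in projects]
    if len(names) != len(set(names)):
        issues.append("project names must be unique")
    for p in projects:
        if p.cost < 0:
            issues.append(f"{p.name}: cost must be non-negative")
    return issues
```

Even a small definition like this tends to raise useful questions for the client: In what units is cost measured? Can two projects share a name? Who provides the value estimates?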
If you review the documentation you've put together for your problem and treat it as a real-world problem, you will have to admit that your description is full of ambiguities (perhaps fewer after defining the data). This is not to say that you did a bad job, although you may well have if this was your first time doing this exercise. That is not the point.
The point is that there is a key distinction between the description and the representation of a problem. Description is art; representation is science. Description is ambiguous; representation is precise.
Why is it important to make such a distinction?
At the core, we solve data analytics problems by combining data and algorithms. To combine data with algorithms, we use computers. And to use computers, we need to represent the problem in a way that the computer understands. Computers can't deal with ambiguities.
Wait! But can't computers learn these days? Don't be fooled! Precisely defined models and algorithms are the foundation of machine learning, and of any other data analytics technique.
Therefore, as you may have already concluded, your next step toward solving your data analytics problem is to turn the description you already have into a precise representation of it. This is what we call modeling.
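As an illustration of the gap between description and representation, suppose a description reads "fund the best projects we can afford." What does "best" mean? What does "afford" mean? One possible representation, assuming "afford" means total cost within a budget and "best" means highest total value, is sketched below. The problem and the function names are hypothetical; a different reading of the same description would lead to a different model.

```python
from typing import Sequence


def is_feasible(chosen: Sequence[bool],
                costs: Sequence[float],
                budget: float) -> bool:
    """Constraint: the total cost of the chosen projects fits the budget."""
    return sum(c for c, x in zip(costs, chosen) if x) <= budget


def objective(chosen: Sequence[bool], values: Sequence[float]) -> float:
    """Objective to maximize: the total value of the chosen projects."""
    return sum(v for v, x in zip(values, chosen) if x)


# Hypothetical data: three projects and a budget of 5.
costs = [4, 3, 2]
values = [10, 7, 4]
budget = 5

print(is_feasible([False, True, True], costs, budget))  # True  (cost 5)
print(objective([False, True, True], values))           # 11
print(is_feasible([True, True, False], costs, budget))  # False (cost 7)
```

Notice that nothing here says how to find the best selection. The model only states, without ambiguity, what counts as a valid answer and how to compare two answers.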
Now that you have the data and the model of your problem, you want the computer to solve it. That's when algorithms kick in.
The type of algorithm you will use depends heavily on the type of problem you have, as well as on the size of your data, your runtime requirements, and the accuracy you need.
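The sketch below illustrates this trade-off on a hypothetical problem of picking projects under a budget so that total value is as high as possible. It compares two algorithms for the same model: an exhaustive search, which always finds the best answer but tries all 2^n combinations, and a greedy heuristic, which is fast but can miss the best answer. Both functions and the sample data are assumptions made for illustration, not a recommendation for your problem.

```python
from itertools import product


def solve_exhaustive(costs, values, budget):
    """Exact: try every subset. Only practical for small inputs."""
    best_value, best_choice = 0, tuple(False for _ in costs)
    for choice in product([False, True], repeat=len(costs)):
        cost = sum(c for c, x in zip(costs, choice) if x)
        if cost <= budget:
            value = sum(v for v, x in zip(values, choice) if x)
            if value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice


def solve_greedy(costs, values, budget):
    """Heuristic: take items by value-per-cost while they still fit.

    Assumes every cost is strictly positive.
    """
    order = sorted(range(len(costs)),
                   key=lambda i: values[i] / costs[i],
                   reverse=True)
    chosen = [False] * len(costs)
    remaining = budget
    for i in order:
        if costs[i] <= remaining:
            chosen[i] = True
            remaining -= costs[i]
    total = sum(values[i] for i in range(len(costs)) if chosen[i])
    return total, tuple(chosen)


# Hypothetical data where the two algorithms disagree.
costs, values, budget = [6, 5, 5], [7, 5, 5], 10
print(solve_exhaustive(costs, values, budget))  # (10, (False, True, True))
print(solve_greedy(costs, values, budget))      # (7, (True, False, False))
```

With three projects, the exhaustive search is the obvious choice. With fifty, it would have to examine about 10^15 combinations, and a faster but less accurate method (or a smarter exact one) becomes worth considering.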
In the next sections, you will dive deeper into each one of the subjects listed above.
By the end of this module, you should be ready to start implementing and solving your problem.