With machine learning tools becoming easier to use, more and more marketers are leveraging the technology. Now, it’s tempting to simply dump data into an algorithm in search of patterns (it’s so easy, after all). However, these algorithms, no matter how powerful, still obey the rule of “garbage in, garbage out.” A model will give an answer no matter how nonsensical the input data, and that answer is only as good as the data behind it.
Therefore, it’s incumbent on the analyst to make sure the data is valid. In order to do so, it’s important to ask the right questions, not the easy questions, and craft your dataset for success. For example, it’s easy to feed customer purchase data into a machine learning model and get predictions on the likelihood of making future purchases. However, to get really informative predictions, the raw data must be transformed into intelligent signals using human knowledge.
So, how do you craft your dataset so the output is valuable? Read on.
Step 1: Ask the Right Questions
You’ve got your data, and you want some answers. Where do you start? Many analysts skip the important foundational questions and jump to the easy questions, which focus more on the process than on the desired results. Here’s what you shouldn’t and should be asking yourself as you get started.
Easy questions to set aside for now:
Which algorithm should I use?
How much data do I need to get results?
Foundational questions to ask first:
What kind of problem do I want to solve? Am I trying to do a cohort analysis? Am I trying to target a promotion?
Am I looking for insightful patterns that help me understand the structure of my data? Or do I just want to make the best prediction?
How will I validate that my results are “good”?
What is one instance of data? Am I trying to learn about a customer? A group of customers?
Step 2: Prepare Your Dataset
Once your problem is clearly defined, you can begin to think about how you will prepare the data. Here are 5 key considerations:
Granularity - Examine whether your data is granular enough, or if you need to dive deeper. Do you want to consider the category level, or drill down into individual products?
Time Scale - Make sure your time scale is correct. For example, if you have transaction-level data, how do you want to aggregate it? Do you want to look at the last 30 days? 60 days? Or, the last 3 sales?
Transformation - Consider whether you’d like to transform your data in any way. For example, would you like to convert a raw number into a percentile? Into a number of standard deviations from the mean? Do you want to bin the data?
Outliers - Consider smoothing over outliers and other distortions that may affect the output, such as seasonal spikes.
Additions - Think about what additional data you may want to generate. For example, if one feature is the total sales in a product category, do you also want to add a feature of what percentage that represents of the total sales? Or perhaps how that compares to another category?
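To make these considerations concrete, here is a minimal sketch in Python that turns transaction-level rows into one feature row per customer. Everything in it is an assumption for illustration: the sample transactions, the 30-day window, the 100-unit threshold for the "high" spend bin, and the function name build_features are hypothetical and not taken from any particular tool.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical transaction-level rows: (customer_id, category, amount, purchase_date)
transactions = [
    ("c1", "shoes", 80.0, date(2024, 3, 5)),
    ("c1", "hats", 20.0, date(2024, 3, 20)),
    ("c1", "shoes", 100.0, date(2024, 1, 5)),   # falls outside the 30-day window
    ("c2", "hats", 40.0, date(2024, 3, 10)),
    ("c3", "shoes", 300.0, date(2024, 3, 15)),
    ("c3", "hats", 100.0, date(2024, 3, 25)),
]

def build_features(rows, as_of, window_days=30, focus_category="shoes"):
    """Aggregate transactions into one feature row per customer."""
    cutoff = as_of - timedelta(days=window_days)
    total = defaultdict(float)        # time scale: spend inside the window
    in_category = defaultdict(float)  # granularity: spend in one category
    for customer, category, amount, day in rows:
        if cutoff < day <= as_of:
            total[customer] += amount
            if category == focus_category:
                in_category[customer] += amount

    all_spend = list(total.values())
    features = {}
    for customer, spend in total.items():
        # Transformation: fraction of customers spending this much or less.
        percentile = sum(1 for s in all_spend if s <= spend) / len(all_spend)
        features[customer] = {
            "spend": spend,
            "spend_percentile": percentile,
            # Transformation: a two-way bin with an assumed threshold of 100.
            "spend_bin": "high" if spend >= 100 else "low",
            # Addition: share of the customer's spend in the focus category.
            "category_share": in_category[customer] / spend if spend else 0.0,
        }
    return features

features = build_features(transactions, as_of=date(2024, 3, 31))
for customer, row in sorted(features.items()):
    print(customer, row)
```

Changing window_days or focus_category shows how much the same raw data can say depending on the time scale and granularity you choose.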
Step 3: Train Your Model
Congratulations, you are now ready to train your model. Once trained, the model can help you find patterns in your data or make predictions. This is where machine learning can help you find non-intuitive information in the data. As you set up training, consider these questions:
How are you going to validate the model? Are you going to take one time period as training data and a later one as validation data? Are you going to train on one set of customers and validate on another?
Will you apply the same model to all the examples? Or does your audience have fundamentally different segments, so that you will train a separate model on each one?
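The two validation strategies mentioned above can be sketched as follows. This is an illustrative example only: the rows, the field names (customer, observed, bought_again), the cutoff date, and the 25% validation fraction are all hypothetical. The customer split hashes the customer ID so that each customer always lands on the same side, which is one possible way to keep the split stable between runs.

```python
import hashlib
from datetime import date

# Hypothetical examples: one row per customer per observation date.
examples = [
    {"customer": "c1", "observed": date(2024, 1, 15), "bought_again": 1},
    {"customer": "c2", "observed": date(2024, 2, 10), "bought_again": 0},
    {"customer": "c1", "observed": date(2024, 3, 5), "bought_again": 0},
    {"customer": "c3", "observed": date(2024, 3, 20), "bought_again": 1},
]

def split_by_time(rows, cutoff):
    """Train on rows before the cutoff, validate on rows from the cutoff onward."""
    train = [r for r in rows if r["observed"] < cutoff]
    valid = [r for r in rows if r["observed"] >= cutoff]
    return train, valid

def split_by_customer(rows, valid_fraction=0.25):
    """Put each customer entirely in training or entirely in validation."""
    threshold = int(valid_fraction * 100)

    def bucket(customer):
        # Stable bucket in the range 0..99 derived from the customer ID.
        digest = hashlib.sha256(customer.encode("utf-8")).hexdigest()
        return int(digest, 16) % 100

    train = [r for r in rows if bucket(r["customer"]) >= threshold]
    valid = [r for r in rows if bucket(r["customer"]) < threshold]
    return train, valid

time_train, time_valid = split_by_time(examples, cutoff=date(2024, 3, 1))
cust_train, cust_valid = split_by_customer(examples)
```

With only a handful of customers, the customer split can leave one side empty. The validation fraction is only approached on larger datasets.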
Step 4: Validate Your Model
Now that you’ve trained your model, it’s time to confirm that it performs well. Ask yourself these questions:
What metric are you using to make sure the model is good?
Is it the percentage of correct predictions?
Should higher-value customers be weighted more heavily?
Are certain segments of the audience always being classified correctly?
Are there some populations that you are having trouble marketing to?
- Does it seem that you aren't generating the correct features?
- Is it that you need more examples of that segment?
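As a rough illustration of the metric questions above, the sketch below computes three numbers from the same validation results: the plain percentage of correct predictions, an accuracy weighted by customer value, and accuracy broken out by segment. The sample results, the segment names, and the use of customer value as a weight are assumptions made for the example, not a prescribed method.

```python
from collections import defaultdict

# Hypothetical validation results: (segment, customer_value, actual, predicted)
results = [
    ("new", 50.0, 1, 1),
    ("new", 50.0, 0, 1),      # the only wrong prediction
    ("loyal", 400.0, 1, 1),
    ("loyal", 500.0, 0, 0),
]

def accuracy(rows):
    """Fraction of predictions that match the actual outcome."""
    correct = sum(1 for _, _, actual, predicted in rows if actual == predicted)
    return correct / len(rows)

def value_weighted_accuracy(rows):
    """Accuracy where each customer counts in proportion to their value."""
    total_value = sum(value for _, value, _, _ in rows)
    correct_value = sum(
        value for _, value, actual, predicted in rows if actual == predicted
    )
    return correct_value / total_value

def accuracy_by_segment(rows):
    """Accuracy computed separately for each segment."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[0]].append(row)
    return {segment: accuracy(group) for segment, group in grouped.items()}

overall = accuracy(results)
weighted = value_weighted_accuracy(results)
per_segment = accuracy_by_segment(results)
print(overall, weighted, per_segment)
```

In this made-up data the single error falls on a low-value customer in the "new" segment, so the weighted score looks better than the plain one while the per-segment view shows where the model struggles.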
What if your model isn’t good enough?
Even with an appropriately crafted dataset and a well-trained model, there’s only so far you can go with certain data. If you need to know more about your audience to better fuel your algorithm, there are tools and services that can help, including WEVO.