We are in the era of big data: everywhere you turn, people are sharing strategies and tools to learn from it. Big data can certainly lead to amazing insights. However, what marketers often have is far from big data. Their data might be small, but it is high quality, well understood, and extremely relevant to their business. We can regard this as “artisanal data”: well crafted, small batch, and simultaneously traditional and innovative.
Luckily for the majority of us, artisanal data can lead to amazing insights as well. While the methods for learning from artisanal data are different from those we use for information overload, they still lead to fascinating and relevant insights about your business. You just need to treat the data a bit more delicately.
Evaluate whether big data is necessary
There are times when big data is a necessity, and times when artisanal data will do just fine. Consider the following when thinking about how to handle what you’ve got.
1. Is your data crafted well enough that it blocks out the noise?
When data is noisy, there needs to be a lot of it, i.e. big data, to find the signal in the din. Otherwise, the training algorithms will find patterns that are simply artifacts of the noise.
However, if your dataset is well crafted, then small data suffices. For example, if your dataset consists of marketing qualified leads, i.e. people who have said they are interested in your product, then you may not need tons of data to learn more about them. If you’re exploring your customers’ profiles on your site, you have highly relevant data. Contrast this with a dataset of website visitors, which may contain many people who are not prospective clients, and you’re looking at a situation where you’ll need big data to rise above the noise.
2. What level of statistical significance works for what you’re trying to accomplish?
High statistical significance gives you confidence that when you anoint a winner, you are probably right. While the winner in your sampled data is probably the true winner, there’s always some chance that the sample is misleading. Small data can still yield a usable result, although often at a confidence level lower than the customary 90% or 99%. You may not achieve high statistical significance with these spelunking journeys into artisanal data, but they can certainly point you in the right direction. And maybe 80% confidence is good enough for your given problem, especially if getting it wrong is not high risk.
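To make the idea of “good enough” confidence concrete, here is a minimal sketch of how you might estimate how confident you can be that one variant beats another in a small test. It uses a pooled two-proportion z-test with a normal approximation, which is only rough at small sample sizes. The visitor and conversion counts and the function names are hypothetical, and treating the one-sided result as a “confidence” figure is a simplification.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def confidence_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Approximate one-sided confidence that B's true conversion rate
    exceeds A's, using a pooled two-proportion z-test."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.5  # no variation at all, so no evidence either way
    z = (p_b - p_a) / std_err
    return normal_cdf(z)


# Hypothetical small test: 200 visitors per variant.
confidence = confidence_b_beats_a(conv_a=20, n_a=200, conv_b=28, n_b=200)
print(f"Confidence that B beats A: {confidence:.0%}")
```

With these made-up numbers the result falls just short of the 90% mark, which is exactly the kind of outcome where you have to decide whether the risk of being wrong is acceptable.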
If high statistical significance is a necessity, it may also be possible to combine multiple datasets to create a larger sample. This gives you access to more rigorous statistical methods. Of course, combining datasets and generalizing from them needs to happen with great caution, to ensure that the result will still be meaningful. A statistically significant result is not necessarily a meaningful one if you’re measuring the wrong thing.
3. What can you accomplish with data mining?
While you may not have enough artisanal data to build a meaningful machine learning model, important signals can be extracted through descriptive algorithms. There may not be enough data to predict the future, but that doesn’t mean that the current data isn’t enlightening. From your artisanal data, you may be able to answer questions, such as:
Are there segments of your data that behave differently from the rest?
Is the behavior of those more familiar with your brand different from that of those who are still learning about it?
Does one channel tend to bring more of one type of customer than another?
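Questions like these can often be answered with simple counting rather than a predictive model. As a minimal sketch, the last question might be explored as follows. The lead records, channel names, and customer types are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical lead records: (channel, customer_type).
leads = [
    ("email", "returning"), ("email", "returning"), ("email", "new"),
    ("search", "new"), ("search", "new"), ("search", "returning"),
    ("social", "new"), ("social", "new"), ("social", "new"),
]


def customer_mix_by_channel(records):
    """Return, for each channel, the share of leads of each customer type."""
    counts = defaultdict(Counter)
    for channel, customer_type in records:
        counts[channel][customer_type] += 1
    mix = {}
    for channel, type_counts in counts.items():
        total = sum(type_counts.values())
        mix[channel] = {t: c / total for t, c in type_counts.items()}
    return mix


mix = customer_mix_by_channel(leads)
for channel, shares in sorted(mix.items()):
    print(channel, {t: f"{s:.0%}" for t, s in sorted(shares.items())})
```

With only a handful of records per channel, differences like these are a prompt for further investigation, not a conclusion.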
If you can’t create more data, you may be able to make your data “wider” by creating more features, even if you aren’t increasing the raw dataset size. This is where the “artisanal” part of the term comes in. One feature can become many, especially if the data is a time series. For example, if the base data were the number of leads generated by each channel in the past three months, you might consider adding additional features, such as:
The number of male/female leads
The change in the number of leads from the week before
The percentage of all leads that came from this channel
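As a rough sketch of what “widening” could look like, the snippet below derives two of these features (the week-over-week change and the channel’s share of leads) from nothing more than weekly lead counts. The counts, channel names, and field names are hypothetical. A gender breakdown would need data that these counts alone do not contain.

```python
# Hypothetical weekly lead counts per channel, oldest week first.
weekly_leads = {
    "email": [12, 15, 11, 18],
    "search": [30, 28, 35, 33],
    "social": [8, 7, 14, 9],
}


def widen(leads_by_channel):
    """Turn one column (lead counts) into several derived features."""
    n_weeks = len(next(iter(leads_by_channel.values())))
    # Total leads across all channels for each week.
    totals = [
        sum(series[week] for series in leads_by_channel.values())
        for week in range(n_weeks)
    ]
    rows = []
    for channel, series in leads_by_channel.items():
        for week, count in enumerate(series):
            rows.append({
                "channel": channel,
                "week": week,
                "leads": count,
                # No previous week exists for the first observation.
                "delta_vs_prev_week": count - series[week - 1] if week > 0 else None,
                "share_of_week": count / totals[week] if totals[week] else 0.0,
            })
    return rows


rows = widen(weekly_leads)
for row in rows[:4]:
    print(row)
```

The dataset still has the same twelve observations, but each one now carries more context for a descriptive analysis.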
Keep it simple
Once you’ve established whether or not you can get away with your artisanal data, it’s time to put it to work. My best advice is to keep it simple.
1. Avoid the pull to overcomplicate the algorithm.
There is a lot of buzz around fancy machine learning algorithms, especially those related to deep learning. And, indeed, these methods are very powerful, especially on unstructured data, e.g. human speech. However, they require massive amounts of data and also tend to obscure the model, so it’s difficult to understand exactly what it’s doing. They are completely unsuited to artisanal data.
On the other hand, the “classics” are perfect for small datasets. Algorithms such as k-nearest neighbors, decision trees, and logistic regression work perfectly well with small data. They are easy to train, easy to understand, and easy to apply to your data. “Easy to understand” is not to be underestimated. It allows you to feel confident that you are learning what you think you are learning, and to have a human validate that your model makes sense.
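To show just how small a “classic” can be, here is a minimal sketch of a k-nearest neighbors classifier written with nothing but the Python standard library. The training data (pages viewed and minutes on site for a few leads) and the labels are invented for illustration. In practice you would also want to put the features on comparable scales before measuring distances.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k closest training points.

    train is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical leads: (pages viewed, minutes on site) -> did they convert?
train = [
    ((1, 0.5), "no"), ((2, 1.0), "no"), ((3, 1.5), "no"),
    ((8, 6.0), "yes"), ((9, 7.5), "yes"), ((10, 8.0), "yes"),
]

print(knn_predict(train, (7, 5.0)))
print(knn_predict(train, (2, 1.2)))
```

Because the prediction is just a vote among named neighbors, a person can inspect exactly which past leads drove each decision.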
Likewise, machine learning platforms are not necessary for small data. You don’t need the cloud. You’ll probably be able to iterate on your model more quickly without it, and without hard-to-configure platforms.
2. Simplify the validation method.
Validating your model is extremely important. Without checking your model before applying it, you can’t be sure it will make reasonable predictions. With artisanal data, you don’t have the luxury of permanently setting aside a subset of the data for this purpose. However, there are other validation methods that serve this function. Cross-validation holds out a different small slice of the data in each round and trains on the rest, so every record is used for both training and validation.
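A minimal sketch of k-fold cross-validation might look like the following. The dataset (minutes on site and whether the lead converted) is made up, and the one-nearest-neighbor rule stands in for whatever simple model you are actually validating. In practice you would shuffle the rows before splitting; here they are already interleaved by hand.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k nearly equal contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_neighbor_predict(train, query):
    """Stand-in model: copy the label of the closest training value."""
    _, label = min(train, key=lambda item: abs(item[0] - query))
    return label


def cross_validate(data, k):
    """Return accuracy over all rows, each predicted while held out."""
    correct = 0
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        # Train only on the rows outside the current fold.
        train = [row for i, row in enumerate(data) if i not in held]
        for i in held_out:
            feature, label = data[i]
            if nearest_neighbor_predict(train, feature) == label:
                correct += 1
    return correct / len(data)


# Hypothetical data: (minutes on site, converted?).
data = [
    (1.0, "no"), (7.5, "yes"), (2.0, "no"), (6.0, "yes"),
    (1.5, "no"), (8.0, "yes"), (3.0, "no"), (9.0, "yes"),
]

accuracy = cross_validate(data, k=4)
print(f"Cross-validated accuracy: {accuracy:.0%}")
```

Every row is scored exactly once by a model that never saw it, so no data is wasted on a separate holdout set.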
Make sure you don’t underestimate the human sanity check. If your model seems to be making surprising predictions, it’s probably wrong. Ensure an expert applies their knowledge and some common sense to the model and predictions. This can be as simple as having someone peruse the results, or as complex as having the expert and model score the data independently, and then compare the results in a “double-blind” method.
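For the more formal version of this check, one possible sketch is to have the expert and the model label the same records independently and then measure how often they agree. The labels below are invented. Cohen’s kappa is used here as one common way to discount agreement that would happen by chance, not as the only reasonable choice.

```python
from collections import Counter


def agreement_rate(expert, model):
    """Fraction of records on which the expert and the model agree."""
    return sum(e == m for e, m in zip(expert, model)) / len(expert)


def cohens_kappa(expert, model):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(expert)
    observed = agreement_rate(expert, model)
    expert_counts = Counter(expert)
    model_counts = Counter(model)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(
        expert_counts[label] * model_counts[label] for label in expert_counts
    ) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical independent scores for the same ten leads.
expert = ["hot", "hot", "cold", "cold", "hot", "cold", "hot", "cold", "cold", "hot"]
model = ["hot", "hot", "cold", "hot", "hot", "cold", "cold", "cold", "cold", "hot"]

# Records worth a closer look by a human.
disagreements = [i for i, (e, m) in enumerate(zip(expert, model)) if e != m]

print(f"Agreement: {agreement_rate(expert, model):.0%}")
print(f"Cohen's kappa: {cohens_kappa(expert, model):.2f}")
print(f"Review these records: {disagreements}")
```

The list of disagreements is often the most useful output, since those are the records where either the model or the expert has something to learn.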
Working with artisanal data is a craft, just like working with small-batch food. The large-scale methods used in bread factories don’t apply to the bakery down the street. Likewise, the methods used with big data don’t necessarily work with artisanal data. Ignoring this will lead to frustration, noisy models, and a lack of persuasive conclusions.
Hire a company that knows how to work with small-batch data. While big data companies are popping up all over the place, it takes a lighter touch to deal with artisanal data. My company, WEVO, optimizes some pages with millions of visitors and some pages targeting a niche audience. Our toolset is diverse and refined enough to deal with the whole spectrum. But most importantly, we understand which tools are appropriate in which circumstances, and won’t subject your small data to the wrong type of algorithms.
For more information, email us at firstname.lastname@example.org.