Classifying “Bullshit” Quotes With Azure Machine Learning

Machine Learning has a lot of really useful applications which I’ve previously written about. It allows us to do things such as predictive maintenance, fraud detection, churn/upselling/cross-selling analytics, customer lifetime value estimation and segmentation, supply chain management, demand forecasting and a host of other stuff.

However, Machine Learning also enables us to do a lot of really quite stupid and funny things, which is one of the reasons I like it so much. One such thing would be the classification of quotes into “bullshit” and “non-bullshit” categories.

That’s right; I’m not above using my powers for evil!


Sorry, I’m allergic to bullshit

Lately there’s been some research into the mechanisms of human receptivity to bullshit and capacity for analytical thinking and reflection, with the former apparently being linked to the latter.

The most recent study “On the reception and detection of pseudo-profound bullshit” by Pennycook et al. (2015) has been making the rounds in the tabloids, with journalists and writers happily generalising the rather specific findings of the researchers to question the mental capacity of those who enjoy sharing motivational and inspirational quotes on social media.

Acceptance Of Profound-Sounding "Bullsh*t" Linked To Lower Intelligence
Source: IFLScience

Obviously it’s not quite that simple, as the study deals specifically with so-called “pseudo-profound” bullshit – which is basically stuff that sounds profound in some way, but is in fact complete, meaningless nonsense. Real life examples of this, according to the researchers, are some quotes by New Age-guru Deepak Chopra, such as the following:

“Attention and intention are the mechanics of manifestation.”

You really can’t make this shit up, can you? 

(Well actually, you can, using Wisdom of Chopra, which generates a random quote based on text from Chopra’s Twitter account.)

Anyway, in order to help those who were slightly worried that their quote-sharing practices may or may not reflect a below average mental capacity, I decided to use Azure Machine Learning and R (and some Power Query) to create a classification service to determine the “bullshit probability” of quotes and statements.

Hey, maybe I am using my powers for good after all!

Building a bullshit classifier

So how did I do this? Well, my approach was to gather a limited sample of both bullshit and non-bullshit quotes and treat it as a binary classification problem (supervised learning), producing a probability of 0 to 1 of any quote being bullshit based on the contents of the text. This requires an initial labeling of quotes, with 0 denoting non-bullshit and 1 denoting bullshit.

Since Pennycook et al. (2015) specifically refer to tweets by Deepak Chopra as examples of pseudo-profound bullshit and use randomly generated statements based on said tweets for their studies, I decided it was an acceptably small leap of logic for me to use quotes by Deepak Chopra exported from Goodreads as labeled cases of bullshit in my dataset rather than tweets. I’m not on good terms with the Twitter API, and using import.io I was easily able to export the quotes into the format I needed.

import.io
You really should be using this tool.

For cases of non-bullshit, I decided to use quotes by evolutionary biologist Richard Dawkins. For defining a polar opposite of Deepak Chopra, he is the first one that comes to mind.

So in my dataset, bullshit is defined as stuff Deepak Chopra says – labeled 1 – and non-bullshit is defined as stuff Richard Dawkins says – labeled 0. For my classifier, every piece of text exists only on a scale from Richard Dawkins to Deepak Chopra, and is scored by the probability of that piece of text being bullshit, i.e. being somehow similar to something Chopra might say, or dissimilar in the sense that it is more similar to something Dawkins might say.

After having collected and labeled the data, I did some quick text preprocessing using Power Query and R, and made sure the cases of bullshit and non-bullshit were reasonably well balanced in the dataset.
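As a rough illustration of the labeling and balance check described above, here is a minimal Python sketch. It is not the Power Query and R work done for this post: the placeholder texts, the function names `label_quotes` and `class_balance`, and the tiny sample size are all assumptions made for the example.

```python
from collections import Counter

# Placeholder texts standing in for the real quotes (hypothetical data).
chopra_quotes = ["chopra quote 1", "chopra quote 2", "chopra quote 3"]
dawkins_quotes = ["dawkins quote 1", "dawkins quote 2",
                  "dawkins quote 3", "dawkins quote 4"]


def label_quotes(bullshit, non_bullshit):
    """Return (text, label) pairs: 1 = bullshit, 0 = non-bullshit."""
    return [(q, 1) for q in bullshit] + [(q, 0) for q in non_bullshit]


def class_balance(rows):
    """Return the share of each label in the labeled dataset."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


dataset = label_quotes(chopra_quotes, dawkins_quotes)
balance = class_balance(dataset)
print(balance)  # shares of label 1 and label 0; ideally both near 0.5
```

If one class dominates, a classifier can look accurate simply by always guessing the majority label, which is why the balance is worth checking before training.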

I ended up with a current total of 3200 cases (each being approximately one sentence of variable length) after preprocessing, which I would say is a very small sample for training such a text classifier and a very limited basis for generalisation. I’ll definitely be looking to expand the data set with more cases later, but still, this experiment was really just for learning and for fun.

Moving on, I used feature hashing to turn the text into feature columns, and principal component analysis to reduce the number of features to a manageable amount while preserving most of their variance. I used only these text features for classification.
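To give an idea of what feature hashing does, here is a minimal Python sketch of the general technique. It is not the Azure ML module itself, which has its own hash function and n-gram settings. The function name `hash_features`, the use of MD5 and the 16 buckets are assumptions picked for illustration, and the PCA step is left out.

```python
import hashlib
import re


def hash_features(text, n_buckets=16):
    """Map a text to a fixed-length count vector using the hashing trick."""
    vector = [0] * n_buckets
    # Lowercase the text and split it into simple word tokens.
    for token in re.findall(r"[a-z']+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        # The hash value decides which column the token is counted in.
        vector[int(digest, 16) % n_buckets] += 1
    return vector


quote = "Attention and intention are the mechanics of manifestation."
features = hash_features(quote)
print(features)  # 16 counts that together add up to the 8 tokens
```

The point of the trick is that every quote ends up with the same number of columns no matter which words it contains, at the cost of occasional collisions where two different words share a column.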

I ended up using a Two-Class Boosted Decision Tree (blue below) with the principal components as features for classifying, but I also had luck with using a Two-Class Locally-Deep Support Vector Machine (red below) with only raw hashing features.

True vs. false positives

I was happy to see that the model both cross-validates and performs quite all right, especially considering the very limited sample size and my rather frivolous labeling strategy for those samples.

There are definitely some patterns separating the language of Chopra from that of Dawkins, and the model is able to distinguish between these with an accuracy of about 0.65 as of the time of writing.

It’s obviously going to be wildly inaccurate for statements that have nothing in common with the language in the quotes by either Chopra or Dawkins, but if a quote has some elements from one or both of those extremes, the classifier should in theory work reasonably well. Certainly better than a coin flip!
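For readers wondering how an accuracy figure like this relates to true and false positives, here is a small Python sketch. The scores and labels are made up for the example and are not output from the actual model; the 0.5 cut-off and the function names are assumptions too.

```python
def confusion_counts(probabilities, labels, threshold=0.5):
    """Count true/false positives and negatives at a probability cut-off."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probabilities, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Share of all cases that were classified correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Hypothetical bullshit probabilities and true labels (1 = bullshit).
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3]
labels = [1, 1, 1, 0, 0, 0]

counts = confusion_counts(scores, labels)
print(counts, accuracy(counts))
```

Moving the threshold up or down trades false positives against false negatives, which is what a true-vs-false-positive curve visualises.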

Bullshit-as-a-Service

So being quite satisfied, I decided to publish the model as a prediction service using Azure. This was done with a few clicks, and lo and behold: Kjetil’s bullshit classifier as a web service.

Kjetil’s bullshit classifier

Input any quote you like, and the model scores its probability of being bullshit on a scale from Richard Dawkins (0.00) to Deepak Chopra (1.00). The classifier has obvious limitations, but it was a fun exercise nonetheless. Maybe I’ll improve it with a better sampling and labeling strategy in the future?

The cool part is that with Azure Machine Learning, you can get an idea for doing stuff like this and get it up and running in one form or another within a couple of evenings. It’s really quite ingenious.

Maybe you’ll even find a problem of some real business value to tackle, rather than classifying bullshit statements …

If you want to learn more about text classification, Microsoft has several ready-made examples of how to do this within Azure ML, and there are tutorials readily available. Machine Learning is becoming a natural part of the toolkit of any data analyst or data scientist, and Azure ML makes it very approachable for pretty much anyone with a certain amount of technical and analytical skills.

So what are you waiting for? Get modeling!

The Democratization of Data Science and Machine Learning

When I had the opportunity to work with Machine Learning for a client in the health sector, I started to realize exactly how Amazon had been cunningly manipulating me since I signed up for an account in 2010 to shop for some page-turning thrillers.

Not unexpectedly, this opened up the floodgates for a steady flow of emails, chock full of suggestions and recommendations for other books that Amazon was determined I needed to buy. This flow of emails has since been maintained by diligent clicking and shopping, because Amazon is 100 % correct. I simply must have these books.

In the exact same way, Netflix makes sure that you have no choice but to binge-watch the whole newest season of House of Cards when it is released. Netflix, through smart use of enormous amounts of data, machine learning and other advanced analytics techniques and technologies, knows better than anybody else – probably even better than yourself – exactly what you want to watch and how they’re going to get you to watch it.

Netflix tweet

This form of customer intelligence is now more available than ever, for the benefit of those who have sufficient amounts of data and want to know their customers better than the customers know themselves.

The concepts mentioned previously are used today in a wide variety of fields, for example optimization of stock portfolios and price prediction, picture and speech recognition, fraud detection, text mining and analytics, sentiment analysis of social media traffic, data driven customer segmentation and marketing, predictive maintenance, automatic and personalized web design, and recommendations for what to get at restaurants.

The list of use cases is steadily growing, and almost every week a new, innovative use case for machine learning makes its way into the news. This is a cake that all firms who wish to be data driven both could and should help themselves to, if only just to have a taste.

I’ll be back

Machine learning is nothing new, and algorithms that learn from data and build statistical models to explore, predict and recommend have existed for a long time.

In the 70’s and beginning of the 80’s, many computer engineers, scientists and statisticians were preoccupied with figuring out artificial intelligence. The emergence of machine learning is in many ways related to these endeavours, although the first algorithms are much older.

The field of machine learning didn’t really come to fruition until the 90’s, which is undeniably connected to the fact that at this point most of the researchers had watched the first two Terminator movies. In their fear of accidentally creating Skynet, they instead turned their attention to using machine learning to handle practical business problems such as prediction and cluster analysis.

Arnold Schwarzenegger as The Terminator

Not unlike Arnold Schwarzenegger himself, machine learning is currently experiencing a new renaissance, which is primarily driven by the increased access to computational power and cloud services. This also means that the threshold for firms both big and small to experiment with sophisticated modeling and analytics on small, medium-sized or big data is lower than ever.

This dramatically shortens the road from a vision about data driven decision making to operationalized predictive models implemented in any analytical tool, readily available to the decision makers and operative personnel.

And as if that wasn’t already enough, these systems and models can keep learning and improving as they’re being used in the business, making them increasingly more accurate and able to account for changing assumptions over time if they’re implemented, used and maintained correctly.

The Data Science ecosystem

Many who are familiar with the term “data mining” might think this sounds remarkably similar, and they would be very much right. If you ask two different experts what machine learning and data mining are, you’re likely to get two different but overlapping answers.

Lately, we’ve started using the term “Data Science” as the name of an independent discipline consisting of concepts from analytics, machine learning, data mining, pattern recognition, data warehousing, graph theory and databases, visualization and prediction. To this we add a large ecosystem of associated tools and platforms, and define the whole shebang as “extraction of knowledge from data.”

The Data Science ecosystem
Source: Computerworld

I prefer to look at Data Science as a more down to earth and practical approach to Big Data. For while the Data Science field by no means discriminates against data of smaller sizes, the methods and techniques you use will (mostly) be the same whether you’re dealing with big or small data.

What Data Science does is deal with exactly what you’re going to do with all this data and how it connects to real business problems, and in the process remove some of the buzzing you usually hear when somebody says “big” and “data” in that order.

If you start talking about the Data Science method in a concrete and practical manner instead of going vaguely on about how all this Big Data must be useful somehow, you might experience that people actually listen to you instead of dismissing it as bullshit. And I find that machine learning especially, while admittedly being only a small part of Data Science, is an excellent approach to concretize how a tricky business problem can be solved with data, if appropriate.

This is perhaps also the biggest difference between Data Science and traditional Business Intelligence. The Data Science umbrella term gathers under it a selection of concepts, methods and tools that actually empower the very desirable business approach of “think big, act small.”

Data driven decisions and operations founded on sophisticated analytics will never become a reality unless a business is able and willing to start with something somewhere, and with the Data Science approach, the reward and potential are maintained while the risks are minimized.

Acute hospital admissions, weather and climate – a proof of concept

A Data Scientist will start by asking increasingly concrete questions.

Which problem are we trying to solve? Is it a classification problem or a regression problem? Will it be most appropriately modeled by decision trees or neural networks? Maybe it’s a job for a recommender system, or are pattern recognition, sequence modeling or association rules more appropriate?

And last, but by no means least: do we have the data we need, is the information sufficient to solve the problem in an appropriate manner, and how can we prepare the data to exploit the information maximally?

When I and a colleague were working on a proof of concept for a client in the health sector together with Microsoft, we used machine learning to analyze and model relationships between acute hospital admissions and external factors such as weather, climate and air quality.

Research, intuition and current business practices all dictate that these relationships are very much real, and our task was to explore, quantify and exploit them if they existed in the data. Since Microsoft has made machine learning readily available in their Azure cloud environment, it was natural for us to take it one step further and try to turn the relationships into predictive models.

Microsoft Azure Machine Learning
Source: CloudTimes

If sufficiently accurate, such models could be used to predict the number of acute admissions in hospitals, providing valuable input for emergency preparedness management and in turn ensuring patients better and more timely care.

This covers only a very tiny portion of the scope of opportunities for such technology, and there is a wide variety of related potential use cases for machine learning and Data Science within healthcare.

Throughout the process we had to ask ourselves many of the questions I previously mentioned, and as I mentioned early on I finally realized exactly how Amazon knows better what I like than I do myself.

Properly prepared, quality data usually doesn’t lie, and if the connections, interactions and relationships you’re interested in are actually present in these data, sufficiently sophisticated machine learning algorithms will almost always be able to find them with an appropriate amount of human help.

I need your clothes, your boots, and your motorcycle

Obviously you don’t need machine-made association rules and sophisticated market basket analysis to reveal that the combination of full MC gear, a pair of boots and a motorcycle will satisfy any terminator.
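For the curious, the two basic numbers behind an association rule, support and confidence, can be sketched in a few lines of Python. The baskets below are invented for the example, and the function names are hypothetical; real market basket analysis would typically use a dedicated algorithm such as Apriori over far more data.

```python
def support(baskets, items):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"clothes", "boots", "motorcycle"},
    {"clothes", "boots", "motorcycle"},
    {"clothes", "boots"},
    {"boots", "sunglasses"},
    {"clothes", "sunglasses"},
]

rule_support = support(baskets, {"clothes", "boots", "motorcycle"})
rule_confidence = confidence(baskets, {"clothes", "boots"}, {"motorcycle"})
print(rule_support, rule_confidence)
```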

However, there might be relationships in your data that are far less obvious, exceedingly more complex and much more time-consuming to uncover manually. If you’re then able to represent in your data set an almost complete selection of data points which are sufficiently relevant for the problem you’re trying to solve, you’ll likely be able to produce models with frighteningly accurate results on real data that might be worth their digital weight in gold in business settings.

In our POC we managed to account for a great deal of the variation in the number of acute admissions based on hospital data, calendar data, weather data and air quality data. This shows that it is definitely possible to use such data to model and predict how many patients with, for example, airway related or skin related diagnoses are likely to show up in the emergency ward on a given day.

There are of course loads of other factors in addition to the weather that affect this number, and a machine learned model will never become more accurate than the data foundation it is based on.

Hospital admissions and weather correlations
Average correlations between groups of weather factors and admissions of patients aged 0-9 with respiratory diseases.
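As a reminder of what a correlation like the ones in the figure measures, here is a minimal Python sketch of the Pearson correlation coefficient. The temperature and admission numbers are invented for the example and have nothing to do with the client data; the function name `pearson` is an assumption as well.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical daily temperatures (Celsius) and admission counts.
temperature = [-5, 0, 5, 10, 15]
admissions = [12, 10, 9, 7, 6]

r = pearson(temperature, admissions)
print(r)  # strongly negative: colder days, more admissions in this toy data
```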

Still, it is important to be aware that such models don’t always have to be very accurate to provide valuable information.

If the alternative is to flip a coin to decide between two options, the recommendations of a model that is right 51 % of the time will qualify as actionable insights. If you value your gut feeling at 60 %, this becomes the benchmark for the model to beat instead.

For example, when planning costly direct marketing, a predictive model which increases the response rate or conversion rate by only a few thousandths might be any marketer’s wet dream.
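A quick back-of-the-envelope calculation in Python shows why. All numbers here are hypothetical: one million contacts, a baseline response rate of 1.0 %, a model-driven rate of 1.3 % and a value of 50 per response. The function name `campaign_uplift` is also just made up for the example.

```python
def campaign_uplift(n_contacts, baseline_rate, model_rate, value_per_response):
    """Return (extra responses, extra value) gained from a higher response rate."""
    extra_responses = n_contacts * (model_rate - baseline_rate)
    return extra_responses, extra_responses * value_per_response


# Hypothetical campaign: an uplift of three thousandths in response rate.
extra, value = campaign_uplift(1_000_000, 0.010, 0.013, 50.0)
print(round(extra), round(value))
```

With these assumed numbers, three thousandths of uplift means roughly 3,000 extra responses, so even a modest model can pay for itself quickly when the volume is large.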

Hasta la vista, baby

Many firms talk about running the business on data and making decisions based on quality intelligence and well-founded assumptions, but the road from coming up with a hypothesis or an idea to operationalizing a predictive model to use for decision making may seem daunting.

The good news – or the bad news, depending on how you see it – is that the biggest challenge is the same as before: data quality. Data profiling and smart data preparation are likely to take up 80 % of the time in a Data Science project.

It’s worth noting that we’re definitely not talking about cleaning up your entire data warehouse here; this is neither realistic nor desirable. We want to do just enough to provide sufficiently reliable answers to the business problems at hand, guided by an implicit or explicit cost-benefit analysis.

When all this is done, you can lean back, drink your coffee and let the machines churn the numbers, guiding the algorithms and acting like the high-tech puppet master you’ve always wanted to be. After some adjustments you’ve created an API or a data feed that you can expose to everything from QlikView and Tableau to good old Excel, and you’re ready to start predicting.

Data Scientist
Source: Edureka!

Now that machine learning is available in the cloud, large on-premise installations of expensive software are no longer necessary. Knowledge of tools such as R and Python is very desirable, but even this is not necessarily required to get results.

What is needed first and foremost is vision and willingness to experiment to potentially change the business, and people who actually know how to run Data Science initiatives.

This entails an understanding of how to turn business problems into data problems – and in turn answers – as well as how to communicate across the business, sources of domain and data expertise, and disciplines such as analytics and statistics.

Perhaps most importantly, it requires a good head for validating and evaluating the process and the results to separate the “maybe wrong” from the “perhaps right.” A real Data Scientist, unlike a Sith, never deals in absolutes.