Exploring the dynamics of information

How does Big Data become Big Knowledge?

10/8/2015

Corporations that create large volumes of data are in a unique position. They can glean insights that allow them to improve their business selectively, and that alone is a big deal. But it's even more special that they are also capable (not necessarily inclined, but certainly capable) of publishing scientifically relevant material that can help humanity to understand itself better. An example of the former can be found in any business intelligence solution that's used to examine consumer trends and determine which services are more profitable than others. But examples of the latter, of corporate Big Data influencing the advance of humanity beyond the crunchy outer shell of the organisation itself, are increasing fast. Earlier in 2015 the Open Data Institute published a list of 270 companies that have actively invested in Open Data.

This is exciting!

So in this article I'll introduce you to a new model for envisioning Big* Data and provide a structured explanation of why it's so valuable to humans.

Data Volume and Span, a model for data

There are lots of ways to look at Big Data, but a low-effort, high-value way is to break it down into just two dimensions: Data Volume and Data Span.

Data Volume

Data Span

Data Volume comes primarily from market share (having lots of customers) and from product use (customers doing lots of stuff). Volume is simply how much data is being produced and the rate at which it is produced. Sufficient data volume is needed to gain a truly representative understanding of the subject being examined.

Data Span, on the other hand, comes from recording different types of data: it is the number of different types of data going into the data warehouse. Sufficient data span is needed to enable a variety of data correlations from which valuable insights can be derived.
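As a rough illustration of the two dimensions, here is a minimal Python sketch. The records, field names and the two helper functions are all made up for this example, and it only counts records; the rate at which data arrives, which is also part of volume, is left out.

```python
from typing import Any


def data_volume(records: list[dict[str, Any]]) -> int:
    """Volume: how many records have been captured."""
    return len(records)


def data_span(records: list[dict[str, Any]]) -> int:
    """Span: how many distinct types of data (fields) appear across the records."""
    return len({field for record in records for field in record})


# Hypothetical records; the field names are assumptions for illustration only.
records = [
    {"customer": "a1", "show": "Hot Chrome", "price": 45.0},
    {"customer": "b2", "show": "Hot Chrome", "price": 45.0, "device": "phone"},
    {"customer": "c3", "show": "Potato Jefferson", "price": 30.0},
]

print(data_volume(records))  # 3 records
print(data_span(records))    # 4 distinct fields: customer, show, price, device
```

More customers or more activity adds records (volume); recording something new about each event, such as the device, adds fields (span).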

Easy, right? There's often no need to make it more complex than that. Unless you want to.

Of course, data volume and span both hinge on having a product, product component, sales process, or other system that actually records data. If you sell tacos from a truck you might find it trickier to record customer data than Facebook does, since your core product isn't connected directly to the internet generating data every time somebody likes it.

Now that we know the two fundamental ways in which data can be considered "Big", we'll take a look at how all the data recorded in database tables is transformed into knowledge within a living human brain.

Data Creation

As a technologically advanced species, we do stuff using technology. We look at pictures, read articles, write comments, send messages, like things, place orders, fulfill orders, upvote, download, research, publish, buy, sell, steal, play, and more. Those are events. Every event that takes place online creates an enormous amount of data. Each one happened at a time, in a place, using a device, using software, and for a reason. Many things happen in relation to an account. Many don't. Some things are done in a certain sequence, some things are done instead of other things, some simultaneously. When a theatre-goer buys a ticket to a show online, all manner of data is created: the time, the customer's payment account, the show name, their device, their payment method, the price, the amount charged, the amount paid.
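To make the ticket example concrete, one way to picture a single event is as a record with one field per piece of data listed above. This is only a sketch: the class name, field names and sample values are assumptions, not the schema of any real ticketing system.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class TicketPurchase:
    """One online ticket purchase; every field name here is hypothetical."""
    occurred_at: datetime
    payment_account: str
    show_name: str
    device: str
    payment_method: str
    price: Decimal
    amount_charged: Decimal
    amount_paid: Decimal


# Made-up sample values, purely for illustration.
event = TicketPurchase(
    occurred_at=datetime(2015, 10, 8, 19, 30, tzinfo=timezone.utc),
    payment_account="acct-0001",
    show_name="Hot Chrome",
    device="phone",
    payment_method="card",
    price=Decimal("45.00"),
    amount_charged=Decimal("45.00"),
    amount_paid=Decimal("45.00"),
)

# Flatten the event into a plain dict, the shape a database row might take.
row = asdict(event)
print(sorted(row))
```

Each field is one of the data points named in the paragraph above; a real system would likely record more, and some of it would have no value.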

Data Storage

Not all of the data created at an event has value, but some does. Not all of it is recorded, but much of it is, and what is recorded goes into a database.
There's too much data just to read it top to bottom. Moreover, there's next to no value in looking at a single event. You'd be able to deduce that a person paid for tickets, for a show, from a vendor, at a time, for a price. On its own, that's meaningless. It's more valuable to know how many people bought tickets for that show, what the price changes over time were, how well the show sold compared to other shows, whether the time that the show played was advantageous to selling more expensive tickets, and other information that can help your business make more money for less effort. Such information is called "insights".
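The step from individual events to something more useful can be sketched with an in-memory SQLite database from Python's standard library. The table name, columns and rows below are invented for the example; the point is only that an aggregate query answers the "how many people bought tickets for that show" kind of question that a single row cannot.

```python
import sqlite3

# In-memory database with a hypothetical purchases table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (show_name TEXT, sold_on TEXT, price REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("Hot Chrome", "2015-10-01", 45.0),
        ("Hot Chrome", "2015-10-02", 45.0),
        ("Hot Chrome", "2015-10-02", 40.0),
        ("Potato Jefferson", "2015-10-01", 30.0),
    ],
)

# One row alone says little; grouping rows shows how each show sold.
tickets_per_show = conn.execute(
    "SELECT show_name, COUNT(*), AVG(price) "
    "FROM purchases GROUP BY show_name ORDER BY show_name"
).fetchall()

for show_name, tickets_sold, average_price in tickets_per_show:
    print(show_name, tickets_sold, round(average_price, 2))

conn.close()
```

Grouping by `sold_on` instead of `show_name` would give a first look at how prices changed over time.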

Data Presentation

To get those insights, data needs to be presented in a human-readable way. That means using a computer program to parse data recorded in tables into different forms depending on what I want to see. It needs to show me total revenue by show if I want to know what shows are most profitable. It needs to show me the rate at which customers visit my theatre if I want to know about customer loyalty. Once data is presented in a human-readable way it's no longer just data but information, and from that information the user can gain insights.
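A small sketch of the "total revenue by show" view might look like the following. The function names and input rows are assumptions for illustration; a real business intelligence tool would do far more, but the idea of turning rows into a readable summary is the same.

```python
from collections import defaultdict


def revenue_by_show(rows):
    """Sum the amount paid per show and sort with the highest revenue first."""
    totals = defaultdict(float)
    for show_name, amount_paid in rows:
        totals[show_name] += amount_paid
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def render_report(totals):
    """Lay the totals out as a plain-text table a person can read at a glance."""
    lines = [f"{'Show':<20}{'Revenue':>10}"]
    for show_name, revenue in totals:
        lines.append(f"{show_name:<20}{revenue:>10.2f}")
    return "\n".join(lines)


# Hypothetical (show name, amount paid) rows as they might come out of a table.
rows = [("Hot Chrome", 45.0), ("Potato Jefferson", 30.0), ("Hot Chrome", 45.0)]

report = render_report(revenue_by_show(rows))
print(report)
```

The same rows could be parsed into a different form, such as visits per customer, if the question were about loyalty rather than revenue.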

Data Consumption

This part is the easy bit. I'm a user, a theatre manager, and the data is presented how I want it. I simply examine the data and draw conclusions from the facts it presents. If a show called Hot Chrome sold every seat at 90% of our maximum ticket price, and a show called Potato Jefferson sold only half the seats at 75% of maximum ticket price, and both shows were advertised in the same way, held at the same time on the same night, and all other attributes were the same, then I might deduce that Hot Chrome is a more profitable show to play in my theatre. Thus I've obtained an insight from data which a computer system turned into information and, by learning that information, acquired knowledge.
But to attain knowledge isn't as simple as looking at the screen. I need to fully understand what I'm looking at. While that depends on the "human-readability" of the information presented, it also depends on my competence, as a human, in interpreting what I see in objective ways.
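The Hot Chrome comparison can be worked through as simple arithmetic. The sketch below expresses each show's revenue as a fraction of the most the theatre could earn (every seat sold at maximum price). It compares revenue only, and treats that as a stand-in for profit on the assumption, taken from the example, that all other attributes of the two shows are the same. The function name is made up.

```python
from fractions import Fraction


def revenue_fraction(seats_sold: Fraction, price_level: Fraction) -> Fraction:
    """Revenue as a fraction of a full house sold at the maximum ticket price."""
    return seats_sold * price_level


# Hot Chrome: every seat sold at 90% of the maximum price.
hot_chrome = revenue_fraction(Fraction(1), Fraction(90, 100))

# Potato Jefferson: half the seats sold at 75% of the maximum price.
potato_jefferson = revenue_fraction(Fraction(1, 2), Fraction(75, 100))

print(float(hot_chrome))                     # 0.9
print(float(potato_jefferson))               # 0.375
print(float(hot_chrome / potato_jefferson))  # 2.4
```

On these figures Hot Chrome brought in 2.4 times the revenue of Potato Jefferson. The numbers say that much, but not why.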

Human Error

If I screwed this up, my "insight" could turn out to be wrong. Suppose that, instead of determining that Hot Chrome sold better than Potato Jefferson, I deduced that Hot Chrome sold more seats because it had a higher price, due to the psychological principle of perceived value (whereby humans evaluate the quality of a thing based on how expensive it is). Whether the real insight is that the ticket price on Potato Jefferson was too low, or whether Hot Chrome is simply a more popular show, can't be conclusively determined just from my data presentation. It could be tested (by creating more data, perhaps by screening both shows again but swapping the ticket price), or a judgement call could be made based on my existing knowledge attained from other sources. Ultimately the avoidance of human error is a matter of sound judgement, general knowledge, and one's skill at doing one's job.

Informed Action

Once an insight has been gained, an informed action can be taken. If it turns out to be wrong, different action can be taken. For this reason, even accounting for human error, data-derived insights are immensely valuable. By using data in this way I've converted my business data into actionable business knowledge.

The result is not just big — it's huge.

Footnotes
* Big, small, or medium-sized data doesn't really matter in terms of how it works. Changing the data volume or span won't alter the process described above. A huge part of that process, as you can see, is how effectively the data handling system records data accurately and displays it meaningfully to a human operator. We'll talk about Business Intelligence solutions in another article.

Citations
Open Data Institute (2015) Open data means business: UK innovation across sectors and regions. London, UK. Available at http://theodi.org/open-data-means-business
Learn Microsoft Business Intelligence Step by Step. CodeProject. Available at http://www.codeproject.com/Articles/751447/Learn-Microsoft-Business-intelligence-step-by-step
