As regular readers of this column know, I love data. I love it when there’s lots of it. And I love it when there are several sources of it. I find it very useful.

But this week I had the opportunity to listen to a master of understanding data, Edward Tufte. He had some criticisms of big data that I think are well worth considering — especially as the buzz around big data reaches fever pitch.

I’ll do what I can to present some of his thoughts, though almost certainly with less elegance.

Big data, so what?

First up, the bigness of data isn’t that impressive really. According to Tufte, what he calls "real scientists" have been using large amounts of data for many years now. They need to in order to do things like map DNA or count the number of stars out there or work with advanced particle physics.

Business leaders and policymakers are only just now getting into the big data game by way of marketing and hardware advancements (and the people who need to sell hardware advancements).

Much of so-called "big data" is redundant

Many of the individual data points are redundant. It’s easy to see how this is so. If you dig through any relatively well-connected person’s digital address book you’ll likely find several entries that correspond to a single individual. With even larger data sets this sort of thing multiplies.

Once you go through the effort to clean up the data, maybe it isn’t so big after all. The cleanliness of data sources is a constant bugbear for a database of any size.
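To make the address book example concrete, here is a minimal Python sketch of that kind of clean-up. The field names, the sample entries and the rule of matching on email address are assumptions of mine for illustration, not anything Tufte presented, and real-world matching is usually much messier.

```python
def dedupe_contacts(contacts):
    """Collapse entries that share an email address (case and spacing ignored)."""
    merged = {}
    for entry in contacts:
        # Hypothetical matching rule: same normalized email means same person.
        key = entry["email"].strip().lower()
        if key not in merged:
            merged[key] = {"name": entry["name"], "email": key, "phones": set()}
        merged[key]["phones"].update(entry.get("phones", []))
    return list(merged.values())


# Made-up sample data: two of the three entries are the same person.
contacts = [
    {"name": "Pat Lee", "email": "pat@example.com", "phones": ["555-0100"]},
    {"name": "Patricia Lee", "email": " Pat@Example.com ", "phones": ["555-0101"]},
    {"name": "Sam Roy", "email": "sam@example.com", "phones": []},
]

unique = dedupe_contacts(contacts)
print(len(contacts), "entries ->", len(unique), "people")
```

Even in this toy case, a third of the "data" disappears once the duplicates are merged.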

Much of so-called "big data" is irrelevant

Tufte used an example of a security camera. Imagine a security camera pointed at a corner. In the course of weeks or months it is gathering up image data. Almost all of that data is, in the first place, redundant. It’s simply the same picture over and over and over again.

In the second place, most of that data is irrelevant. Most of the time a crime is not being committed. Maybe if something happens there once a year, then 10 minutes of footage is relevant. The other 525,938 minutes are irrelevant.

In this instance you certainly have a lot of data, big data even. But the part that’s useful isn’t big at all.
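The arithmetic behind the camera example is easy to check. The short Python sketch below uses only the two figures quoted above (10 relevant minutes and 525,938 irrelevant ones); the variable names are mine.

```python
relevant_minutes = 10
irrelevant_minutes = 525_938  # figure quoted in the camera example

total_minutes = relevant_minutes + irrelevant_minutes
relevant_share = relevant_minutes / total_minutes

# The total works out to roughly one year of continuous recording.
days_recorded = total_minutes / (60 * 24)

print(f"{total_minutes:,} minutes recorded (~{days_recorded:.2f} days)")
print(f"Relevant share of footage: {relevant_share:.4%}")
```

Put another way, roughly one minute in every 52,595 is worth looking at.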

"Real science," according to Tufte, handles this by first coming up with a question and then examining the data source. Much of business data mining has this approach backwards: it starts with the data and goes looking for a question, and as a result it can turn up questionable causal relationships in almost any data set.
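A small simulation shows why the backwards approach is risky. This is my own illustrative sketch in Python, not something from Tufte's talk: every series is pure random noise, and the sample size, the number of candidate series and the seed are arbitrary assumptions.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)   # fixed seed so the run is repeatable
n_samples = 10           # assumed: only a few observations per series
n_candidates = 500       # assumed: many unrelated series to "mine"

# Everything here is noise, so no real relationship exists anywhere.
target = [rng.gauss(0, 1) for _ in range(n_samples)]
candidates = [
    [rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_candidates)
]

# Question first: test one series chosen before looking at the data.
planned_r = pearson(target, candidates[0])

# Data first: scan every series and keep whichever correlates most strongly.
mined_r = max((pearson(target, c) for c in candidates), key=abs)

print(f"Pre-chosen series:   r = {planned_r:+.2f}")
print(f"Best of {n_candidates} mined:   r = {mined_r:+.2f}")
```

With this many candidates and so few observations, the "best" mined correlation looks strong even though it is noise by construction. That is the trap in searching a pile of data for any relationship at all instead of testing one stated in advance.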

Incentive hazards

The redundancy and irrelevancy of big data would be relatively harmless on their own. But problems arise once people’s compensation is tied to it. And with big data, it most certainly is.

There are incentive hazards at two points in the technology discussion. One is in the sale of hardware. Maintaining and processing large amounts of data requires large amounts of hardware and often expensive software.

Making "big data" a required bullet point on any project creates a de facto capitalization requirement. This is one of the points that I think is especially important in the real estate industry.

Running the big iron that can handle big data isn’t cheap. Many of the non-venture-capital-backed operations in real estate would have trouble competing in a world where big data was a requirement. Certainly the "Balkanized" nature of the industry wouldn’t help.

This is a strategic situation. And it’s one that is actively in play.

The other layer of incentive hazard is that big data almost requires anyone involved in examining or "mining" it to come up with some vast conclusion or insight. This is dangerous.

This is especially dangerous when those doing the examining and mining are the same people who will benefit from the findings. It is the classic "pay the auto repair shop to tell you what’s wrong with your car" problem.

In what Tufte calls "real science" there is no incentive to find something when nothing is there. In fact, with the system of peer review in journals there’s a pretty heavy disincentive to make stuff up.

Business culture is not like this. If a business pays a lot of money to a consultant to examine a gigantic pile of data (which may be irrelevantly and redundantly giant), then the consultant had better come up with something, or else the money will be considered "wasted" and the person who commissioned the work will have failed. The temptation to take a CYA (cover your ass) approach scales with the size of the data and the size of the consultant’s fee.

Some thoughts

Tufte didn’t dissuade me from continuing to examine and use big data. But he certainly nailed some critical issues in the field, and they are worth raising with any vendor or consultant who brings up the "need" for big data.
