Study sounds the alarm on real estate website scraping and data theft

Clareity Consulting and Distil Networks call for NAR mandate to protect online listing data
  • A white paper from Distil Networks and Clareity Consulting says MLSs and listing website vendors should be required to implement anti-scraping protections, and virtually all MLS execs surveyed agreed.
  • Data scraping allows companies to use listing data without a license and make money competing against agents, brokers and MLSs. Stale scraped listing data can also affect brand reputations and increase liability for broker owners.
  • A NAR mandate won't be enough. MLSs need vendors to create tools to check for compliance with anti-scraping rules.

Agent and broker website vendors and MLSs have had more than a decade to figure out how to protect one of the industry’s most valuable assets from theft online, but most haven’t risen to the challenge.

Therefore, it’s time to mandate scraping protections, according to a white paper from real estate consulting firm Clareity Consulting and security technology firm Distil Networks.

“[W]eb scraping real estate data is easy and it is cheap and someone else is making money on MLS listings data and it is not going to stop unless the industry takes action,” the paper said.

Listing data is sent to thousands of locations via MLSs, third-party portals, and agent and broker websites. If one website is protected, scrapers often simply move to the next unprotected site in that market.

Data scrapers use software, or “bots,” to grab listing data and then use it to compete against agents and brokers by marketing services to sellers or creating derivative products, such as home valuations. In doing so, they avoid the restrictions of obtaining a data license and ignore copyright, the paper said.

MLSs have compelling reasons to protect listing data, including the potential erosion of brand reputation and increased liability for their broker members.

Source: Distil Networks and Clareity Consulting

MLSs want rules but need tech help

The study, “The State of Web Scraping Data Theft Across Real Estate Websites and MLSs,” surveyed 100 MLS execs representing more than 600,000 agents and brokers, and 14 IDX (Internet data exchange) and VOW (Virtual Office Website) providers operating 400,000 agent and broker websites.

MLS execs are key to addressing screen-scraping because they are the industry’s data gatekeepers. They help set priorities with MLS vendors, manage data license agreements with third-party portals, and make up the committees at the National Association of Realtors that set rules for IDX and VOW, the paper noted.

The study found that 95 percent of MLS execs agreed that IDX sites should be subject to rules specifically mandating scraping protections. The vast majority of MLSs are Realtor-affiliated and therefore subject to NAR policy.

Nearly all MLS execs, 99 percent, said compliance with rules protecting against misuse of MLS data is important.

There is already one NAR rule that requires VOW sites to “employ reasonable efforts to monitor for and prevent misappropriation, scraping, and other unauthorized uses of MLS listing information.”

But while virtually all MLS respondents were aware of the policy, a whopping 59 percent said they do not check for compliance with this anti-scraping rule. Many of the remaining MLSs use tests that can’t determine whether the site is using outdated methods, such as relying solely on IP address recognition.

Therefore, adding a new rule is not enough. MLSs need — and 98 percent of them said they would support — standardized tests to facilitate anti-scraping compliance reviews.

Distil is currently working on a compliance tool, the paper said.

The company would stand to gain if hundreds of thousands of IDX websites were required to implement anti-scraping protections. In mid-2013, with strategic help from Clareity, Distil launched what it hoped would be an “industrywide intelligence network” to identify and thwart those who scrape real estate listing data without permission.

“Distil Networks has been working since that time to improve the anti-scraping protections available for the real estate industry and developed a solution that is cost effective for the smaller sites and does not rely on IP addresses to identify bots,” the paper said.

“Now it is time to explore how to move the industry from the ‘early adopter’ status to ‘mainstream’ where all real estate sites can have access to comparable levels of antiscraping protections. This study was initiated in order to better understand how to make that happen.”

Misunderstandings a barrier to data protection

Two misunderstandings may be at the heart of MLSs’ ineffective protection of listing data, the paper said. First, MLS execs seem to believe there are very few scrapers, and second, they believe they can identify and sue scrapers.

“Unfortunately, having seen the scraping reports from many popular websites, it is clear that there are many, many scrapers,” the paper said.

“On the second point, legal follow-up is very difficult, tedious, expensive, and very much reactive.

“Moreover, once the data are stolen, the damage is done and many scrapers have offline or non-public use of such data to monetize it so they cannot easily be found and brought to court.”

Two MLSs recently spent millions of dollars over the course of more than two years fighting a single alleged scraper.

Vendor disconnect

Vendors manage data security for the hundreds of thousands of websites they provide brokers and agents, but they do not seem to be fully aware of scrapers’ business impact on their customers and on their operational costs, the paper said.

As many as 14 percent of IDX and VOW vendors were not aware that there are companies scraping MLS data off of real estate websites to monetize that data without a license to do so, the study found. Nine percent of VOW providers were unaware of NAR’s anti-scraping rule for VOWs.

Many also seemed to be unaware that scraping bots account for 12 to 25 percent of a typical IDX site’s bandwidth.

Part of the problem may be a disconnect between what vendors think is important to their customers and what their customers say is important.

Almost all — 94 percent — of MLS executives indicated a vendor’s information security practices, including sophistication of anti-scraping technology, are important when selecting an MLS vendor, the study found.

But only 57 percent of vendors indicated their customers were interested in anti-scraping.

Source: Distil Networks and Clareity Consulting

“It is up to the industry — especially the larger and more sophisticated real estate industry organizations and structures — to make it clear to IDX/VOW vendors that this is a problem that the industry wishes to solve,” the paper said.

“If so, it may require changes to MLS rules and data license agreements with content providers to these vendors to make that perfectly clear.”

Nearly two-thirds of vendors, 62 percent, indicated that compliance with MLS rules was of high importance to them.

Email Andrea V. Brambila.