Inman

Google tech: from Web crawl to full sprint

Google is preparing a new bit of technology to help the company index the Web even faster. The company is going to be using a new publishing/subscribing protocol, dubbed PubSubHubbub. Let’s get our geek on and figure out what we can from this.

PubSubHubbub

PubSubHubbub … that big, long name pretty much sums up what it’s for. It’s a way for publishers (pub) and subscribers (sub) to communicate via a central repository (hub).

Currently, when publishing via most modern content management systems, a signal is sent out to the Web letting search engines and other services (like feed readers) know that new content is available. This is called a ping.

Then, at some point the search engine or other service goes to the source that sent out the ping. Finally, the original content publisher sends along all the new content.

Notice that there are three steps to this communication:

1. Original content is published and a ping is automatically sent out.

2. A service responds to the ping and requests the original content.

3. The original content is delivered.
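The three steps above can be sketched as a toy model in Python. This is only an illustration of the loop as described here, not how any particular content management system or search engine implements it; the class names, method names and the example.com address are all made up for the sketch.

```python
# Toy model of the ping-then-fetch loop; every name here is illustrative.

class Publisher:
    def __init__(self):
        self.posts = {}           # url -> full content
        self.requests_served = 0  # how many times a service came back for content

    def publish(self, url, content, services):
        self.posts[url] = content
        # Step 1: the ping carries only the URL, not the content.
        for service in services:
            service.receive_ping(self, url)

    def fetch(self, url):
        # Step 3: deliver the full content on request.
        self.requests_served += 1
        return self.posts[url]


class Service:
    def __init__(self):
        self.pending = []         # pings not yet followed up
        self.index = {}           # url -> content the service has seen

    def receive_ping(self, publisher, url):
        self.pending.append((publisher, url))

    def process_pings(self):
        # Step 2: at some later point, go back to the source of each ping.
        while self.pending:
            publisher, url = self.pending.pop(0)
            self.index[url] = publisher.fetch(url)


publisher = Publisher()
search_engine = Service()
publisher.publish("http://example.com/new-listing", "Full text of the post", [search_engine])
indexed_before = dict(search_engine.index)  # still empty: only a ping has arrived
search_engine.process_pings()
indexed_after = dict(search_engine.index)   # now holds the full content
```

The gap between `indexed_before` and `indexed_after` is the delay this column is about: the service knows something changed, but has nothing to index until it makes its own round trip.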

PubSubHubbub is an attempt to shorten that loop. Instead of just sending a ping, the publisher would forward the full content to the hub. So it would happen like this:

1. Original content is published and automatically forwarded on, in full, to the hub.

2. A service checks the hub and receives the full, original content.
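Here is the shorter loop as a rough Python sketch. It follows the two steps as listed in this column and is not the protocol's actual wire format; the `Hub` and `HubService` classes, their methods and the example.com address are assumptions made for illustration.

```python
# Toy model of the hub flow as this column describes it; names are illustrative.

class Hub:
    def __init__(self):
        self.entries = []         # (url, full content), in arrival order

    def receive(self, url, content):
        # Step 1: the publisher forwards the full content, not just a notice.
        self.entries.append((url, content))

    def read_since(self, position):
        # Hand back everything after `position`, plus the new position.
        return self.entries[position:], len(self.entries)


class HubService:
    def __init__(self):
        self.position = 0         # how far into the hub's entries we have read
        self.index = {}           # url -> content the service has seen

    def check(self, hub):
        # Step 2: one visit to the hub yields the full content directly.
        new_entries, self.position = hub.read_since(self.position)
        for url, content in new_entries:
            self.index[url] = content
        return len(new_entries)


hub = Hub()
service = HubService()
hub.receive("http://example.com/new-listing", "Full text of the post")
first_check = service.check(hub)    # picks up the one new entry
second_check = service.check(hub)   # nothing new since the last check
```

Notice that the service never goes back to the publisher: everything it needs is already at the hub. In the published protocol the exchange runs over HTTP and the hub is designed to deliver updates to subscribers rather than wait to be checked, so treat the polling here as a simplification.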

An important facet of this is that the hub is an independent service and anyone can make a hub. You could be your own hub, for example.

Search engines and other content aggregators will like this system because it’s more efficient for them than spidering the entire Web looking for changes and additions to pages.

Web publishers should like this system because it puts them in greater control of getting full content to search engines and other Web services.

Ultimately, I bet that PubSubHubbub is a lot more about rapid-paced, social-media-style content than about Web pages. But it should be "backwards compatible."

Indexing and ranking

Search-engine indexing is the first step toward getting your content into a search engine. When your page or content is "indexed" it just means that the search engine is aware of your page or content.

If you’ve ever had trouble getting your site or page to even show up on a search engine (you get no results when you put the Web address of the page into the search box) then you’re having trouble with getting indexed.

Ranking is what happens next. The search engine goes through all the content that it has already indexed and tries to determine what is the best content for any given search. This is the part that most search-engine-optimization-focused real estate pros are aware of: "I want to be No. 1 on Google!" It’s a desire to rank well.

PubSubHubbub isn’t really about helping content rank better. It’s about getting content into the index quicker. You still need to be making good content to rank well.
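For the geeks, the difference between the two steps can be shown with a toy search engine in Python. This is a sketch under heavy assumptions: the scoring rule (counting how often the search word appears) is invented for the example and is nothing like what a real search engine uses, and the function names and example.com addresses are made up.

```python
# Toy search engine: indexing and ranking as two separate steps.

index = {}  # url -> page text the engine is aware of


def add_to_index(url, text):
    # Indexing: the engine now knows this page exists.
    index[url] = text


def is_indexed(url):
    return url in index


def rank(query):
    # Ranking: order the pages already in the index for one search.
    # Made-up relevance score: how many times the query word appears.
    word = query.lower()
    scored = [(text.lower().split().count(word), url) for url, text in index.items()]
    matches = [(score, url) for score, url in scored if score > 0]
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for score, url in matches]


add_to_index("http://example.com/a", "condo listing downtown condo")
add_to_index("http://example.com/b", "condo for sale")
results = rank("condo")
```

A page that was never passed to `add_to_index` cannot appear in `results` no matter how good it is, which is the indexing problem. Getting into `index` faster is all PubSubHubbub promises; the order that `rank` returns still depends on the content.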

In fact, since much more content will be getting indexed much more quickly, making good content will likely be even more important.

In the SEO community, there’s a lot of concern that relying on PubSubHubbub instead of spidering will open a floodgate of spam. This is quite possible. I expect a lot of the "hubbub" part of PubSubHubbub to be about publishers jamming spam into the index.

The list version

Real-time indexing on Google (or other search engines) should help with getting the following page types indexed quicker:

1. Static pages that don’t change very often, but whose changes are important when they do happen. Example: your contact page.

2. Pages on less well-known sites that don’t get crawled by the search-engine spider all that often.

3. New pages, like perhaps a new listing that you’ve added to your site.

If you don’t see anything in this list that describes the kinds of pages or sites you have, you’re not off the hook. Does it describe any of your competitors? If so, your first-mover advantages might be wearing off as search-engine indexing goes real-time.

In addition, for sites with a great deal of traffic (i.e., sites where someone is aware of the bandwidth and computing-power costs associated with running the site), using PubSubHubbub may yield some cost savings. Real estate listings aggregators, larger real estate Web developers and the national brokerage sites would likely fall into this category.

Gahlord Dewald is the president and janitor of Thoughtfaucet, a strategic creative services company in Burlington, Vt. He’s a frequent speaker on applying analytics and data to creative marketing endeavors.
