Since getting into the industry, I’ve maintained that internet marketing is all about data management. Your electronic business data – everything from your leads and procedures to the public presentation of your product – needs to be managed properly. When that happens, you’re practically guaranteed business success (and fewer headaches due to inefficiencies).
This article is dedicated to a group of technologies collectively known as the “semantic web”. Trying to explain the semantic web to people who aren’t fluent in computer technology has been a challenge for engineers for years. The easiest way I can put it is this: the semantic web is when your computer can read data from many different pieces of software with little hassle. Being able to simultaneously search Microsoft Word, Facebook, your website, your customer relationship software, and YouTube, then compare the results to find intelligent answers to questions, is what the semantic web is about. It’s about making work online easier, and it’s currently being pursued by a number of extremely intelligent people across the world working for little or no pay. But it’s also being watched by internet giants such as Google and Microsoft. Everything from the continued interest in social media search engines to Mark Hurd’s lawsuit is linked together in a complicated web (pun intended) surrounding this developing technology. We’ve been reading about the semantic web for years without knowing it. This second post in the series (article one is here) is written by a non-engineer to bring this information to other non-engineers.
Search engine optimization (SEO) is perhaps the most obvious place where semantic technologies can be put to use. Michael Daconta, Leo Obrst, and Kevin Smith wrote in their book The Semantic Web that XML is the basis of all semantic markup. For non-technical folks, XML looks a lot like HTML in that tags are enclosed in angle brackets, but XML is a simple, easy-to-understand format. It’s readable by both machines and people, and since XML is stored as plain text, it can be searched just as easily as a web page. That matters for interoperability – in simple terms, making data work between systems. XML has been the traditional format for interoperability because it’s easy to read, and its ease of use is directly related to its searchability. Making data more indexable, searchable, and readable by machines is exactly the business the SEO professional is in.
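To make this concrete, here’s a minimal sketch of what searchable, machine-readable XML looks like. The tag names and values below are invented for illustration, not taken from any standard:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A hypothetical product record. The tags tell a machine (and a
     person) what each value means, and because it's plain text, it
     can be searched as easily as a web page. -->
<product>
  <name>Blue Widget</name>
  <price currency="USD">19.95</price>
  <category>hardware</category>
</product>
```

Any program that understands XML can pull the price out of that record without guessing, which is what interoperability looks like in practice.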
Now of course, many SEO professionals currently aren’t in this business. From my encounters with these people, they are in the business of building links and filling content with select keywords. In these two activities, you’ll find plenty of discussion and debate about best practices. What kinds of links, the content of the anchor text, the type of site they’re placed on, the geographic location of the linked site, and other factors are topics of heated argument. Equally, every professional has an opinion on what keywords should be targeted in a particular topic area and whether broad terms or “long tail” terms are best in a given situation. Lest anyone get the wrong impression, I offer these services myself, hold opinions about the best way to pursue them, and will continue to offer them even after this series is published. The truth is that keywords and linked sites are important for your search engine rankings. However, I feel that their importance will wane over the next few years.
First, the search engines know that people who do SEO cheat. As I’ve discussed elsewhere, I don’t cheat, but others do. People will host link farms (until they get caught), spam other people’s sites, and otherwise artificially inflate their rankings. Search engines will keep improving their algorithms to weed out spammers, but they still hold the odd view that the important web pages are the ones linked to the most. Google co-founder Larry Page’s early research judged the importance of academic articles by how many times other papers cited them, but people outside computer science circles will tell you that citation counts sometimes don’t mean much. In fact, I’ve had professors confide in me that certain studies have to be cited in literature reviews for the researcher to give the impression of doing a thorough job, even though neither the researcher nor the reader is likely to have read the cited article. Pageviews, links, and rankings might indicate the popularity of something, but not its importance, and I think there’s a fundamental difference between the two. For example, Justin Bieber had the most-viewed YouTube video in 2010, but I don’t think that makes him the most important thing to happen in 2010. Visitors value credible information, and unfortunately the current algorithms don’t have a way to measure that effectively. I will discuss the issue of trust in another article; it is a large part of how data will be organized in years to come.
Second, natural language recognition software is getting more advanced. Black hat SEO folks used to just stuff pages full of keywords and spam blogs. The more savvy (and often legitimate) professionals use tools to gauge keyword density in order to optimize content. The goal is to use a keyword frequently enough that search engines will spot it, but not so frequently that they’ll tag the page as spam. In other words, the goal is to mimic useful content, not create it. Natural language software is getting better at picking this up.
SEO will have to adapt. I think search marketers will look for ways to provide value-driven content to a real audience in order to boost their rankings. They can’t resort to stale sales pitches or content that pretends to be valuable. They have to help people and discuss their product. As we will see in a later article, they need to build trust.
But they also need to expand their ability to relay information about their product. Good information is hard to make; sharing it can be even harder. What tools should SEO professionals pay attention to?
- XML sitemaps and RSS feeds. A search engine can tell the difference between a feed and content, but the current use of sitemaps shows that they help search programs organize the data they’re presented with. Sitemaps allow non-programmers to organize their data, and the traditional sitemap is easily modified to support searches for semantic data. If those slight modifications aren’t enough, semantic extensions are being developed. As I said, the semantic web is built around XML. Structuring your web data into feeds and maps will help position your site better as search engines begin employing more semantic methods of search (see the sitemap sketch after this list).
- RDF. According to the W3C, “RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.” All of this will no doubt sound complex to the average person, but the simplest way to think of it is to imagine a graph of information. Information online is sometimes stored in silos – insulated pockets that can’t communicate with other machines. Graphing it lets machines scan and read it more easily. In practice, the easiest way to incorporate RDF into your website is to find extensions developed by third parties. Have your designer search for plugins for your CMS; you’ll be surprised at what’s out there. Some search programs already support RDF, and if it means interacting with the places your customers will be, you can’t afford not to consider the technology (a sample RDF snippet follows this list).
- HTML5 and Microformats. HTML5 is the next revision of HTML, the language used to write most web pages. HTML5, in its simplest form, is HTML written in a way that incorporates common usage and meaningful markup. Without going into a tutorial on HTML5 and its differences from previous versions, I’ll simply say that the main thing SEO folks should keep in mind is how the new markup uses meaningful tags. For example, writing <em>Important Text</em> instead of <i>Important Text</i> would be an instance of using more meaningful tags. The first is an emphasis tag, which tells anything reading the document that there’s emphasis on the words between the tags. The second just tells readers that the text is italic. Microformats are also supported as a way to insert small pieces of identifiable data that are visible to search engines. Basically, you tell the machine about the structure of your document (an example follows below).
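First, the sitemap sketch promised above: a minimal XML sitemap following the sitemaps.org protocol. The URL, date, and priority are placeholder values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A minimal sitemap per the sitemaps.org protocol; the URL,
     date, and priority are placeholders. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/widgets</loc>
    <lastmod>2011-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```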
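Next, the RDF snippet. This is a sketch in RDF/XML using the Dublin Core vocabulary for the property names; the page address and values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Two triples about one page: it has a title of "Widgets" and a
     creator of "Example Co." Address and values are placeholders. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.com/widgets">
    <dc:title>Widgets</dc:title>
    <dc:creator>Example Co.</dc:creator>
  </rdf:Description>
</rdf:RDF>
```

Each triple names a subject (the page), a predicate (the relationship, like “title”), and an object (the value) – exactly the graph structure the W3C quote describes.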
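Finally, the HTML5 and microformat example. The <em> tag and the hCard classes (vcard, fn, org, adr, locality) are real conventions; the company name and city are invented:

```html
<!-- Meaningful HTML5 markup plus an hCard microformat.
     The business details are placeholder values. -->
<article>
  <h1>Our <em>best-selling</em> widget</h1>
  <p class="vcard">
    Sold by <span class="fn org">Example Co.</span> in
    <span class="adr"><span class="locality">Springfield</span></span>.
  </p>
</article>
```

A search engine that understands hCard can read the company name and city straight out of that paragraph instead of guessing at them.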
One thing that all these tools have in common is that they all exist. Sure, RDF and HTML5 are being revised and improved, but so is every other computer technology. There’s no great mystery to this. The only obstacle is the internet marketers themselves; many of them shy away from the technical underpinnings of their jobs. Most of them went to college to study marketing and business, not programming, but the successful internet marketer will keep an eye on the technology. SEO professionals have been trained to make web data visible to search engines. They already understand hyperlinks, anchor text, and the way search engines read server-side text. The next step is making sure that server-side text is organized, structured, and available for new programs to use. That’s where the future of the industry lies.