Building The Implicit Social Graph

Dec
5

Posted by Justin Briggs

Google Plus is Google's latest attempt at building an explicit social graph that it controls, but Google has been building out an implicit social graph for quite some time. This graph is still relatively naive compared to the mature link graph, but search engines continue to develop it. Since it already directly influences rankings, and its value will increase, it’s important to understand how this type of social graph is being built. In this post, I’ll look at some of the methods for building the social graph, and at how explicit and implicit social graphs differ.

You can be certain Google is building an implicit social graph:
 
“we studied the implicit social graph, a social network that is constructed by the interactions between users and their groups. We proposed an interaction-based metric for computing the relative importance of the contacts and groups in a user's egocentric network, that takes into account the recency, frequency, and direction of interactions” – Google
 

Building the Implicit Graph

This graph can be built by looking at Google’s link graph.
 
social graph via link graph
 
By looking at links between profiles, and reinforcing relationships based on content analysis (username, bio, etc.), search engines can confirm ownership, or at least believe with a high degree of certainty that all of these web properties are in fact owned by one person.
 
The implicit graph can grow from one seed explicit relationship.
 
building the implicit social relationship
 
In this example, accounts A and B have defined an explicit relationship via reciprocal following on Twitter. The degree of this relationship can be gauged on interactions, but more on that in a moment. However, A and B have not continued this relationship across all networks, such as Facebook (or maybe the relationship is not crawlable).
 
Let’s say for example I’m user B and user A is Hello Kitty. Hello Kitty shares a link publicly on Facebook, and then later I perform the following search.
 
hello kitty in search results
 
The explicit relationship on one social network can be used to evaluate URLs based on behavior on a different social network where I have not explicitly defined that relationship. This brings up all sorts of questions about privacy, and Google will tread lightly here, as you don’t want Google displaying known relationships that you haven’t made public. However, displaying and knowing are independent. They might know your relationships, even if they never expose them to you.
 
In the Google paper “Suggesting (More) Friends Using the Implicit Social Graph” they clearly make a distinction here:
 
“we draw a sharp distinction between each user's egocentric network and the global or sociocentric network that is formed by combining the networks of all users. […] By showing users suggestions based only on their local data, we are able to protect user privacy and avoid exposing connections between the user's contacts that might not otherwise have been known to him”
 

Interaction Rank

Relationships can be further analyzed by computing Interaction Rank, which measures the degree of relationship between two users.
 
Interaction Rank: A metric computed by looking at the number of exchanges between users, weighting each interaction as a function of recency. The interaction weight decays exponentially over time. It also looks at the relative importance of ongoing interactions.
 
Note: In the paper, Interaction Rank is defined in terms of building an implicit social network on top of email interactions, which is a data set Google has a lot of access to, but it could be applied to non-email social graphs.
 
Google may use three criteria to measure edge weights. In graph theory, edges are the connections between nodes (the blue lines in the image above).
 
1. Frequency: Users / groups that interact frequently are more important than those with infrequent interactions.
2. Recency: The change in interactions over time. Recent interactions should carry more weight than interactions in the past.
3. Direction: Interactions a user initiates are more significant than those a user does not initiate.
 
Criteria like direction, for example, can help determine spam relationships. Spam accounts send out more interactions than they receive.
 
One obvious shortcoming of the model is that Interaction Rank is higher for active social media users than for less active users. However, since Interaction Rank is used to sort relationships relative to one egocentric view of the graph, and not across a global graph, it can function as a metric to sort the relative importance of relationships with regard to the central node/user.
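
As a rough illustration of how the three criteria could combine into one edge weight, here is a minimal Python sketch. It is not the formula from the Google paper: the 30-day half-life, the extra weight for outgoing interactions, and every name in it are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    age_days: float  # how long ago the interaction happened (recency)
    outgoing: bool   # True if the central user initiated it (direction)


def interaction_rank(interactions, half_life_days=30.0, outgoing_weight=2.0):
    """Sum exponentially decayed weights over all interactions with one contact.

    More interactions raise the score (frequency), older ones count less
    (recency), and interactions the user initiated count more (direction).
    half_life_days and outgoing_weight are assumed values, not published ones.
    """
    score = 0.0
    for item in interactions:
        decay = 0.5 ** (item.age_days / half_life_days)
        weight = outgoing_weight if item.outgoing else 1.0
        score += weight * decay
    return score


# Hypothetical contacts of one central user.
close_friend = [Interaction(age_days=0, outgoing=True),
                Interaction(age_days=30, outgoing=False)]
old_contact = [Interaction(age_days=300, outgoing=True),
               Interaction(age_days=300, outgoing=False)]

ranked = sorted(
    {"close_friend": close_friend, "old_contact": old_contact}.items(),
    key=lambda pair: interaction_rank(pair[1]),
    reverse=True,
)
```

Because the scores are only compared within one user's own contact list, a very active user's higher totals do not distort anyone else's ranking.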
 

They’ve Been Doing This for a Long Time

Google is getting a lot more attention for social recently, but Google has been doing this for quite some time. Google launched the Social Graph API back in February of 2008, an API that taps into one form of an explicit graph based on XFN and FOAF. This tool has been tracking reciprocal Twitter relationships, and many other things, for years.
 
Rand's twitter relationships
 
Some of the social network building they’ve done can be seen via the social graph API.
 
Rand's social networks via social graph API
 
Rand gave examples back in July of this type of deep dive crawl.
 
Social circle in Google
 
They crawl from this seed set of explicit opt-ins to build out a wider set of related connections.
 
Implicit Social Circle in Google
 
In the example above, Google is crawling multiple hops away from a seed node to build out an implicit social graph. A relationship between Rand and Andrew can be defined, and this relationship analysis can be carried over to networks where that relationship isn’t explicitly defined. The Interaction Rank between Rand and Andrew on Twitter can set the degree to which Google pulls signals from these implicit connections.
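
The multi-hop crawl described here can be pictured as a breadth-first walk outward from the seed node. The sketch below is only an illustration of that idea, not Google's implementation: the function name, the hop limit, and every edge in the sample graph (including the accounts other than Rand and Andrew) are hypothetical.

```python
from collections import deque


def connections_within(graph, seed, max_hops):
    """Return {node: hop_count} for every node within max_hops of the seed."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[seed]  # the seed is not its own connection
    return hops


# Hypothetical public "follows" edges discovered by crawling.
graph = {
    "rand": ["andrew", "danny"],
    "andrew": ["carol"],
    "carol": ["dave"],
}

circle = connections_within(graph, "rand", max_hops=2)
```

With a limit of two hops, `carol` is included as a second-degree connection while `dave`, three hops out, is not. A real system would presumably also weight each hop, for example by Interaction Rank, rather than treat all edges equally.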
 

And Here Comes Google Circles

This all changes with Google Plus. One of the limitations of building an implicit social graph is that you don’t have data to test against to confirm the predictions and relationships the graph discovers. It still has to depend on data made public, and it is limited by relationships that are held private (e.g. on Facebook). Google Plus, among other things, creates a massive set of explicit social graph data, which can be used for machine learning and accuracy checking.
 
It’s easy to imagine that Google will use the implicit social graph to predict relationships with relative degrees of certainty about the nature and importance of each relationship. Now Google Plus data can be pushed into the algo, in the same way human reviews could be pushed into Panda. And not only that, but the relationships are bucketed by context using Circles. The implications of this are huge.
 
However, an explicit social network will not replace the implicit network.
 
From the same Google paper:
 
“One survey of mobile phone users in Europe showed that only 16% of users have created custom contact groups on their mobile phones. In our user studies, users explain that group-creation is time consuming and tedious. Additionally, groups change dynamically, with new individuals being added to multi-party communication threads and others being removed. Static, custom-created groups can quickly become stale, and lose their utility”
 
They go on to say:
 
“Our algorithm is inspired by the observation that, although users are reluctant to expend the effort to create explicit contact groups, they nonetheless implicitly cluster their contacts into groups via their interactions with them”
 
This clearly shows at least some of the shortcomings of the explicit social graph.
 
Pros and Cons of Implicit and Explicit Social Graphs
 
Even with publicly available, and privately available, explicit social data, there is still a strong incentive to build out the implicit graph. The explicit graph can be used to make improvements upon this graph. The implicit graph is one area where Google has a significant advantage over Facebook.
 
It’s no secret that the social graph appears to be the next evolution of search: social factors are used more and more, social elements are appearing in results, and mechanisms are emerging that will lead into AgentRank/AuthorRank, which will tie directly into the implicit social graph.
 
p.s. Some great additional reading on this topic: Are You Trusted by Google? via SEO By the Sea's Bill Slawski.


SEOmoz Daily SEO Blog

SMX East 2011 Recap for SEOBook

Dec
5

Useful Links:

SMX Facebook: http://www.facebook.com/searchmarketingexpo
Twitter # activity for the conference: http://twitter.com/#!/search/%23smx

Another successful SMX East is in the books. From all accounts, the event went off without a hitch. Kudos to Danny Sullivan, Claire Schoen, and crew, as the caliber of speakers, sessions, and attendees was top notch, as always. Judging from the event, search marketing is alive and thriving more than ever before. There was a healthy mix of industry experts, consultants, large corporations, agencies, and small businesses. The sessions covered a broad range of topics, from beginner link building fundamentals to more advanced technical SEO sessions covering site architecture, technical coding optimization, and everything in between. A huge thank you goes out to the organizers for a job well done.

It seemed there were two themes that surfaced regularly – Panda and Google Plus/+1. Clearly, there are still many webmasters struggling with Panda and how to properly handle content in the new post-Panda world. The search engines are addressing this and giving webmasters and SEOs more tools and information to organize their websites correctly. Judging by some of the presentations, Google is very dedicated to its Plus and +1 initiatives, which will have a large effect on SEO should end user usage continue to increase.

Below are tidbits and takeaways from the conference, from an SEO perspective. Enjoy!


Schema.org, Rel=Author & Meta Tagging For 2012

Panelists:
Janet Driscoll Miller, Search Mojo http://twitter.com/#!/janetdmiller
Topher Kohan, CNN https://twitter.com/#!/Topheratl
Jack Menzel, Google http://twitter.com/#!/jackm

Microformats were the original snippet format; however, they have been replaced by the new and evolving standard, microdata (the format behind Schema.org, which Google and Bing are developing for and putting resources toward). Some notes from the presentations:

  • General consensus is rich snippets can greatly help in getting your content noticed.
  • In one example given, Eatocracy added the hRecipe tag to their pages, and immediately saw a 47% increase in their recipes being picked up and indexed into Google (which does support this in their recipe search). Additionally, they saw a 22% increase in their recipe traffic.
  • CNN started using Yahoo SearchMonkey / RDFa, and saw a 35% increase in their video content on Google Video search and a 22% increase in overall search traffic. However, they removed the additional code from their site because it increased their page load time. The takeaway is that you should integrate this markup into your own dev cycle, your CMS, or your template.
  • Per Google, their studies show that sites w/ rich snippets have a better CTR as well. Rich Snippets Engineer at Google, RV Guha noted, “From our experiments, it seemed that giving the user a better idea of what to expect on the page increases the click-through rate on the search results. So if the webmasters do this, it’s really good for them. They get more traffic. It’s good for users because they have a better idea of what to expect on the page. And, overall, it’s good for the web.”
  • Rich snippets only work for one site (no cross site references).
  • Sites like LinkedIn and Google Profiles still use microformats. Google has also provided a tool in WMT, but it is a bit buggy and may throw false errors. If you don’t see your snippets show up in the SERPs, it’s likely caused by longer-than-preferred page load times, errors in your code, or a random Google bug (per Google).
  • The current types of rich snippets: reviews, people, products, businesses & organizations, recipes*, events, music

Session – “Ask the Search Engines”

Panelists:
Tiffany Oberoi – Google http://twitter.com/#!/tiffanyoberoi
Duane Forrester – Bing http://twitter.com/#!/duaneforrester
Rich Skrenta – Blekko http://twitter.com/#!/skrenta

  • One audience member asked how to handle ‘subcategory’ pages that are often created on ecommerce sites, such as “Sort Prices $0-$5”, “Prices $5-$25”, etc. The question was whether or not to use the “rel=canonical” tag and point the pages back to the main page. The panelists agreed that those pages should be blocked completely and should not use the canonical tag. The Google representative said not only do these pages not add value to the engine’s index, but they also eat up the site’s crawl budget.
  • If you see the warning “we’re seeing a high # of URL’s” in Webmaster Tools, most times it’s a duplicate content issue.
  • One audience member asked: do you look at subdomain as part of the main domain?
    • Blekko – no inheritance from main domain
    • Google – “it depends”. Sometimes it is inherited, sometimes not.
    • Bing – we look and try to determine if subdomain is a standalone business/website and will get treated differently based on that determination
  • One question touched on removing URLs from Google’s index. Google advised that a removed URL may or may not stay in the index for a period of time, and that to expedite removal of a URL one should use the Webmaster Tools remove-url tool.
  • Duane from Bing was adamant about keeping your submitted sitemap clean. The threshold is 1%. If there are issues in your submitted sitemap >1%, Bing will “lose trust” for your website
  • Panelists advised to make your 404 pages useful to the user
  • It may not be breaking news, but Bing and Google both said unequivocally – duplicate content does hurt you
  • Google commented they are big fans of HTML 5 technology
  • At this point it seems Google will crawl a page if +1 is present, regardless of the robots.txt. This could possibly create issues with trying to not crawl certain pages to avoid dup content. More information found here: http://www.webmasterworld.com/google/4358033.htm
  • Panelists advised to spend a lot of energy “containing urls” on your website and to be thoughtful about which URLs you are getting out there
  • Bing and Google confirmed that “pagerank sculpting” is misunderstood and not effective. For example, if a page has 5 outgoing links, link juice is spread 20% to each of the 5 links. If you nofollow one of the links, the distribution does not become 25% to each of the remaining 4 links. It remains 4 x 20%, and the other 20% is lost. In essence, you have just evaporated potential link juice.
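
The sculpting arithmetic in the last bullet can be checked with a few lines of Python. This is only a sketch of the simplified model the panelists described, where a page splits its link juice evenly across all outgoing links and nofollowed links still count in the divisor. The function name and the `page_juice` parameter are made up for illustration and do not reflect any real ranking formula.

```python
def sculpting_outcome(total_links, nofollowed_links, page_juice=1.0):
    """Return (share per followed link, total passed, total evaporated).

    Assumes the simplified model from the panel: the divisor counts every
    outgoing link, so nofollowing a link does not raise the other shares.
    """
    share = page_juice / total_links
    followed = total_links - nofollowed_links
    passed = share * followed
    evaporated = page_juice - passed
    return share, passed, evaporated


# The example from the session: 5 outgoing links, 1 of them nofollowed.
share, passed, evaporated = sculpting_outcome(total_links=5, nofollowed_links=1)
```

Each of the 4 followed links still receives a 20% share, 80% is passed in total, and the remaining 20% goes nowhere.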

Google Plus and +1

These were hot topics at this year’s SMX East. Multiple sessions covered Google Plus and +1 in depth.

  • Speaker Benjamin Vigneron from Esearchvision covered the basics of Google Plus and +1. He noted a +1 to a search result will +1 the PPC ad/landing page, too.
  • With PPC, +1 could have a significant effect on Ad Rank by affecting each of the Quality Score factors, including quality of the landing page, CTR, and the ad’s past performance.
  • Interestingly, AdWords could conceivably add segmenting on all information in Google Plus (similar to FB), i.e. gender, age, etc.
  • Christian Oestlien, the Google Product Manager for Google Plus, spoke about Google Plus features and fielded questions. He mentioned Google is testing and experimenting with celebrity endorsements +1′ing and showed an example SERP with a +1 annotation under the search result (for example “Kim Kardashian has +1’ed” Brand X or search result X). He noted Google is seeing much higher CTR with the +1 annotation and that usage for the “Circles” feature is relatively high.
  • Google software engineer Tiffany Oberoi was also present on the panel. She noted +1 is NOT a ranking factor, but social search is still of course implemented in search results. She confirmed Facebook likes have no impact on rankings but also noted regarding social signals, “explicit user feedback is like gold for us”. She also touched on spam with +1 and said she is currently working with the spam team. Regarding +1’s and spamming, she said to think of +1’s similarly to links. The same guidelines could apply. Google wants to use them as a real signal. Using them in an unnatural way will not be good for you.

Hardcore Local Search Tactics

Panelists:
Matt McGee – Search Engine Land
Mike Ramsey – Nifty Marketing
Will Scott – Search Influence

The panelists gave an encore of the session they presented at SMX Advanced in Seattle. The content was excellent and definitely deserved another run-through. Here are the notes:

  • July 21st, Google removed citations from their Places listings. While they have been removed for public viewing, they are still used. Sources like Whitespark (link: http://www.whitespark.ca/) can be very helpful in uncovering citation building opportunities.
  • Citation accuracy is among the most important factors in getting your business to rank in the O or 7-Pack. Doing a custom Google search of “business name”+”address”+”phone number” will help determine what other sources Google sees as citation sources.
  • The average number of IYP (Internet Yellow Pages) reviews showed a large gap between ranked and non-ranked listings, indicating that IYP reviews do in fact carry quite a bit of listing weight.
  • Offsite citations / data appear to be the no. 1 ranking factor in Places listings
  • Linking Root Domains appear to be the no. 2 ranking factor in Places listings
  • Exact match anchor links appear to be the no. 3 ranking factor in Places listings
  • Links are the new citations for local in 2011-12
  • Building a custom landing page to link your Places Listing to appears to be a huge success factor. Include your Name, Address, Phone (NAP) in the title tag
    • Design that landing page to mirror a Places listing on their site w/ a map, business hours, contact data, etc.
    • If needed, submit your contact/location page as your Places URL/Landing Page which will create a stronger geo scent
  • When trying to understand how users are searching for your client, Insights for Search is a great tool as you can find Geo targeted data w/ KW differentiation (ie Lawyer vs Attorney, which is used more in that area)
  • Local requires a different mindset from traditional SEO
    • Optimize location (local SEO) vs Optimize websites (traditional SEO)
    • Blended search is about matching them up
  • PageRank of Places URL does NOT seem to affect Local ranking -(source: David Mihm)
  • Multi-Location Tips
    • Flat site architecture beginning w/ a “Store Locator” page
      • Great Example, lakeland.co.uk/StoreLocator.action
    • Give each location its own page
      • Great Example, lakeland.co.uk/stores/aberdeen
    • Cross link nearby locations w/ geo anchor text
  • Ensure the use of KML Sitemap in Google WMT
  • Encourage Community Edits – Make Use of Google’s Map Maker
  • Include Geo data in Facebook pages and article engines

Panda Recovery Case Study – High Gear Media

Speaker Matt Heist from High Gear Media covered their experiences over the past 8 months with recovering from Panda. High Gear Media is an online publisher of auto news and reviews.

Heist walked through the company’s strategy pre-Panda and explained their contrasting new post-Panda strategy. The original strategy was many auto review niche sites across a broad range of auto makes, models and manufacturers. The company originally had 107 sites and 20+ writers and dispersed content amongst all the sites. The content was “splashing” everywhere, unfocused. The “large network of microsites” strategy was working and traffic was climbing each month. Then Panda hit – hard. Traffic plummeted beginning this past spring. Leaders at High Gear were forced to reevaluate their strategy and concluded that a more focused approach was better for users and consequently would help search traffic recover.

High Gear took the following actions:

  • Eliminated most of their properties completely (301′ed) and pared them down to 7 total sites with 4 being ‘core’: FamilyCarGuide, Motorauthority, GreenCarReports, TheCarConnection.
  • Properly canonicalized duplicate content
  • Aggregated content with strong user engagement was KEPT, but not indexed
  • They made the hard decision to eliminate content that could be making money but was not good for the long term
  • Dedicated significant resources to redesigning each of the 7 remaining sites

Their strategy seems to be working. Heist noted traffic has “flipped, plus some”. According to Heist, here are the learnings:

  • High Gear Media believes that premium content will prevail and that Panda will help that
  • Advertisers like bigger brands – it is now easier to sell ads and for more $ with fewer, more powerful sites
  • With the evolution of Social (joining Search from a distribution perspective), premium content that is authoritative AND fresh will flourish

Raven Tools

We were able to meet up with the friendly staff over at Raven Tools, sit down with them, and learn a bit more about their product. We personally have been using Raven for about a year now, and highly recommend it. There are several features in the works that will make this even more of an incredible product. If you haven’t used them, we would HIGHLY suggest giving the tools a run. They are partnering with new companies constantly, and as such, are building out a best in class seo management product.

Upcoming Features:

  • A new feature they are working on is a Chrome Toolbar to complement the current Firefox toolbar
  • Another feature coming is “templated messaging” for link requests and manual link building, which will include BCC’s back to records. Templated Messaging will be built into Raven’s Contact Manager, but they are working on making that functionality available in the toolbar.
  • Another upcoming feature is file management. RavenTools engineers are looking at integrating Dropbox into the system to allow files to be associated with other data and records.
  • The Co-Founder Jon Henshaw alluded many times to the idea that link building, and consequently their toolset, will continue to become more and more based on relationships in the future. He also alluded to the idea that traffic can, or in some cases should, be associated with PEOPLE as the referrer, rather than a website (i.e. x amount of traffic came from person A, whether it be their Facebook, Twitter, blog, or website). In other words, a relationship management system looks to be an integral part of the future of Raventools.
  • For future updates, Raventools takes explicit user feedback greatly into account. If you have a feature request or a software integration request, please contact: http://raventools.com/feature-requests/
  • Regarding MajesticSEO and OSE/Linkscape, they will be more fully integrating it into the Research section of Raven. That means they’ll be adding as much functionality into Raven as their APIs will allow. In addition to getting more full access to that data, users will be able to easily add that data to other tools, like the Keyword and Competitor Managers, Rank Tracker, etc…
  • Speed is the number one priority right now. They have full-time staff that are solely dedicated to speeding up the system. The goal is to make it run as fast as a desktop app.
  • Long term – 3rd party integration will be a constant (and should accelerate) for the platform for the foreseeable future.
  • Screenshot of “Social Stream” prototype design http://cl.ly/1b2h0u3P3U441w000o1K/o
  • AdWords Insights: Flagged Pages: http://cloud.raven.im/9v8d
  • Link Clips link checker results with historical results: http://cloud.raven.im/9zgK/o

Other Notes

  • Regarding Panda, one panelist referenced what he called a website’s “Content Performance Ratio” referring to the % of content on a site that is good versus bad or ‘performing vs non performing’ and using that as a gauge as to the health of a website.
  • One panelist also noted that, in his experience, it takes 3-4 requests returning a 404 before a search engine believes you and removes the page from the index.
  • Panelists in the “Ask the SEO” session said to pay close attention to anchor text diversity and human engagement signals

Author bio:
Jake Puhl is the Co-Founder/Co-Owner of Firegang Digital Marketing, a Local search marketing company, specializing in all aspects “Local”, including custom web design, SEO, Google Places, and local PPC advertising. Jake has personally consulted businesses from Hawaii to New York and everywhere in-between. Jake can be contacted at jacobpuhl at firegang.com.


SEO Book.com