Archive for the 'SEO' Category

Matt Cutts #6: All About Supplemental Results

Here’s the sixth in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

See the rest of the videos!

Transcription

OK. We got some supplemental results questions.

David writes in. He says:

“Matt, should I be worried about this? site:tableandhome.com returns 10000 results; site:tableandhome.com -intitle:by returns 100000 results. All supplemental.”

David, no, in general I wouldn’t worry about this. I want to explain the concept of the beaten path. So, if there is a problem with, like, a one-word search in Google, that’s a big deal. If it is, like, a 20-word search, that’s obviously less of a big deal, because it’s off the beaten path.

The supplemental results team takes reports very seriously and acts very quickly on them. But in general, something in the supplemental results is a little further off the beaten path than our main web results. And once you start getting into negation, or negation by a special operator like ‘intitle’, stuff like that, that’s pretty far off the beaten path. And you are talking about results estimates: not the actual web results, but the estimates for the number of results.

The good news is, there are a couple of things that will make our site: estimates more accurate. There are at least two changes I know of in our infrastructure, one deliberately trying to make site: results more accurate. The other one is just a change in our infrastructure to improve overall quality, but as a side benefit, it counts the number of results from a site more accurately when it involves the supplemental results. So there are at least a couple of changes that might make results more accurate.

But in general, once you really start to get far off the beaten path, -intitle, all that sort of stuff, especially with supplemental results, I wouldn’t worry that much about the results estimates. Historically we have not worried that much, just because not that many people have been interested. But we do hear more people sort of saying, ‘yes, I am curious about this’. So we are putting a little more effort into that.

Let’s see. Erin writes in. He says:

“I have a question about redirects. I have one or more pages that have moved on various websites, I use classic ASP” and then he has given the response of 301. He says, “These redirects have been set up for quite a while, and when I run a spider on them, it handles the redirects fine”.

This is probably an instance where you are seeing this happen in the supplemental results. So here is how I think about things: there is a main web results Googlebot and a supplemental results Googlebot. And so, the next time the supplemental results Googlebot visits that page and sees the 301, it will index it accordingly and refresh, and things will go fine. Historically, the supplemental results have been a lot of extra data but have not been refreshed as fast as the main web results. And if you look at a cached page, you know, anybody can verify that the crawl dates on the results vary.

So, the good news is that the supplemental results are getting fresher and fresher and there is an effort underway to make them quite fresh.
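For readers who have not set up a permanent redirect themselves, the following is a minimal sketch of what a 301 looks like from the server side. It is written in Python as a small WSGI app rather than in Erin’s classic ASP, and the paths in MOVED and the function name app are hypothetical, chosen only for illustration.

```python
# Hypothetical map of old paths to their new locations.
MOVED = {
    "/old-page.html": "/new-page.html",
    "/products.asp": "/catalog/",
}


def app(environ, start_response):
    """Minimal WSGI app: answer moved paths with a 301, anything else with a 404."""
    path = environ.get("PATH_INFO", "/")
    target = MOVED.get(path)
    if target is not None:
        # 301 signals to browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

A crawler that requests /old-page.html gets the 301 status plus a Location header pointing at the new address; how quickly an index reflects that depends on when the crawler next visits, as described above.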

For example, Chris writes:

“I would like to know more about the supplemental index. It seems while you were on vacation, many sites got put there, and I have one page where this happened, PageRank of 6, since like May.”

So, I talked about the fact that there is new infrastructure in our supplemental results. I mentioned that in a blog post; I don’t know how many people noticed it, but I’ve certainly said that before. I think it was in the indexing timeline, in fact. So as we refresh our supplemental results and start to use new indexing infrastructure in the supplemental results, the net effect is things will be a little fresher. I wouldn’t be surprised, and I am sure I have some URLs in the supplemental results. I wouldn’t worry about it that much. And over the course of the summer, the supplemental results team will take all the reports they see, especially off the beaten path, like site: and operators that are kind of esoteric, and they will be working on making sure that those return the sort of results everybody naturally expects.

So, stay tuned on supplemental results. It’s already a lot fresher and a lot more comprehensive than it was, and I think it’s just going to keep improving.

Transcription thanks to Peter T. Davis

Matt Cutts #5: How to structure a site?

Here’s the fifth in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

Transcription

OK. As you can see, I’ve got the closest thing I could get to a world map… Did you know that there are over 5000 languages spoken across the earth? How many does Google support? Only about a hundred. Yeah. Still a ways to go.

Alright! Let’s do some more questions.

Todd writes in. He says:

“Matt, I have a question. One of my clients is going to acquire a domain name, very related to their business and has a lot of links going to it”. So, he basically wants to do a 301 redirect to the final website after the acquisition. The question is, “Will Google ban or apply a penalty for doing this 301 redirect?”

In general, probably not. You should be OK, because you specified that it is very closely related. Any time there is an actual merger of two businesses or two domains that are very, very close to each other, doing a 301 should be no problem whatsoever. If, however, you are like a music site and all of a sudden you are acquiring links from Debt Consolidation Online, or Cheap yada yada yada, that could raise a few eyebrows. But it sounds like this is just a run-of-the-mill thing; I think you should be OK.

Barry writes in:

“What’s the best way to theme a site using directories? Do you put the main keyword in the directory or on the index page? If using directories, do you use a directory for each set of keywords?”

This is a good question.

I think you are thinking too much about keywords and not enough about your site architecture. So, this is just me, but I prefer a tree-like architecture. So, everything branches out in nice, sort of even bounds. It’s also good if things are broken down by topic. So, you know, if you are selling clothes, you want to have sweaters as one directory, shoes as another directory or something like that.

If you do that sort of thing, what you end up with is, the keywords end up in directories. And as far as directories versus the actual name of the html file, it doesn’t really matter that much within Google’s scoring algorithm. I think if you break it down by topic, but make sure that those topics match well with the keywords you expect your users to type in, when they try to find your page, then you should be in pretty good shape.
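To make the tree idea concrete, here is a small Python sketch that turns a topic-to-pages mapping into URL paths. The catalog contents and the function name site_paths are made up for illustration; the only point is that one directory per topic puts the topic word into every path beneath it.

```python
# Hypothetical catalog: one directory per topic, pages listed under each topic.
CATALOG = {
    "sweaters": ["wool-cardigan", "cotton-pullover"],
    "shoes": ["running", "sandals"],
}


def site_paths(catalog):
    """Return the home page, one index path per topic, and one path per page."""
    paths = ["/"]
    for topic in sorted(catalog):
        paths.append(f"/{topic}/")
        for page in catalog[topic]:
            paths.append(f"/{topic}/{page}.html")
    return paths


for path in site_paths(CATALOG):
    print(path)
```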

Alright! Jody writes in:

“If an e-commerce site’s URL has too many parameters”, so, she’s got like the punctuation monster barfing over the number of parameters, “and it is unindexable, is it acceptable to use the Google guidelines to serve static HTML pages to the bot to index instead?”

This is something to be very careful about, because if you are not, you can end up in an area that is known as cloaking. Again, cloaking is showing different content to users than to Googlebot. And you want to show the exact same content to the users as you do to Googlebot. So my advice would be to go back to the question I answered a while ago about dynamic parameters and URLs and to basically see if there is a way to unify it, so that the users and Google both see the same directory. If you do something like that, that’s going to be much better. Failing that, you want to make sure that whatever HTML pages you do show, if users go to the same page, they don’t get redirected, they don’t go somewhere else. They need to see the exact same page that Googlebot saw. That’s the main criterion for cloaking, and that’s what you want to be careful about.

John Wooley writes in. He says:

“I would like to use an A/B split test on one of my static HTML sites. Will Google understand my PHP redirect for what it is, or will they penalize my site for perceived cloaking? If this is a problem, is there a better way to split test?”

That’s a good question.

If you can, I would split test in an area where search engines aren’t going to index it. Because anytime we go to a page and we see different content, or if you reload and you see different content, that does look a little bit strange. So if you can, it’s better to use robots.txt or .htaccess files or something to make sure that Googlebot doesn’t index your A/B testing. Failing that, what I would do is, I wouldn’t use a PHP redirect; I would try to use something server-side to actually serve up the two pages in place. The one thing to be careful about, and I touched on this earlier in another session, is that you should not do anything special for Googlebot. Just treat it like a regular user. That’s going to be the safest thing, in terms of not being treated as cloaking.

And, let’s wrap up! Todd asks another question.
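One way to check that an A/B test area really is closed to crawlers is to run the robots.txt rules through a parser before relying on them. The sketch below uses Python’s standard urllib.robotparser; the /ab-test/ directory and the file contents are assumptions made for illustration, not details from John’s site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps every crawler out of the test area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ab-test/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path, agent="Googlebot"):
    """Return True if the given user agent may fetch the path under these rules."""
    return parser.can_fetch(agent, path)


print(crawlable("/ab-test/variant-b.html"))
print(crawlable("/index.html"))
```

Note that the rule is written for all user agents rather than for Googlebot alone, in keeping with the advice not to do anything special for one crawler.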

“Aw heck. How about a real question. Ginger or Mary Ann?”

Ah, ha ha, I am going to go Mary Ann (nodding his head).

Alright. That’s enough for another session.

Transcription thanks to Peter T. Davis

Matt Cutts #4: Static vs. Dynamic URLs

Here’s the fourth in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

Transcription

Alright, here we go again! I am learning something every time I do one of these. For example, it is probably smart to mention that today is Sunday, July 30th, 2006.

Alright! Jeremy writes in:

He says, “Does Google treat dynamic pages differently than static pages? My company writes Perl and pages are dynamically created using arguments in the URLs yada yada”.

Good question.

To a first approximation, we do treat static and dynamic pages in similar ways in ranking. So, let me explain that in a little more detail. PageRank flows to dynamic URLs in the same way it flows to static URLs. And so, if you’ve got the New York Times linking to a dynamic URL, it will still get the PageRank benefit and it will flow the PageRank benefit.

There are other search engines which in the past have said, “OK, we go one level deep from the static URLs. So we are not going to crawl from a dynamic URL, but we are willing to go into the dynamic URL space from a static URL.”

So, the short answer is PageRank flows just the same between static and dynamic URLs.

Let’s go into the more detailed answer. The example you gave actually has five parameters and one of them is like a product ID with like 2725… You definitely can use too many parameters. I would absolutely opt for two or three at the most, if you have any choice whatsoever, and try to avoid long numbers because we can think of them as session IDs. Getting rid of any extra parameters that you can is always a good idea. And remember, Google is not the only search engine out there. So if you have the ability to basically say, I am going to use a little bit of mod_rewrite and I am going to make it look like a static URL, that can often be a very good way to tackle the problem.

So, PageRank still flows, but experiment. If you don’t see any URLs that have the same structure or the same number of parameters as you are thinking about doing, then it’s probably better if you can either cut back on the number of parameters or shorten them somehow, or try to use mod_rewrite.
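The rewriting idea can be sketched as a pair of mappings between a parameter-heavy URL and a static-looking one. This is plain Python for illustration, not mod_rewrite configuration, and the script name product.pl, the parameter names category and item, and the path layout are all hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def to_static(dynamic_url):
    """Map /product.pl?category=shoes&item=42 to /shoes/42.html."""
    query = parse_qs(urlsplit(dynamic_url).query)
    category = query["category"][0]
    item = query["item"][0]
    return f"/{category}/{item}.html"


def to_dynamic(static_path):
    """Inverse mapping: what a server-side rewrite rule would hand to the script."""
    category, page = static_path.strip("/").split("/")
    item = page.rsplit(".", 1)[0]
    return f"/product.pl?category={category}&item={item}"


print(to_static("/product.pl?category=shoes&item=42"))
print(to_dynamic("/shoes/42.html"))
```

The visitor and the crawler both see only the short form; the server translates it back internally, so the two parameters never appear in a link.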

Alright. Mark writes in. This is an interesting question. He had a friend whose site was hacked and did not know about it for a couple of months because Google had taken it out or something like that. So he asks:

“Can Google inform the webmaster of this occurrence, basically when your site gets hacked, within Sitemaps? Inform them that maybe inappropriate pages were crawled.”

That’s a really good question!

My guess is, we don’t have the resources to do something like that right now. In general, when somebody is hacked, if they have a small number of sites they are monitoring, they usually notice it pretty quickly or else the webhost will alert them to it. So, the sitemaps team is always willing to work on new things, but my guess is this would be at the lower end of the priority list.

OK. James M. writes in.

He says, “Matt, in the fullness of time, I would like to use geotargeting software to deliver different marketing messages to different people in different parts of the world. So for example, a discounted pricing structure. Are we safe to run with this plain vanilla use of geotargeting software? Clearly, we want to avoid any suspicions of cloaking.”

That’s a really neat question!

Let’s talk about it a little bit. The way that Google defines cloaking is very specific. It says, “showing different content to users than you show to search engines.” Now geotargeting by itself is not cloaking under Google’s guidelines, because all you are doing is, you are saying, “take the IP address, oh you are from Canada, we will show you this particular page”. Or, “take the IP address, you are from Germany, so we will show you this particular page”.

The thing that will get you in trouble is if you treat Googlebot in some special way. So, if you are geotargeting by country, don’t make a special country just for Googlebot, GoogleBotiStan or anything like that. Instead, what you want to do is to treat Googlebot just like a regular user. So if you geotarget by country, we are coming from an IP address that is in the United States, so just give Googlebot whatever the United States users would see.

So, Google for example does geotargeting. We don’t consider that cloaking. I think I’ve explained the subtleties pretty well. But, again, cloaking is showing different content to users than to search engines. In this case, you should treat Googlebot like you would treat any other user based on the fact that they’ve got this IP address, and you should be totally fine.

Alright! Let’s take another break.
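The rule above, pick the page from the visitor’s country and nothing else, can be sketched in a few lines of Python. The page paths, the default, and the geo_table lookup are stand-ins invented for this example; a real site would use an actual GeoIP database in place of the dictionary.

```python
# Hypothetical page per country code.
PAGES = {
    "CA": "/ca/pricing.html",
    "DE": "/de/pricing.html",
    "US": "/us/pricing.html",
}
DEFAULT_PAGE = "/us/pricing.html"


def lookup_country(ip_address, geo_table):
    """Stand-in for a GeoIP lookup: geo_table maps IP strings to country codes."""
    return geo_table.get(ip_address)


def page_for(ip_address, user_agent, geo_table):
    """Choose a page by country only.

    user_agent is accepted but deliberately ignored, so a crawler gets
    whatever any other visitor from the same country would get.
    """
    country = lookup_country(ip_address, geo_table)
    return PAGES.get(country, DEFAULT_PAGE)
```

Because the user agent never enters the decision, there is no code path that could serve Googlebot something a regular visitor from the same IP range would not see.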

Transcription thanks to Peter T. Davis

Matt Cutts #3: Optimize for Search Engines or for Users?

Here’s the third in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

Transcription

Alright! Let’s try a few more.

By the way, I showed my disclaimer to somebody from the Google Video team and he said, “Matt, it looks like you have been kidnapped”. So, maybe I have to get some sort of Rocketboom world map or something back there (pointing to the wall behind). I don’t know, you guys care more about the information than, you know, how pretty it is, I’m guessing.

Alright! Todder writes in.

“My simple question is this. Which do you find more important in developing and maintaining a website, ‘search engine optimization’ or ‘end user optimization’?” and then he says, “I will hang up and listen”.

Todder, that’s a great question!

Both are very important and I think, if you don’t have both, you don’t do as well, because if you don’t have search engine optimization, it’s harder to be found, and if you don’t have end user optimization, you don’t get conversions. You don’t get people to stay and really enjoy your site and post on your forum or buy your products or do anything else.

So, I think you do need both. The trick in my mind is to try to see the world such that they are the same thing. You want to make it so that your users’ interests and the search engine’s interests are as aligned as you can. And if you can do that, then you will usually be in very good shape, because you will have compelling content, reasons why people want to visit your site, and it will be very easy for users to get around and for search engines to get around. And you won’t be doing any weird tricks; anything that you do that’s good for search engines, you will also be showing to the users. So, I think you have to balance both of them.

Ted Z writes in with a couple of interesting questions.

“Can you point us to some spam detection tools? I would like to monitor my sites to make sure that they come up clean and have a valid way to rat out my no-good spamming competitors.”

Well, if you are sure they are spamming, there are some tools you can use. First off, in Google we have a lot of tools to detect and flag spam, but most of them are not available outside of Google. One thing you could look at is Yahoo Site Explorer, which shows you backlinks on a per-page or a per-domain basis, I think. That can be pretty handy. There are also tools out there to show you everything on one IP address. Now, if it is a shared virtual host, you will get a ton of perfectly normal sites. But sometimes somebody might leave a lot of their sites on one IP address, and if you are looking for spam, you could find more sites that way. You just have to be careful that you don’t automatically assume they all belong to one person. As far as checking your site to make sure that it comes out clean, I would certainly hit Sitemaps or the webmaster console, which will tell you of any crawl errors and other problems that we found.

And then the second question that Ted Z asks, which I think is very good, is:

“What about the cleanliness of the code, for example W3C? Any chance that the accessible work will leak into the main algorithm?”

People have been asking me this for a long time and my typical answer is, normal people write code with errors. It just happens. Eric Brewer, one of the cofounders of Inktomi, has basically said, 40% of all HTML pages have syntax errors. And there is no way that a search engine can remove 40% of its content from the index, just because somebody didn’t validate or something like that.

So I think there is a lot of content, especially content that is man made, students on .edus and things like that, that’s very high quality but probably doesn’t validate. So, if you had asked me a while ago, I would have said, yes, we don’t have a signal like that in our algorithms and it’s probably for a good reason.

That said, now that T.V. Raman has done this work to do the accessible search, you know, somebody might look at using it as a possible signal. Any signal we use to improve quality will have to pass through rigorous evaluation and stuff like that.

In general, it’s a great idea to go ahead and have your site validated. But I wouldn’t put it at the top of your list. I would put making compelling content, making a great site, at the top of your list. And then once you’ve got that, you want to go back and dot your ‘i’s and cross your ‘t’s and make sure that you’ve got good accessibility as well. Well, you always want to have good accessibility. But validation and sort of closing off those last little things usually doesn’t matter that much to search engines.

Let’s go ahead and pause here.

Transcription thanks to Peter T. Davis

Matt Cutts #2: Some SEO Myths

Here’s the second in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

Transcription

Alright! Well, I am trying to upload the last take to Google Video, so we will see how it looks. While I am waiting, why don’t I do a few more questions and see if we can knock a few out. I am realizing that with this video camera that I’ve got, I can do about 8 minutes’ worth of video before I get 200 megabytes and then I have to use the client upload or so. I’ll probably break it into chunks of 5 to 8 minutes each.

OK. Ryan writes in.

He says, “Can you put an end to some myths about having too many sites on the same server, or having sites with IPs too similar to each other, or having them all include the same JavaScript off a different site?”

In general, if you are an average webmaster, this is something that I wouldn’t really worry about. Now, I have to tell a story. Tim Mayer and I were on the same panel together and somebody said, “You took all my sites out”, and he said, “Both Google and Yahoo did. I don’t really have that many”. And so, Tim Mayer asked, “Well, how many sites did you have?” And the guy looked a little sheepish for about a minute and then he said, “Well… I had about 2000 sites”.

So, there is a range, right - there’s a continuum. If you’ve got two, three, four or five sites and they are all different themes, stuff like that, you are not in the place where you really need to worry that much. If you have 2000 sites, you need to be asking yourself, do I really have enough unique value-add content to support 2000 sites? Because the answer is probably not. But if you are just an average guy, you’ve got a few sites, I wouldn’t worry about them being on the same IP address, and I definitely wouldn’t worry about them being on the same server. That’s something that everybody does.

And the last thing Ryan asked was about including the same JavaScript off a different site. Well, this is a very common idiom. People use JavaScript trackers. Google AdSense is JavaScript included off another site. So this is something that a lot of sites do on the web; I wouldn’t necessarily worry about it at all. Now again, if you have 5000 sites and if you are including JavaScript that does a sneaky redirect, then you do need to worry. But if it is just a few sites or if you are doing something that is entirely logical with your JavaScript, I wouldn’t worry at all.

Alright! Erin Shear writes in. It’s kind of an interesting question.

He says, “I am having trouble understanding the problems that we face every time we launch a new country. Typically, we launch a new country with millions of new pages at the same time. Additionally, due to our ambitious PR team, we get tons of links from our network of sites as well as press during every launch.”

So he is saying that the last time they did this, they didn’t do very well in French and they lost a site in Australia that didn’t do very well at all.

Erin, this is a good question, primarily because the answer has changed somewhat since the last time we talked. Somebody asked me this question at the SES conference in New York and I said, “Just go ahead and launch stuff. Don’t worry about it. It may bring more scrutiny but in general you will be fine.”

I think if you are launching sites with millions of new pages, you want to be a little more cautious, if you can. In general, if you are launching with that many pages, it’s probably better to try to launch a little more softly. So, a few thousand pages and then add a few thousand more… stuff like that.

It could be… Millions of pages is a lot of pages. I mean, Wikipedia is only, how many, 5 or 10 million pages. So if you are launching millions that could be attracting scrutiny and you want to make sure that they are all good pages. Otherwise you might find yourself, not doing as well as you had hoped for.

Alright! Quick question.

Classic Nation writes in and says, “I am wondering what the status is on Google Images and if we can expect to see an update on the indexing technology of the future.”

Actually, there was word on this at WebmasterWorld. We just did an index update (just last weekend, I think) of our index for Google Images. And I was talking to somebody on the Images team and they are always working hard. So a lot of that stuff you may not see; it may be as simple as bringing in new infrastructure that the main web index has, but they are always working hard to make the Google Images index better.

Transcription thanks to Peter T. Davis

Matt Cutts #1: Qualities of a Good Site

Here’s the first in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

Transcription

OK. Let’s try a few questions and answers. I don’t know if this will work. So, let’s give it a shot. (Picks up the first letter) Ralph writes in. He says,

“Some comments on sitemaps please. It seems updates on sitemaps depend on page views of a site”.

No, Ralph, that’s not really true. As far as I understand it, page views are not really a factor in when things are updated in Sitemaps. So, there are different pieces of data within Sitemaps. Imagine five different little pieces of data. They can all be updated at different times and at different frequencies. Typically they will be updated, you know, within days or, worst case, within weeks. However, as far as I know, it’s not dependent on page views.

Let’s try another one. (Picks up another letter)

“What are some general guidelines and recommendations you would make to people who desire to increase their site’s visibility on Google.”

Wow! OK. So, this is a meaty topic. Definitely a longer issue. But let’s go ahead and dive into it.

So, in general, the number one mistake that most people kind of make in SEO is, they don’t make their site crawlable.

So you want to look at your site either through a search engine’s eyes or, you know, use a text browser, do something and go back to 1994 and use Lynx or something like that. If you can get through your entire site using only a text browser, then you are going to be in pretty good shape, because most people don’t even think about crawlability.

You also want to have things like sitemaps on your site, and you can also use our Sitemaps tool in addition to that.

Once you have got your content (and you want to have good content, content that’s interesting, you know, a reason why somebody would want to link to you) and your site is actually crawlable, then you need to go about marketing, promoting or optimizing your site.
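As a rough approximation of the text-browser test, the Python sketch below lists only the links that exist as plain anchor tags in a page’s HTML, which is roughly what a script-less browser or a simple crawler can follow. The class and function names and the sample markup are illustrative assumptions, and this is not a description of how any particular search engine parses pages.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, ignoring anything script-driven."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def plain_links(html):
    """Return the links a text-only client could follow from this markup."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical navigation: one real link, one that only works with JavaScript.
SAMPLE = """
<a href="/sweaters/">Sweaters</a>
<span onclick="window.location='/shoes/'">Shoes</span>
"""

print(plain_links(SAMPLE))
```

In the sample, only the sweaters page is reachable without scripting; the shoes page would be invisible to a text browser unless something else links to it.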

So, the main things that I would advise are, think about the people who are really relevant to your niche and make sure that they know about you.
If you are associated with a doctor, because you have got some medical kind of website, make sure that the doctor knows about you and if he has got a website, it might be appropriate for him to link to you.

You also want to be thinking about a hook, something that’s viral. It can be really good content, for example, newsletters or tutorials. I was setting up all this video stuff, trying to make it look semi-professional.

There were tutorials by a company called Photoflex(??); they were saying, here is how to make the fill light, the key light and all that sort of stuff, and oh, by the way, you can buy our equipment to do that. That’s really, really smart, and in fact, another photography site that I went to had syndicated their tutorial lessons to that other (Photoflex) website. So, content can be a great way to get links. You can also look at things like Digg, Slashdot, Tearrank(?), Reddit, you know, social networking sites, MySpace, those sorts of things.

But fundamentally you need something interesting that sets you apart from the pack.

Once you’ve got something like that, then you are going to be in much better shape as far as promoting your site is concerned. But again, the biggest step is making sure that your site is crawlable. After that, making sure that you’ve got content, and then finally trying to do the best you can to find some hook, some reason why users would love your site, return to it and bookmark it.

Alright. Let’s do another one! (picks up another letter)

“What conditions”, asks Brian M, “cause Google to use a DMOZ snippet when there is already a valid meta description tag on the page?”

That’s a really good question. I actually had to go and ask the snippets team. I was like, “Hi, why does this happen?”

I am not going to go into too much detail, but here’s the way you should think about it. Suppose that you have a page about Christina Aguilera or something like that and your Open Directory snippet is something about Britney Spears. Well, if you type in, or some user types in, Britney Spears, that’s going to be a much better snippet.

So the way I would think about it is that there is always a scoring process which does all this selection to say, OK, you are the best document to be returned. Once we have selected and scored your document, you are going to be returned at a certain slot in the search results.

Now what you need to do is to think: is the Open Directory snippet or my meta tag a better match for what the user actually typed in?
It’s actually ‘query dependent’. That is, depending on the query the user typed, we say, well, we think that the description from the Open Directory Project or the one from your meta tags is going to be a better match for the user’s query. Then based on that, we try to say, OK, in that case let’s go with the meta tags, or in this other case let’s go with the Open Directory Project.

Now, if you don’t like the Open Directory snippet, you can use the meta ‘noodp’ tag and that will prevent us from using the description from the Open Directory Project. So you have sort of the ability to scope things a little bit and choose which things you want to have happen(?).
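For reference, the ‘noodp’ directive is written as a robots meta tag in the page head. The Python sketch below checks whether a page carries it; the helper names are made up for this example, and the check is only a convenience for auditing your own pages, not a picture of how Google reads the tag.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Gather the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            for part in (attributes.get("content") or "").split(","):
                self.directives.add(part.strip().lower())


def opts_out_of_odp(html):
    """Return True if the page asks engines not to use the Open Directory description."""
    parser = RobotsMeta()
    parser.feed(html)
    return "noodp" in parser.directives


PAGE = '<head><meta name="robots" content="noodp"></head>'
print(opts_out_of_odp(PAGE))
```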

Alright! (Picks up another letter) This one is a good one. Lara McKenzie(??) writes in.

She says, “Does Google favor ‘bold’ over ‘strong’ tags?” (Sigh).

In general, we probably favor bold just a little bit more, but it’s so slight that I wouldn’t really worry about it. I would go ahead and do your markup however you want to do it, not worrying so much about “oh, if I use a tag like this I am going to get a little bit of a boost in Google” or something like that. Any kind of effect like that is relatively small. So in general, I’d do whatever is best for users or whatever is best for your site and then not worry much about it after that.

I think I am going to go ahead and upload what we got so far, see how it looks and hopefully I will be back in a bit.

Transcription thanks to Peter T. Davis

30 Seconds for a Charity?

This afternoon, Aaron from the City of Angels Children’s Home in Tijuana, Mexico contacted me for some SEO advice. It seems that since yesterday, the homepage of the site (just the homepage, not the entire site! http://www.tjkids.org) has been dropped from the Google index.

I’ve taken a look at it, and identified a few possible factors:

  1. Very low link strength - The site has only 17 backlinks according to Yahoo, with only 14 distinct domains between them.
  2. No internal links to the root domain - The menu bar linked to http://www.tjkids.org/index.html rather than to the root page.
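The second factor is straightforward to fix mechanically. As a sketch, the Python function below rewrites links that end in /index.html so they point at the bare directory URL, which means every internal link agrees on one address for the homepage. The function name canonical is hypothetical and the rule is deliberately narrow; it assumes index.html is the only default document in use.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical(url):
    """Rewrite .../index.html to the bare directory URL; leave other URLs alone."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/index.html"):
        # Drop the file name but keep the trailing slash.
        path = path[: -len("index.html")]
    return urlunsplit((parts.scheme, parts.netloc, path or "/", parts.query, parts.fragment))


print(canonical("http://www.tjkids.org/index.html"))
```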

If you have a spare minute, here are some things you can do to help Aaron and the orphanage.

  • Are you an SEO? Do you see anything else on the page that could be causing Google issues? I’d appreciate it if you could post anything else in the comments, so I can relay it back to Aaron!
  • Are you a blogger or webmaster? It would help a lot if you could write a post or story linking back to the City of Angels Children’s Home. This will help them to get indexed again.

Thank you for taking a second to read this, and please consider helping out!