Matt Cutts #11: Reinclusion requests

Here’s the eleventh in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

See the rest of the videos!

Transcription

Hey! This is Matt and Emmy coming to you on Thursday after hockey at the Googleplex. Let’s talk about, I don’t know, reinclusion requests.

So, I did a blog post about reinclusion requests a while ago. The procedure has changed a little bit though. So, imagine you spammed, or someone that you hired as a webmaster spammed, and now you are no longer in Google. What do you do now?

So the best thing that I recommend is to register in Sitemaps, or webmaster console, or Webmaster Central, whatever you want to call it. It’s basically the place where you can get all kinds of information. Sometimes you can even find out if you have penalties on your site. We can’t show all the penalties that we have, because that would clue in malicious spammers as well. But if there are real, legit sites that have valid content, we want them to be able to be found. So we can show penalties for some sites.

So, if you do have a penalty or if you suspect that you might have a penalty, go ahead and register at Sitemaps and then fill out a reinclusion request. I think it’s at the bottom left or something like that. And the more information you can give, the better.

So, for example, if you were using an SEO, or your web host got hacked, or whatever, give us as many specifics as you can. You also want to try to give some sort of timeline: here is what was going on, here is the mistake we made. The most important thing is, Google needs to know that it’s not going to happen again.

So, there are some ways of letting us know or convincing us of that. Whatever you think the problem was, and usually you have a pretty clear idea, something like hidden text, doorway pages, sneaky redirects using JavaScript, anything like that, we need to know that those pages, those violations of our quality guidelines, are not going to come back.

So that’s the procedure that I would go with. Try to include as much detail as possible about how it might have happened and what you are going to do to make sure that it does not happen again. And then that goes into a queue which we check, and we try to find out, OK, has the hidden text been removed, stuff like that. So, reinclusion requests definitely get looked at by people, and that’s the procedure I would recommend to use to put one in.

Transcription thanks to Peter T. Davis

Matt Cutts #10: Lightning Round!

Here’s the tenth in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

See the rest of the videos!

Transcription

Alright. This is Matt Cutts, coming to you on Monday, July 31st. This is probably the last one I will do tonight. So let’s try to do a lightning round.

Alright! Peter writes in. Says:

“Is it possible to search for just home pages? I tried doing -inurl:html, -inurl:htm, blah blah blah, php, asp, but that doesn’t filter out enough.”

That’s a really good suggestion, Peter. I hadn’t thought about that.

Fast used to offer something like that, but I think all they did was look for a ~ in the URL. I will file that as a feature request and see if people are willing to prioritize it, so that we might be able to offer that. My guess is it would be relatively low on the priority list, because the syntax you mentioned, subtracting off a bunch of extensions, would probably work pretty well.
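The workaround Peter describes, subtracting a bunch of extensions, could be sketched roughly as below. This is only an illustration: the helper name `homepage_query` and the default extension list are assumptions made for the example, and the result is just a query string to paste into the search box, not a call to any Google API.

```python
def homepage_query(terms, extensions=("html", "htm", "php", "asp")):
    """Build a query that subtracts URLs containing common page extensions.

    `terms` is the ordinary search text; each extension becomes one
    negated inurl: operator appended to the query.
    """
    exclusions = " ".join(f"-inurl:{ext}" for ext in extensions)
    return f"{terms} {exclusions}"


print(homepage_query("widgets"))
```

As Peter notes, this only approximates a home-page search, since plenty of inner pages have URLs without any of these extensions.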

Ah. I get to clarify something about strong versus bold, emphasis versus italic. So, there was a previous question where somebody had asked about whether it was better to use bold or whether it was better to use strong. Because bold is what everybody used in the old days when the dinosaurs roamed the earth, and strong is what the W3C recommends. At that time, last night, I thought that we just barely, barely, barely, like an epsilon preferred bold over strong and I said, for the most part don’t worry about that.

The nice thing is, an engineer actually took me to the code where I could see it for myself, and Google does treat bold and strong with exactly the same weight. So thank you for that, Paul. I really, really appreciate it. In addition, I checked the code that shows that ‘em’ and italic are treated exactly the same as well. So, there you have it: go forth and mark up like the W3C would like you to do it, do it semantically well, and don’t worry so much about crufty old tags, because Google would score it just the same either way.

Alright. In the lightning round, GoodmanAmanaHVAC asks:

“Will we see more kitty-posts in the future?”

I think we will. In fact, I tried to get my cats in on this show, but they are a li’l scared of lights. Let’s see if I can get them used to it.

TomHTML asks:

“What are Google SSD, Google GAS, Google RS2, Google Global Marketplace, Google Weaver and other services discovered by Tony Rusco??”

I think it was very clever of Tony to try to do a dictionary attack against our services check-in, but I am not going to talk about what those services are.

What else have we got here.

Josef Humpkins asks,

“A Preview of what many of the topics might be in the duplicate content session of the SES.”

I gave a little bit of a preview in one of the other sessions on video. But, I think what we would basically talk about, Sherry will be there, a lot of people will be there, we will talk about shingling.

What I’ll essentially say is, Google does a lot of duplicate detection, from the crawl all the way down to the very last millisecond, practically when the user sees things. We do stuff that’s exact duplicate detection and we do stuff that’s near duplicate detection. So we do a pretty good job all the way along the line of trying to weed out duplicates and stuff like that. And the best advice I can give is to make sure that pages which might have nearly the same content look as different as possible, if they are truly different content.
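For readers who have not met the term, shingling can be sketched as below. This is the textbook version (overlapping word windows compared with Jaccard similarity), not a description of Google's actual pipeline; the function names, the window size of 3, and the sample sentences are all assumptions made for the illustration.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word windows ("shingles") in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle (or none, if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


page_a = "the quick brown fox jumps"
page_b = "the quick brown fox leaps"
print(jaccard(shingles(page_a), shingles(page_b)))  # 2 shared of 4 total: 0.5
```

Two pages that share most of their shingles score close to 1.0, which is one simple way to flag near duplicates that differ only in a few words.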

A lot of people worry about printable versions, or somebody else asked about a .doc or Word file compared to an HTML file. Typically you don’t need to worry about that. If you have similar content on different domains, maybe one version in French and another version in English, you really don’t need to worry about that.

Again, if you do have the exact same content, maybe for a Canadian site and for a .com site, it’s probably just the sort of thing where we will detect whichever one looks better to us and just show that, but it wouldn’t necessarily trigger any sort of penalty or anything like that. Or if you want to avoid it, you can try to make sure that the templates are very, very different. But in general, if the content is quite similar, it’s better just to let us show whichever representation we think is the best anyway.

And Thomas writes in and says,

“Does Google index or rank blog sites differently, than regular websites?”

That’s a good question.

Not really. Somebody else asked about links from govs and edus, and whether links from two-level-deep govs and edus, like gov.pl, are the same as .gov. And the fact is, we don’t really have much in the way of saying, oh, this is a link from the ODP or from .gov or .edu, so give that some sort of special boost. It’s just that those sites tend to have higher PageRank, because more people link to them and reputable people link to them.

So for blog sites, there is not really any distinction, unless you go off to Blog Search of course, and then it’s all constrained to blogs. In theory, we could rank them differently, but for the most part, with just the general search and the way it crawls, things are working out OK.

Alright! Thanks.

Transcription thanks to Peter T. Davis

Matt Cutts #9: All about datacenters

Here’s the ninth in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

See the rest of the videos!

Transcription

OK! This is Matt Cutts, coming to you live from the Mattplex. It’s Monday, July 31st. And I am wearing a different shirt, so it’s not all one big take. In fact, it’s my werewolf versus unicorn shirt. That’s right, you’ve got the unicorn and the werewolves. Mortal enemies since the beginning of time, and “It’s On Now”.

Alright! So, this had better be a special session. Let’s take a fun question from g1smd. They ask,

“For all the datacenter watchers out there: should all results across one class C IP address block be the same most of the time, except when you are pushing data, or are they supposed to be different because you are trying different things on them? And would it make more sense to use the direct IP addresses when reporting issues or problems, or the 41gfe datacenter names?”

Alright. Well! Let’s talk about datacenters.

Back in the days of dinosaurs, you know, when the dinosaurs roamed the earth, you could actually run a search engine off of one computer. And those days are long since gone, unless you have a really, really powerful computer, or something very, very small to search over, or you have a Google Search Appliance, I guess. So, these days you pretty much have to have a datacenter. And in the early days of datacenters you could just do, you know, some sort of round-robin trick with DNS, so that you always hit different datacenters. Google does some very smart stuff in load balancing, some very interesting techniques to try to make sure that different datacenters are able to perform well.
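The round-robin DNS trick mentioned here can be pictured with a toy resolver like the one below. It is a sketch of the general idea only, and says nothing about Google's load balancing; the `round_robin` helper is hypothetical and the addresses come from the 192.0.2.0/24 documentation range.

```python
from itertools import cycle


def round_robin(addresses):
    """Return a resolver that hands out one address per lookup, in rotation."""
    rotation = cycle(addresses)
    return lambda: next(rotation)


# Each address stands in for a different datacenter.
resolve = round_robin(["192.0.2.10", "192.0.2.20", "192.0.2.30"])
print([resolve() for _ in range(4)])
```

Successive lookups walk through the list and wrap around, so consecutive visitors land on different datacenters.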

So your basic question was this. Should all things on the same Class C IP block be roughly the same. And yes, they should roughly be the same in that they are typically the same datacenter. But not always. Let me give you a couple of examples.

If one datacenter has to fail over or if one datacenter is out of rotation, then even if you are going to one IP address, you can get bounced over to a different datacenter. And even though it will look like you are consistently hitting the same datacenter, behind the scenes, underneath Google’s load balancing, you could be hitting a different datacenter completely. So, those situations are somewhat rare but not that rare. So that’s why sometimes when you see people having debates online at WebmasterWorld or Datacenterwatcher and stuff like that, they can actually be seeing different things, even if they hit the same IP address.

The other point I wanted to make, and I made this at Pubcon, Boston, was that, the datacenters often have a lot of different things going on. So whenever there is a new algorithm update or some other feature that we are trying out, we often try it out on one datacenter first, to make sure the quality is what we have expected it to be based on evaluation, stuff like that.

So the datacenters do differ, you know, according to some very complex, intricate plans, so that we can try out different things at different datacenters. Typically, on one class C IP address block, you will usually hit the same datacenter, but that’s not guaranteed. Also, at Pubcon Boston, I showed an example of the sorts of different things that are going on at different datacenters. It sort of shows how things are a lot more intricate now than they used to be, and so Google does a lot more smart scheduling, and it’s a lot harder for a random person to just look at a datacenter and reverse engineer or try to guess, you know, which way things are going, stuff like that.

As far as IP address versus the GFE name, which I think exactly me and g1smd know about, and no one else really bothers to talk about, except maybe on WebmasterWorld: you can use either the IP address or, you know, the two-letter code of a datacenter, because we are able to map them both back. If you tell us one, we can tell what the other one is, either way.

In general though, there are probably better ways to spend your time than watching datacenters. I think it’s a good use of your time to work on your content, and a good use of your time to look whenever something major is going on, whenever there is a PageRank update or something. But in general, there is enough stuff going on at different datacenters that I would say it’s probably not worth checking every single datacenter, every single day, to try to figure out, ‘OK, how am I going to do, or how have I been doing?’ It’s probably better to spend a little more time paying attention to your logs and work backwards from that.
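One way to "pay attention to your logs" is to count which searches actually bring visitors in. The sketch below rests on two assumptions that are not in the video: that the server writes Apache-style combined log lines, and that the referrer URL from Google carries the search text in a `q` parameter. The function name and the sample line are made up for the illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

# In the combined log format the line ends with:
# "request" status bytes "referrer" "user-agent"
REFERRER = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"$')


def google_queries(log_lines):
    """Count search phrases found in Google referrers across log lines."""
    counts = Counter()
    for line in log_lines:
        match = REFERRER.search(line.strip())
        if not match:
            continue  # not a combined-format line
        referrer = urlparse(match.group(1))
        if not referrer.hostname or "google." not in referrer.hostname:
            continue  # visitor did not arrive from a Google host
        for query in parse_qs(referrer.query).get("q", []):
            counts[query.lower()] += 1
    return counts


sample = ('192.0.2.1 - - [31/Jul/2006:22:00:00 -0700] "GET /sweaters/ HTTP/1.1" '
          '200 5120 "http://www.google.com/search?q=wool+sweaters" "Mozilla/5.0"')
print(google_queries([sample]))
```

Working backwards from a tally like this tells you which pages and phrases are earning traffic, without watching any datacenter.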

Transcription thanks to Peter T. Davis

Matt Cutts #8: Google Terminology

Here’s the eighth in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

See the rest of the videos!

Transcription

OK. We’re back. I want to start off with a really interesting question. Dazzling Daonna wrote all the way from Louisiana. She says,

“Matt! I mentioned before that I love to see a define type post, redefine terms that you Googlers use, that we non-Googlers might get confused about. Things like Data Refresh, Orthogonal etc.. You may have defined them in various places. But one cheat-sheet kind of list would be great.”

A very good question!

So, at some point I’ll have to do a blog post about host versus domain and a bunch of stuff like that. But several people have been asking questions about June 27th and July 27th. So, let me talk about those a little bit, in the context of a data refresh versus an algorithm update versus an index update.

So, I’ll use the metaphor of a car. Back in 2003, we would crawl the web and index the web about once every month. And when we did that, that was called an index update. Algorithms could change, the data would change, everything could change all in one shot. So, that was a pretty big deal. WebmasterWorld would name those index updates. Now that we pretty much crawl and refresh some of our index every single day, it’s an everflux, always-going-on sort of process.

The biggest changes that people tend to see are algorithm updates. You don’t see many index updates anymore, because we moved away from this monthly update cycle. The only times you might see them is, if you are computing an index which is incompatible with the old index. So for example, if you change how you do segmentation of CJK, Chinese, Japanese, Korean or something like that, you might have to completely change your index and go to another index in parallel. So the index updates are relatively rare.

Algorithm updates basically are when you change your algorithm. So maybe that’s changing how you score a particular page; you say to yourself, oh, PageRank matters this much more or this much less, and things like that. And those can happen pretty much at any time. So we call that asynchronous, because whenever an algorithm update evaluates positively, it improves quality, it improves relevance, we go and push that out.

And then the smallest change is called a data refresh. And that’s essentially like, you are changing the input to the algorithm, changing the data the algorithm works on.

So, with the car metaphor, an index update would be changing a large section of the car, things like changing the car entirely, whereas an algorithm update would be like changing a part in the car. Maybe changing out the engine for a different engine, or some other large part of the car. A data refresh is more like changing the gas in your car. Every one or two weeks, or three weeks if you are driving a hybrid, you change what actually goes in, and the algorithm operates on that data.
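The difference between an algorithm update and a data refresh can be made concrete with a toy ranking function. Everything in this sketch is invented for illustration (the signal names, the weights, the example sites); it only shows that rankings can move either because the scoring function changed or because the inputs did.

```python
def rank(pages, score):
    """Order page names by descending score; `pages` maps name -> signals."""
    return sorted(pages, key=lambda name: score(pages[name]), reverse=True)


def score_v1(signals):
    return signals["links"] + signals["relevance"]


def score_v2(signals):
    # "Algorithm update": same inputs, but relevance now matters more.
    return signals["links"] + 3 * signals["relevance"]


data_june = {"a.example": {"links": 5, "relevance": 1},
             "b.example": {"links": 2, "relevance": 3}}

# "Data refresh": the algorithm is untouched, only the inputs are newer.
data_july = {"a.example": {"links": 1, "relevance": 1},
             "b.example": {"links": 2, "relevance": 3}}

print(rank(data_june, score_v1))  # baseline
print(rank(data_july, score_v1))  # data refresh, same algorithm
print(rank(data_june, score_v2))  # algorithm update, same data
```

In this toy setup both changes flip the order of the two sites, which is why a ranking shift on its own does not tell you which kind of change happened.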

So for the most part, data refreshes are a very common thing. We try to be very careful about how we safety check them. Some data refreshes happen all the time. For example, we compute PageRank continually and continuously. So there is always a bank of machines refining PageRank based on incoming data. And PageRank goes out all the time, anytime there is an update with our new index, which happens pretty much every day.

By contrast, some algorithms are updated every week or every couple of weeks, and so those are data refreshes that happen at a slower pace. So the particular algorithm that people are interested in on June 27th and July 27th, well, that particular algorithm has actually been live for over a year and a half now. So it’s data refreshes that you are seeing that are changing the way people’s sites rank.

In general, if your site has been affected, go back, take a fresh look and see: is there anything that might be exceedingly over-optimized? Or maybe: I have been hanging out on SEO forums for such a long time that I need to have a regular person come and take a look at the site and see if it looks OK to them. If you’ve tried all the regular stuff and it still looks OK to you, then I would just keep building regular, good content, and try to make the site very useful. And if the site is useful, then Google should, you know, fight hard to make sure that it ranks where it should be ranking.

That’s about the most advice I can give about the June 27th and July 27th data refreshes, because it does go into our secret sauce a little bit. But that hopefully gives you an idea about the scale, the magnitude of different changes. Algorithm changes happen a little more rarely, but data refreshes are always happening, and sometimes they happen from day to day and sometimes they happen from week to week and month to month.

Transcription thanks to Peter T. Davis

Matt Cutts #7: Does Webspam use Google Analytics?

Here’s the seventh in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

See the rest of the videos!

Transcription

Ah. Well, Hello There!
I was just enjoying some delicious ‘Diet Sprite Zero’ while reading my new issue of ‘Wired’ magazine. Oh, they really captured the asymmetry in Stephen Colbert’s ears, didn’t they?

I don’t know. I think it will be really fun to do fake commercials. Diet Sprite has not paid me anything for endorsing them.

Alright. Shawn Stinez (??) writes in.

“Does Google Analytics play part in SERPs?”. SERPs meaning, Search Engine Results Pages.

To the best of my knowledge, it does not. I am not going to categorically say we don’t use it anywhere in Google. But I was asked this question at WebmasterWorld in Las Vegas last year, and I pledged that the Webspam team will not use Google Analytics data at all. Now, webspam is just a part of quality and quality is just a part of Google, but Webspam definitely has not used Analytics data, to the best of my knowledge. Other places in Google don’t either. Because we want people to just feel comfortable using it and (pause) use it.

Alright. Gwen writes in. She or he says:

“Dear Mr. Cutts, it’s going to be a long weekend. You get a lot of questions asked.” Thank you, ma’am, very sympathetic of you! “But I have to. When does Google detect duplicate content, and within which range will duplicate be duplicate?”

Good question.

So, that’s not a simple answer. The short answer is, we do a lot of duplicate content detection. It’s not like there is one stage where we say, right here is where we detect the duplicates. Rather, it’s all the way from the crawl, through the indexing, through the scoring, all the way down until finally just milliseconds before you answer things. And there are different types of duplicate content. There is certainly exact duplicate detection, so if one page looks exactly the same as another page, that can be quite helpful. But at the same time, it’s not the case that pages are always exactly the same. And so, we do also detect near duplicates. We are using a lot of sophisticated logic to do that. So, in general, if you think you might be having problems, your best bet is probably to make sure that your pages are quite different from each other. Because we do do a lot of duplicate detection to crawl less and to provide better results and more diversity.
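Exact duplicate detection, the simpler of the two cases, is commonly done by fingerprinting normalized page text. The sketch below shows that general technique under assumed choices (lowercasing, whitespace collapsing, SHA-256); nothing in the video says which normalization or hash Google uses, and the function names and URLs are hypothetical.

```python
import hashlib


def fingerprint(text):
    """Hash the text after lowercasing and collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_exact_duplicates(pages):
    """Return lists of URLs whose normalized text is identical.

    `pages` maps URL -> page text; only groups with 2+ URLs are returned.
    """
    groups = {}
    for url, text in pages.items():
        groups.setdefault(fingerprint(text), []).append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


pages = {
    "/a": "Hello   World",
    "/print/a": "hello world",
    "/b": "Something else entirely",
}
print(group_exact_duplicates(pages))  # [['/a', '/print/a']]
```

A fingerprint like this only catches pages that match after normalization; pages that differ by even one word need a near-duplicate method instead.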

OK. Jeff Jones(??) writes in. This is my favorite question. Well, there have been a lot of good questions. I really like this one.

“I would like to explicitly exclude a few of my sites from the default moderate safe search filtering. Google seems to be less of a prude than I would like to prefer. Is there any hope of a tag, attribute or other snippet to limit a page to unfiltered results or should I just start putting a few nasty words in the alt tags of blank images.”

Well, don’t do them in blank images. You know, put them in meta-tags. Whenever I was writing the very first version of safe search, I noticed that there were a lot of pages which did not tag their sites or their pages at all in terms of being adult content. So there are a lot of industry groups, there are a lot of industry standards, but at that time, the vast majority of porn pages just sort of ignored these tags. So, it’s not that big a deal, go ahead and include that.

So the short answer to your question is, to the best of my knowledge there is no tag that can just say, “I am porn, please exclude me from your safe search.” It’s wonderful that you are asking about that.

Your best bet, I would go with meta-tags. Because safe search, unlike a lot of different stuff, actually does look at the raw content of a page, or at least the version that I last saw looks at the raw content of the page. And so, if you put it in your meta-tags or even in comments, which is something that usually is not indexed by Google at all, we should be able to detect that it is porn that way. Don’t use blank images. Don’t use images that people can’t see, though.

And then let’s finish off with a question from Andre Shogan (??). He says:

“Sometimes I make a select box spiderable by just putting links in the option elements; normal browsers ignore them and spiders ignore the option. But since Google is using the Mozilla bot, and the bot renders the page before it crawls it, I know that when the Mozilla engine renders the element, it will remove the element from the document object model tree.”

So in essence he is asking, can I put links in an option box? You can, but I wouldn’t recommend it. It is pretty non-standard behavior. It’s very rare. It would definitely make my eyebrows go up if I were to see it. So it’s better for your users and better for search engines if you just take those links out and put them somewhere at the bottom of the page or in a sitemap, and then that way we will be able to crawl right through, and we don’t have to have hyperlinks or anything like that.

Alright! That’s enough questions for now. It’s getting toward eleven o’clock. I am going to call it a night.
It’s Sunday, July 30th. So we will see if we can knock a few of these out next week. Thanks a lot.

Transcription thanks to Peter T. Davis

Matt Cutts #6: All About Supplemental Results

Here’s the sixth in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

See the rest of the videos!

Transcription

OK. We got some supplemental results questions.

David writes in. He says:

“Matt, should I be worried about this? site:tableandhome.com returns 10000 results. site:tableandhome.com -intitle:by returns 100000 results. All supplemental.”

David, no, in general I wouldn’t worry about this. I want to explain the concept of the beaten path. So, if there is a problem with, like, a one-word search in Google, that’s a big deal. If it is like a 20-word search, that’s obviously less of a big deal, because it’s off the beaten path.

The supplemental results team takes reports very seriously and acts very quickly on them. But in general, something in the supplemental results is a little further off the beaten path than our main web results. And once you start getting into negation, or negation by a special operator like ‘intitle’, stuff like that, that’s pretty far off the beaten path. And you are talking about results estimates: not the actual web results, but the estimates for the number of results.

The good news is, there are a couple of things that will make our site: estimates more accurate. There are at least two changes I know of in our infrastructure, one deliberately trying to make site: results more accurate. The other one is just a change in our infrastructure to improve overall quality, but as a side benefit, it counts the number of results from a site more accurately when it involves the supplemental results. So there are at least a couple of changes that might make results more accurate.

But in general, once you really start to get far off the beaten path, -intitle, all that sort of stuff, especially with supplemental results, I wouldn’t worry that much about the results estimates. Historically we have not worried that much, just because not that many people have been interested. But we do hear more people sort of saying, ‘yes, I am curious about this’. So we are putting a little more effort into that.

Let’s see. Erin writes in. He says:

“I have a question about redirects. I have one or more pages that have moved on various websites, I use classic ASP” and then he has given the response of 301. He says, “These redirects have been setup for quite a while, and when I run a spider on them, it handles the redirects fine”.

This is probably an instance where you are seeing this happen in the supplemental results. So here is how I think about things: there is a main web results Googlebot and a supplemental results Googlebot. And so, the next time supplemental results Googlebot visits that page, and sees the 301, it will index it accordingly and refresh and things will go fine. Historically, the supplemental results have been a lot of extra data but have not been refreshed as fast as the main web results. And if you do a cached page, you know, anybody can verify that the results on the crawl dates vary.
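Erin's setup is in classic ASP, which is not shown in the transcript. As a rough stand-in, here is what a permanent redirect looks like in a minimal Python WSGI application; the paths, the `MOVED` table, and the example.com host are hypothetical and only illustrate the 301 status plus Location header that a crawler would see.

```python
# Hypothetical mapping of moved paths to their new absolute URLs.
MOVED = {"/old-page.asp": "http://www.example.com/new-page.asp"}


def app(environ, start_response):
    """Answer moved paths with a 301 and a Location header; serve the rest."""
    path = environ.get("PATH_INFO", "/")
    target = MOVED.get(path)
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"current page\n"]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library would serve it; requesting the old path should then show the 301 in any HTTP client.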

So, the good news is that the supplemental results are getting fresher and fresher and there is an effort underway to make them quite fresh.

For example, Chris writes:

“I would like to know more about the supplemental index. It seems while you were on vacation, many sites got put there, and I have one page where this happened, PageRank of 6, since like May.”

So, I talked about the fact that there is new infrastructure in our supplemental results. I mentioned that in a blog post. I don’t know how many people noticed it, but I’ve certainly said that before; I think it was in the indexing timeline, in fact. So as we refresh our supplemental results and start to use new indexing infrastructure in the supplemental results, the net effect is things will be a little fresher. I wouldn’t be surprised, and I am sure I have some URLs in the supplemental results. I wouldn’t worry about it that much. And over the course of the summer, the supplemental results team will take all the reports they see, especially off the beaten path like site: and operators that are kind of esoteric, and they will be working on making sure that those return the sort of results everybody naturally expects.

So, stay tuned on supplemental results. It’s already a lot fresher and a lot more comprehensive than it was, and I think it’s just going to keep improving.

Transcription thanks to Peter T. Davis

Matt Cutts #5: How to structure a site?

Here’s the fifth in the series of videos posted by Google’s Matt Cutts to Google Video over the past year. These are important for every web developer to see. Please see Matt’s disclaimer first!

See the rest of the videos!

Transcription

OK. As you can see I’ve got the closest thing I could get to a world map… Did you know that there are over 5000 languages spoken across the earth? How many does Google support? Only about a hundred. Yeah. Still a ways to go.

Alright! Let’s do some more questions.

Todd writes in. He says:

“Matt, I have a question. One of my clients is going to acquire a domain name, very related to their business and has a lot of links going to it”. So, he basically wants to do a 301 redirect to the final website after the acquisition. The question is, “Will Google ban or apply a penalty for doing this 301 redirect?”

In general, probably not. You should be OK, because you specified that it is very closely related. Any time there is an actual merger of two businesses or two domains that are very, very close to each other, doing a 301 should be no problem whatsoever. If, however, you are like a music site and all of a sudden you are acquiring links from Debt Consolidation Online, or Cheap yada yada yada, that could raise a few eyebrows. But it sounds like this is just a run-of-the-mill thing, so I think you should be OK.

Barry writes in:

“What’s the best way to theme a site using directories? Do you put the main keyword in the directory or on the index page? If using directories, do you use a directory for each set of keywords?”

This is a good question.

I think you are thinking too much about keywords and not enough about your site architecture. So, this is just me, but I prefer a tree-like architecture, so everything branches out in nice, sort of even bounds. It’s also good if things are broken down by topic. So, you know, if you are selling clothes, you want to have sweaters as one directory, shoes as another directory, or something like that.

If you do that sort of thing, what you end up with is, the keywords end up in directories. And as far as directories versus the actual name of the html file, it doesn’t really matter that much within Google’s scoring algorithm. I think if you break it down by topic, but make sure that those topics match well with the keywords you expect your users to type in, when they try to find your page, then you should be in pretty good shape.
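To picture the tree-like, topic-per-directory layout, here is a small sketch that turns a topic map into URL paths. The clothing topics echo Matt's example, but the page names, the `SITE` dictionary, and the `site_urls` helper are invented for the illustration.

```python
from posixpath import join

# Hypothetical store: one directory per topic, a few pages under each.
SITE = {
    "sweaters": ["wool-cardigan", "cotton-crewneck"],
    "shoes": ["trail-runner"],
}


def site_urls(tree, root="/"):
    """List URL paths for a two-level topic tree: root, topic dirs, pages."""
    urls = [root]
    for topic in sorted(tree):
        topic_dir = join(root, topic) + "/"
        urls.append(topic_dir)
        urls.extend(join(topic_dir, page + ".html") for page in tree[topic])
    return urls


for url in site_urls(SITE):
    print(url)
```

Because the directories are named after topics, the words users are likely to type end up in the paths without any separate keyword planning.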

Alright! Jody writes in:

“If an e-commerce site’s URL has too many parameters”, so, she’s got like the punctuation monster barfing over the number of parameters, “and it is unindexable, is it acceptable to use the Google guidelines to serve static HTML pages to the bot to index instead?”

This is something to be very careful about, because if you are not, you can end up in an area that is known as cloaking. Again, cloaking is showing different content to users than to Googlebot. And you want to show the exact same content to users as you do to Googlebot. So my advice would be to go back to the question I answered a while ago about dynamic parameters in URLs, and to basically see if there is a way to unify it, so that users and Google both see the same directory. If you do something like that, that’s going to be much better. Failing that, you want to make sure that whatever HTML pages you do show, if users go to the same page, they don’t get redirected, they don’t go somewhere else. They need to see the exact same page that Googlebot saw. That’s the main criterion for cloaking, and that’s what you want to be careful about.
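A quick self-check for the rule "users and Googlebot see the exact same page" could look like the sketch below. The `fetch(url, user_agent)` callable is a hypothetical stand-in for whatever HTTP client you use, and the browser user-agent string is a placeholder; only the comparison logic is the point.

```python
BROWSER_UA = "Mozilla/5.0 (example browser)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def serves_same_content(fetch, url):
    """True if the page body is identical for a browser and for Googlebot.

    `fetch` is any callable taking (url, user_agent) and returning the body.
    """
    return fetch(url, BROWSER_UA) == fetch(url, GOOGLEBOT_UA)


# Two fake sites for demonstration: one honest, one that cloaks.
def honest_site(url, user_agent):
    return "<html>same page for everyone</html>"


def cloaking_site(url, user_agent):
    return "static bot page" if "Googlebot" in user_agent else "dynamic user page"


print(serves_same_content(honest_site, "/product?id=1"))    # True
print(serves_same_content(cloaking_site, "/product?id=1"))  # False
```

Pages with rotating ads or timestamps would need a looser comparison, so treat a mismatch as a prompt to look closer rather than as proof of cloaking.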

John Wooley writes in. He says:

“I would like to use an A/B split test on one of my static HTML sites. Will Google understand my PHP redirect for what it is, or will they penalize my site for perceived cloaking? If this is a problem, is there a better way to split test?”

That’s a good question.

If you can, I would split test in an area where search engines aren’t going to index it. Because anytime we go to a page and we see different content, or if you reload and you see different content, that does look a little bit strange. So if you can, it’s better to use robots.txt or .htaccess files or something to make sure that Googlebot doesn’t index your A/B testing. Failing that, what I would do is, I wouldn’t use a PHP redirect; I would try to use something server-side to actually serve up the two pages in place. The one thing to be careful about, and I touched on this a while ago in another session, is that you should not do anything special for Googlebot. Just treat it like a regular user. That’s going to be the safest thing in terms of not being treated as cloaking.
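The robots.txt option can be checked before going live with Python's standard `urllib.robotparser`. In this sketch the `/ab-test/` directory and the rule set are assumptions; the idea is simply to keep the experiment pages under one path and confirm that crawlers are told to stay out of it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of the test area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ab-test/
"""


def crawlable(path, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(crawlable("/ab-test/variant-b.html"))  # False
print(crawlable("/index.html"))              # True
```

Note that the rule applies to every crawler equally, which fits the advice not to do anything special for Googlebot.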

And, let’s wrap up! Todd asks another question.

“Aw heck. How about a real question. Ginger or Mary Ann?”

Ah, ha ha, I am going to go Mary Ann (nodding his head).

Alright. That’s enough for another session.

Transcription thanks to Peter T. Davis
