InternetSuccessKey.com: Website Development
Why some sites aren't listed in search engines
It's one thing to have a low search engine ranking, quite
another to have none at all. In a previous article, I summarized ethical and
proven search engine
submission/optimization strategies; but what if you've submitted your
site and it hasn't been listed?
You're not alone - the question of why people's sites don't appear in search
engine results is fairly common. If you're having this problem, this article
may provide some valuable clues and strategies for rectifying the issue.
How do search engines gather information?
First of all, it's good to have a basic understanding of how
search engines gather information. Automated software programs, also known
as bots, crawlers or spiders, spend their time automatically following links
on web sites or are initiated by a submission of a site's URL to a search
engine.
When a manual submission is performed, the bot doesn't
spider the entire site immediately - the request is queued. Once the bot
starts spidering, the process can take days, weeks, or even months. Even if
the site is only partially spidered, results can then become available in
the SERPS (Search Engine Results Pages).
As the spiders follow links throughout a site, they gather
information about the content of the pages. The content includes not just
what humans can see but also other aspects including meta
tag and title tag information,
who is linking to that page and where the links on the page are linking to.
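To make this a little more concrete, here is a rough Python sketch of the kind of information a spider might pull from a single page: the title tag, the meta tags and the links on the page. The class name, function name and sample page are made up for illustration; real spiders are far more sophisticated and their inner workings are not public.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the title, named meta tags and link targets of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}    # meta name -> content
        self.links = []   # href values in document order
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    """Parse an HTML string and return the populated PageSummary."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical sample page, for illustration only.
SAMPLE = """<html><head><title>Widgets for sale</title>
<meta name="description" content="Hand-made widgets">
</head><body><a href="/about.htm">About</a>
<a href="http://www.tamingthebeast.net">Taming the Beast</a></body></html>"""

page = summarize(SAMPLE)
print(page.title)
print(page.meta)
print(page.links)
```

The links collected this way are what a spider would queue up to visit next, which is why pages that nothing links to are hard for it to find.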
If you're interested in learning how to identify whether a search
engine robot is visiting your site, read our tutorial on search
engine spider identification.
How do search engines rank?
The results of the search engine spiders' forays throughout
your site are then processed by complex algorithms, or formulas. Even the
text you use in file names and links, and the text that people use to link to
you, play a role in the algorithm (see anchor
text optimization).
Each search engine company has a different set of algorithms
and the specifics are closely guarded secrets. This is to help prevent site
owners from "spamming" the engine; that is, artificially inflating
their rankings for particular queries.
After the page/site information is processed, the
information is added to the search engine's database and is then available
to searchers.
Sounds pretty simple I guess, but when you take into account
that the most popular search engines have billions of pages in their indices
- many competing for no.1 positions on particular queries, you can imagine
just how much computing power is needed and how complex the algorithms are.
Given the complexity of indexing and ranking, things are
bound to go wrong from time to time and some sites will be excluded for a
variety of reasons. If you're having problems with getting your site listed,
one or more of the following may have occurred. This information is fairly
generic and has been provided with the big 3 search engines in mind -
Google, Yahoo and MSN.
How long ago was the site launched?
While search engines are much faster these days with
indexing sites, it can still take a few months to get listed - depending on
the search engine. If it's been less than 12 weeks since you submitted your
site - don't panic. During the 12 weeks, check to see if spiders
have been visiting your site.
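If you have access to your raw server logs, a short script can do this check for you. The sketch below assumes the logs are in the common "combined" format, where the user agent is the last quoted field, and it assumes the substrings "Googlebot", "Slurp" and "msnbot" identify the spiders of the three engines; adjust both assumptions to suit your own server. The function name and sample log lines are invented for the example.

```python
from collections import Counter

# Substrings assumed to appear in the spiders' user-agent strings.
SPIDER_TOKENS = {"Googlebot": "Google", "Slurp": "Yahoo", "msnbot": "MSN"}


def count_spider_hits(log_lines):
    """Return {engine: Counter({path: hits})} for combined-format log lines."""
    hits = {}
    for line in log_lines:
        # ip - - [date] "request" status size "referer" "user agent"
        parts = line.split('"')
        if len(parts) < 6:
            continue  # not in the assumed format
        request, agent = parts[1], parts[5].lower()
        fields = request.split()
        if len(fields) < 2:
            continue
        path = fields[1]
        for token, engine in SPIDER_TOKENS.items():
            if token.lower() in agent:
                hits.setdefault(engine, Counter())[path] += 1
    return hits


# Invented log lines, for illustration only.
SAMPLE_LOG = [
    '1.2.3.4 - - [10/Oct/2005:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '1.2.3.4 - - [11/Oct/2005:08:01:10 +0000] "GET / HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '5.6.7.8 - - [11/Oct/2005:09:12:44 +0000] "GET /products.htm HTTP/1.1" 200 5120 '
    '"-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '9.9.9.9 - - [11/Oct/2005:09:30:00 +0000] "GET /about.htm HTTP/1.1" 200 1800 '
    '"http://site.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

spider_hits = count_spider_hits(SAMPLE_LOG)
for engine, pages in spider_hits.items():
    print(engine, dict(pages))
```

An engine that is missing from the output, or one that only ever requests "/", matches the situations described in the next paragraph.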
If no spiders have been visiting your site, or they are only
hitting the home page and it's been more than 12 weeks since your submission,
you can contact the search engine company to ask for some advice as to why
you weren't listed.
When approaching a search engine company, always be very
polite and to-the-point. Bear in mind that they are under no obligation to
list your site. Also be prepared for a long wait for a response, and for a
lack of specifics if they tell you that your site hasn't been listed because
it doesn't meet their submission quality
guidelines.
Have you checked properly?
It may be that your site is listed, but is just ranking very
low. To check whether your site is listed, run a search on:
"site.com"
With "site" being your domain name.
If you find your site, it's a good start - it just means that
either you have a naturally low ranking, ranking calculations haven't been
finalized, or perhaps the site has been penalized. The latter part of
this article covers some issues pertaining to penalties. If your site is
squeaky clean, but still buried in the results, try reviewing some of our
other optimization
strategies.
No inbound links
As mentioned, search engine spiders find your site not only
through direct submissions, but also through links from other listed sites
that point to yours. If no other site links to yours, this can
contribute to the problem of getting listed. In these cases, it's wise to
implement a linking campaign. Learn more about requesting
reciprocal links and link
exchange tips & software.
The Sandbox effect
There's been much talk of late amongst industry experts of
some major engines being hesitant to rank newer websites until they have
existed for a certain number of months. If your site appears in one
engine, but not another, then this can be an indicator of the "sandbox
effect". There's not much you can do in this situation except continue
working on your site and just wait.
Duplicate, non-original content
For search engines, it's not just the amount of content, but
the quality. If your site consists mainly of someone else's content, this
can prevent you from being listed.
Lack of content, image/Flash based site
Search engines feed best on text - they cannot read images
and have very limited abilities when it comes to dealing with Flash based sites.
If your site is primarily image based, it may be time to rethink its design
to allow for some textual content. If this isn't possible, then the use of
"alt" text on all your images is strongly advised. Here's an
example:
<a href="http://www.tamingthebeast.net"><img border="0"
src="../images/picture.gif"
alt="A description of this image including keywords"
width="458" height="39"></a>
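If your site has a lot of images, checking them by hand gets tedious. As a rough sketch, the Python below lists the "src" of every image tag in a page that has no "alt" text, or only blank "alt" text. The class and function names and the sample markup are mine, made up for illustration.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of each <img> whose alt attribute is absent or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        alt = attrs.get("alt")
        if not (alt and alt.strip()):
            self.missing.append(attrs.get("src") or "(no src)")


def images_missing_alt(html):
    """Return the src values of images lacking alt text, in page order."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


# Hypothetical markup, for illustration only.
SAMPLE = """
<img src="../images/picture.gif" alt="A description of this image" width="458" height="39">
<img src="../images/logo.gif" width="100" height="40">
<img src="../images/spacer.gif" alt="">
"""

missing = images_missing_alt(SAMPLE)
print(missing)
```

Bear in mind that a blank "alt" is sometimes deliberate, for spacer or purely decorative images, so treat the output as a list of things to review rather than a list of errors.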
Dynamic sites
Dynamic sites are those where content is generated from a
database instead of the traditional method of having the content "hard
coded" on the actual page (known as static content).
In the early days of dynamic site technology, search engine
spiders would choke on long URLs or would shy away from URLs with many
parameters or special characters such as "?", "&", "id",
"%", "+" and "=". While basic dynamic URLs
are now handled without a problem, I've still seen many instances where
URLs with multiple parameters are ignored.
A good rule of thumb is that your URLs should not display
more than 2 such parameters when viewed in the browser address bar. If you
have more and your site is having problems with getting listed, consult your
web developer about the issue.
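Counting parameters is easy to automate if you have a list of your site's URLs. Here is a minimal sketch using Python's standard library; the limit of 2 is simply the rule of thumb above, and the function names are made up for the example.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_PARAMS = 2  # the rule of thumb from this article, not an official limit


def query_param_count(url):
    """Count the name=value pairs in the query string of a URL."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def exceeds_rule_of_thumb(url, limit=MAX_PARAMS):
    """True when the URL carries more parameters than the chosen limit."""
    return query_param_count(url) > limit


for url in (
    "http://site.com/main.php?id=stuff&t=bleh&p=ick",
    "http://site.com/main.php?id=stuff&t=bleh",
    "http://site.com/bleh-ick.htm",
):
    print(query_param_count(url), exceeds_rule_of_thumb(url), url)
```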
Many dynamic sites are powered by off-the-shelf CMSs
(Content Management Systems). In this situation, there are usually special
bolt-on scripts, called mods, available to turn output from the CMS from
dynamic style URLs to static. For example:
http://site.com/main.php?id=stuff&t=bleh&p=ick
can be changed to:
http://site.com/bleh-ick.htm
If there isn't a mod available, manual coding of your .htaccess
file can produce the same result.
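Whichever route you take, the idea is the same: the visitor and the spider see the static-looking address, and the server quietly translates it back into the dynamic one. The Python sketch below only illustrates that translation for the example above; it is not an .htaccess rule. The pattern, the function name and the fixed "id" value of "stuff" are assumptions made for the illustration.

```python
import re
from urllib.parse import urlencode

# Assumed shape of the static address: /<t>-<p>.htm
STATIC_PATTERN = re.compile(r"^/([a-z0-9]+)-([a-z0-9]+)\.htm$")


def to_dynamic(path, page_id="stuff"):
    """Translate a static-looking path into the dynamic URL behind it.

    Returns None when the path does not match the assumed pattern.
    The fixed page_id is an assumption; the static form does not carry it.
    """
    match = STATIC_PATTERN.match(path)
    if match is None:
        return None
    query = urlencode({"id": page_id, "t": match.group(1), "p": match.group(2)})
    return "/main.php?" + query


print(to_dynamic("/bleh-ick.htm"))
print(to_dynamic("/images/picture.gif"))
```

A real rewrite rule does the same pattern matching on the server, so the CMS still receives the parameters it expects.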
Incorrect redirects, modifications or broken links
If you are using redirects or other forms of .htaccess
redirection and/or rewriting, it's important to get it right. Each time a
page is requested from your site, a header response is generated. The codes
returned to the spider include:
200 (OK)
404 (File not found)
301 (moved permanently)
302 (moved temporarily)
403 (forbidden)
There are many other codes, but what search engine
spiders want to see is either 200 or 301. If you are redirecting pages or
domains, it's always wise to use a 301
redirect.
Here's a tool you can use to check your server
header responses.
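If you'd rather check from your own machine, the following Python sketch requests just the headers of a page, without following redirects, and interprets the code along the lines described above. The function names and the wording of the verdicts are mine; the script only looks at the five codes listed in this article and flags everything else for a manual look.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send a HEAD request; return (status code, Location header or None).

    http.client does not follow redirects, so a 301 or 302 is reported
    as-is rather than being replaced by the status of the target page.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def spider_verdict(status):
    """Interpret a status code using the guidance in this article."""
    if status == 200:
        return "ok"
    if status == 301:
        return "ok (permanent redirect)"
    if status == 302:
        return "temporary redirect - consider a 301"
    if status == 403:
        return "forbidden - the spider cannot read this page"
    if status == 404:
        return "not found - check for a broken link"
    return "check manually"


for code in (200, 301, 302, 403, 404):
    print(code, spider_verdict(code))
```

To test a live page, call fetch_status with its full address and pass the first value it returns to spider_verdict.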
Also, don't forget to check your site for broken links,
another issue that can impede a spider's progress.
Robots exclusion
While sites are in development, it's not unusual for the
developer (if they have some search engine savvy) to add robots exclusion tags
to pages, or entries to the robots.txt file. This is to prevent the site
from being spidered while under development. If these are in place, they'll
need to be adapted or removed altogether before the search engine spider
will be able to retrieve data from your site.
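A quick way to see whether a leftover robots.txt is shutting spiders out is to test it with the robots.txt parser in Python's standard library. In this sketch the two robots.txt files are invented examples, and the spider names "Googlebot", "Slurp" and "msnbot" are assumed user-agent tokens for the big 3; swap in your own file contents and page address.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, url, agents=("Googlebot", "Slurp", "msnbot")):
    """Return the agents that the given robots.txt text bars from the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Typical "keep everyone out" file left over from development (invented).
DEV_ROBOTS = "User-agent: *\nDisallow: /\n"

# A file that only protects one directory (invented).
LIVE_ROBOTS = "User-agent: *\nDisallow: /cgi-bin/\n"

dev_blocked = blocked_agents(DEV_ROBOTS, "http://site.com/index.htm")
live_blocked = blocked_agents(LIVE_ROBOTS, "http://site.com/index.htm")
print(dev_blocked)
print(live_blocked)
```

This only covers the robots.txt file. Robots exclusion meta tags sit in the pages themselves, so they need to be checked separately.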
Learn more about the robots
exclusion meta tag or the robots.txt
file
Server issues
Murphy's law states, "if anything can go wrong, it
will". If the server your site is hosted on experiences difficulties
during the time that the search engine spider visits your site, this can
cause the spider to "think" that the site doesn't exist. If you
are aware of serious server downtime during the period after submission, it
may be wise to try submitting your site again.
Note: when manually submitting your site, it's
advisable not to repeatedly do so within a short space of time. Once every
couple of weeks is enough. Unlike the old days, once your site is listed,
there is no need to continue with manual submission. See our tutorial
for free submission pages for Google, Yahoo and MSN + other search
engine submission tips.
Once you're in, you're in and you should remain in unless
your site is banned. Your site can be banned for a number of reasons, the
same reasons that can prevent it from being listed in the first place; so
let's now examine some of those:
Getting unbanned
Before we discuss the various issues that can get you
banned, it's important to note that if your site has any of these issues and
you do rectify them, don't assume that you'll be relisted
automatically.
You'll need to contact the search engine company, state that
you found an issue on your site that may have caused it to incur a penalty
or ban and ask for it to be reconsidered for listing. Again, bear in mind
that the search engine is under no obligation to list your site; so be very
polite and very patient in waiting on a response.
Invisible text
If you have large amounts of text or keyphrases that are the
same color as the background of your page, many search engines will view
this as an attempt to spam. Whether intentional or unintentional, you should
rectify this immediately.
Other advertising campaigns
I've seen this happen many, many times - a site owner
submits their web site to the engines, gets impatient after a couple of
weeks and then decides to take out a "1,000,000 visitors for $24"
campaign.
Many of these cheap traffic strategies can actually get you
banned. Rule of thumb - if it sounds too good to be true, then it probably
is - and you should steer clear of such schemes. This is especially the case
where the method of delivery of traffic is through a page on your site being
displayed. Not all such advertising campaigns are shonky, you just need to
be careful.
Be patient - there really is no magic bullet solution to
rankings and listings; solid and ethical
search engine optimization strategies are still the only way to go.
Linking to bad neighborhoods
It's hard to define this and there are no hard and fast
rules; but there has been a lot of anecdotal evidence to suggest that a new
site linking to many banned sites can prevent the new site from being
listed. It's wise to check out any sites you are thinking of linking to
before doing so. If you find that you have linked to many banned sites, just
remove the links.
Link farms
If your site has little content and consists mainly of links
that you've swapped with other sites, this can get you labeled as a link
farm and can prevent you from being listed. Remove as many links as you can
and focus on building content. If you want to engage in reciprocal linking,
learn more in our link exchange
tutorial.
Cross linking
For site owners who have a number of sites, it's not unusual
to see them linking to every one of their sites from every page, not only
to keep visitors within their "network" but also to boost search
engine rankings.
This can not only impede getting a new site listed, but also
have a negative effect on all cross-linked sites. My advice is that if you
do wish to cross-link, do it from one page only - your "about"
page is probably most suitable for this.
Banned domain name
Another hard one to pin down, but this definitely happens on some
engines. It's not uncommon for the owner of a banned site to simply dump the
domain name. That domain name is then registered by someone else, but the
ban may still remain in place. The only free way I know of to try and track
a domain name's history is by entering the domain name at archive.org
and seeing what it brings up. If you find that the previous owner of the domain
name had illegal or questionable content, you can either dump the name
yourself and register a new one, or contact the search engine and let them
know that the ownership and content of the site have changed.
Improper use of search optimization software
Over the last couple of years, many software packages
guaranteeing to increase search engine rankings have hit the market. These
applications have their place, but I've also read of sites being banned
through the over usage of certain packages. Learn more about search
engine optimization software - the benefits, the dangers and responsible
use.
Cloaked/doorway pages
A couple of years ago, there was a huge trend of people
creating pages specifically for search engines, some that humans would
never see. This usually worked one of two ways:
a) a series of pages would be created that were loaded with
keywords and phrases, and no other content.
b) a special "sniffer" script was installed in a
"real" page. When the script detected that a search engine robot
had requested that page, it would actually deliver another page, most likely
one described in a)
Search engine companies caught on to this and added detection
methods to their algorithms to combat this kind of spamming.
Shady SEOP work
If you hired a Search Engine Optimization Professional to
tweak your site, bear in mind that, as in any other trade, there are good,
bad and evil practitioners. Just because a "professional" worked
on the site, it doesn't mean they used ethical strategies. Run a search on
the company to see if other people have had similar problems. Learn more
about the sometimes shady world of SEOP
and SEM companies.
This is just a brief listing of some of the points that can
keep your site from being listed by search engines. I'll be updating it from
time to time as other points come to mind. If you have an experience you'd
like to share with others about why your site wasn't listed and what you did
to rectify the issue, I'd love to hear from you!
Further learning resources
Review all our search
engine optimization tutorials
Michael Bloch
Taming the Beast
http://www.tamingthebeast.net
Tutorials, web content, tools and software.
Web Marketing, Internet Development & Ecommerce Resources
____________________________
Copyright information.... This article is free for reproduction but must be
reproduced in its entirety, including live links & this copyright statement must be included.
Visit http://www.tamingthebeast.net
for free Internet marketing and web development articles, tutorials and
tools! Subscribe to our popular ecommerce/web design ezine!