What are you really searching?
Finding the Web documents (a.k.a. Web "pages" or "sites") you want can be easy or seem impossibly difficult. This is in part due to the sheer size of the WWW, currently estimated to contain 3 billion documents. It is also because the WWW is not indexed in any standard vocabulary. In a library's catalogs, you can use standardized Library of Congress subject headings to find books in most large, general libraries in the U.S. In Web searching, by contrast, you are always guessing what words will be in the pages you want to find, or guessing what subject terms someone chose to organize a web page or site covering some topic.
When you do what is called "searching the Web," you are NOT searching it directly. It is not possible to search the WWW directly. The Web is the totality of the many web pages which reside on computers (called "servers") all over the world. Your computer cannot find or go to them all directly. What you are able to do through your computer is access one or more of many intermediate search tools available now. You search a search tool's database or collection of sites -- a relatively small subset of the entire World Wide Web. The search tool provides you with hypertext links with URLs to other pages. You click on these links, and retrieve documents, images, sound, and more from individual servers around the world.
There is no way for anyone to search the entire Web, and any search tool that claims that it offers it all to you is distorting the truth.
Recommended Search Engines: Google, Yahoo, Excite, Teoma, and About, just to name a few.
Google has the largest database of Web pages, as well as many other types of Web documents (e.g., PDFs, Word, Excel, PowerPoint documents). Despite the presence of many advertisements and considerable clutter from blog sites and newsgroups, Google's popularity ranking often makes pages worth looking at rise near the top of search results. Our new "Googling to the Max" course reflects our recognition that Google currently is the winning web search engine and so people need to learn to use it really well.
Google alone is often not sufficient, however. Less than half the searchable Web is fully searchable in Google. Overlap studies show that about half of the pages in any search engine database exist only in that database. Getting a second opinion is therefore often worth your time. For a second opinion, we recommend Teoma or Yahoo! Search. We no longer recommend using any meta-search engines for web searching.
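To make the overlap idea concrete, here is a minimal sketch of how the share of pages unique to one database could be measured. The page identifiers and the two tiny "databases" are invented for illustration, and the function name unique_share is hypothetical. Real overlap studies work with far larger samples.

```python
def unique_share(database, others):
    """Fraction of pages in `database` that appear in none of `others`."""
    if not database:
        return 0.0
    elsewhere = set().union(*others)
    return len(database - elsewhere) / len(database)


# Hypothetical page sets, not real measurements.
engine_a = {"p1", "p2", "p3", "p4"}
engine_b = {"p3", "p4", "p5", "p6"}

# Half of engine_a's pages (p1 and p2) are found nowhere else.
print(unique_share(engine_a, [engine_b]))

# Searching both engines reaches more pages than either one alone.
print(len(engine_a | engine_b), "pages combined versus", len(engine_a), "in one")
```

In this toy case a second engine adds two pages the first one never saw, which is the practical argument for getting a second opinion.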
How do Search Engines Work?
Search Engines for the general web (like all those listed above) do not really search the World Wide Web directly. Each one searches a database of the full text of web pages selected from the billions of web pages out there residing on servers. When you search the web using a search engine, you are always searching a somewhat stale copy of the real web page. When you click on links provided in a search engine's search results, you retrieve from the server the current version of the page.
Search engine databases are selected and built by computer robot programs called spiders. Although it is said they "crawl" the web in their hunt for pages to include, in truth they stay in one place. They find the pages for potential inclusion by following the links in the pages they already have in their database (i.e., already "know about"). They cannot think or type a URL or use judgment to "decide" to go look something up and see what's on the web about it. (Computers are getting more sophisticated all the time, but they are still brainless.)
If a web page is never linked to in any other page, search engine spiders cannot find it. The only way a brand new page - one that no other page has ever linked to - can get into a search engine is for its URL to be sent by some human to the search engine companies as a request that the new page be included. All search engine companies offer ways to do this.
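The link-following behavior described above can be sketched in a few lines. This is a simplified illustration, not how any particular search engine is built: the "web" here is a hypothetical in-memory table of pages and their links (all URLs are invented), and the function name crawl is an assumption. A real spider would fetch pages over the network and parse links out of them.

```python
from collections import deque

# Hypothetical miniature "web": each page URL maps to the URLs it links to.
TOY_WEB = {
    "a.example/home": ["a.example/news", "b.example/index"],
    "a.example/news": ["a.example/home"],
    "b.example/index": ["b.example/about"],
    "b.example/about": [],
    "c.example/orphan": ["a.example/home"],  # links out, but nothing links to it
}


def crawl(web, known_pages):
    """Return every page reachable by following links from known_pages."""
    found = set(known_pages)
    queue = deque(known_pages)
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if link not in found:  # only queue pages not already known
                found.add(link)
                queue.append(link)
    return found


# Starting from one known page, the spider never discovers the orphan.
database = crawl(TOY_WEB, ["a.example/home"])
print("c.example/orphan" in database)

# Once a human submits the orphan's URL, it becomes a known starting point.
database = crawl(TOY_WEB, ["a.example/home", "c.example/orphan"])
print("c.example/orphan" in database)
```

The spider can only extend what it already knows about. The orphan page enters the database only after its URL is supplied from outside.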
After spiders find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content in the page and stores it in the search engine database's files so that the database can be searched by keyword and whatever more advanced approaches are offered, and the page will be found if your search matches its content.
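One common way to make stored pages searchable by keyword is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that approach. The text above does not say which structure any given search engine uses, and the page texts, URLs, and function names here are invented for illustration.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep only pages matching all words
    return results


# Hypothetical pages handed over by the spider.
PAGES = {
    "a.example/home": "Library catalogs and subject headings",
    "b.example/index": "Searching the web with search engines",
    "b.example/about": "How library search tools index the web",
}

index = build_index(PAGES)
print(sorted(search(index, "library web")))
```

A page is returned only when the words you typed actually occur in its stored text, which is why guessing the right words matters so much in Web searching.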
Some types of pages and links are excluded from most search engines by policy. Others are excluded because search engine spiders cannot access them. Pages that are excluded are referred to as the "Invisible Web" -- what you don't see in search engine results. The Invisible Web is estimated to be two to three or more times bigger than the visible web.