
Information That Cannot Be Found in Google


Not all information can be found using Google, and not even all the information available on the Web can be accessed with Google.

From Chapter 10, here's some information about what can and can't be found using Google, and some places to go for information not in Google.

Not in Google


There are large parts of the World Wide Web that no search engine, including Google, can “see.” It’s hard to give a good definition of the invisible Web (which is also sometimes called the deep Web and dark matter). The best way to think of it is simply as material on the Web that has been excluded from search engines, specifically from Google, either on purpose or due to technological limitations.

Material on the Web that is invisible to Google will almost certainly be invisible to other search engines as well.

It’s easy to see why some Web sites, such as those with adult content, might be excluded from Web search engines and thus rendered “invisible.” But it may be a little harder to understand why sites that contain information of value to researchers are also invisible.

There are a number of possible reasons that Web pages might be excluded from Google (and other search engines). These include:
  • Dynamic results aren’t easy to read: If a page is dynamically generated and assembled from a database, Google might not return it as a result to your query. Spiders can access some dynamically generated pages, particularly when a page is pulled intact out of a database (even the results page from a Google search can be considered a dynamically generated page). However, spiders can have trouble with any dynamic generation that involves setting multiple fields to return the results.
  • Membership has its privileges: If you’re required to log in to access a page, or a subscription or fee is required to access it (see “Premium and specialized online services”), the page may not come up in your results.
  • The page is flying solo: If the Web page is “disconnected,” with no other pages linking to it (so it is hard for spiders to find), it won’t show up high on your list of results.
  • The page doesn’t have words: If a Web page contains mostly non-text matter, such as imagery, indexing may be limited to ancillary text such as that in the alt attributes of its img tags.
  • Part of a site is available, but not the good stuff: Depending on the site and the information it contains, a Webmaster may mark specific pages as off-limits to crawlers.
  • The information is in a format that can’t be easily read: If a file is in a format that is hard for the spider to read, such as an executable file or a compressed file (such as a .zip file), Google may not be able to find it. Google (unlike most other search engines) does have an impressive ability to index Acrobat (PDF) files, PostScript files, and Microsoft Word documents.
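
Two of the reasons above can be illustrated with a short sketch. It is not a description of how Google's spider actually works. It only shows, using Python's standard library, how a well-behaved crawler might check a site's robots.txt rules before fetching a page, and how little text an image-heavy page offers for indexing. The robots.txt content, the "ExampleBot" user agent, the example.com URLs, and the helper names `may_crawl` and `indexable_alt_text` are all hypothetical, made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the Webmaster marks two directories as
# off-limits to all crawlers ("User-agent: *").
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /reports/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class AltTextCollector(HTMLParser):
    """Collects the alt text of img tags, ignoring everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.alt_texts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # skip images with a missing or empty alt attribute
                self.alt_texts.append(alt)


def indexable_alt_text(html: str) -> list[str]:
    """Return the alt text a text-only indexer could pick up from html."""
    collector = AltTextCollector()
    collector.feed(html)
    collector.close()
    return collector.alt_texts


if __name__ == "__main__":
    for path in ("/index.html", "/members/profile.html"):
        url = "https://example.com" + path
        print(path, "crawlable:", may_crawl(ROBOTS_TXT, "ExampleBot", url))

    page = '<p><img src="a.png" alt="Sales chart"><img src="b.png"></p>'
    print("alt text found:", indexable_alt_text(page))
```

In this sketch, pages under /members/ are skipped because the site asked crawlers to stay out, and the second image contributes nothing to an index because it has no alt text.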

Finding Information Not in Google


For a researcher, the most important part of the invisible Web is made up of fee-based premium services that provide high-quality information. The information provided by these services may be stored in some kind of database, but to the researcher it hardly matters so long as the service makes a Web interface to the data available.

Some of the best-known online fee-based research services are:
  • DataStar, a professional research service with an emphasis on companies and industry
  • Dialog, an extensive research service that makes more than 600 research databases available either through dedicated software or the Dialog Web site
  • Factiva, an extensive research database with a focus on business, finance, and current events
  • LexisNexis, perhaps the best-known research service, featuring a wide variety of databases covering business, news, and legal affairs
  • Questia, an extensive library of books and periodicals, primarily in the social sciences
  • Westlaw, an online legal research service that provides access to statutes, case law, public records, and other legal content



© Braintique.com. All rights reserved.
