Saturday, July 24, 2010

The Invisible Web

The Deep Web (also called Deepnet, the invisible Web, dark Web or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
To discover content on the Web, search engines use web crawlers that follow hyperlinks. This technique is ideal for discovering resources on the surface Web but is often ineffective at finding deep Web resources. For example, these crawlers do not attempt to find dynamic pages that are the result of database queries, because the number of possible queries is effectively infinite. It has been noted that this can be (partially) overcome by providing links to query results, but doing so could unintentionally inflate the apparent popularity of a deep Web site.[1]
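The limitation described above comes down to how a crawler discovers pages: it can only follow hyperlinks it finds in the HTML it has already fetched. The sketch below (using only Python's standard library, with made-up example HTML) shows the link-extraction step at the heart of a crawler. Note that the search form in the sample markup yields no links at all; the pages behind its query results are exactly the kind of content that stays in the deep Web.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags, the way a crawler discovers new pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page: one ordinary hyperlink plus a search form.
html = """
<a href="/visible-page.html">A normal link the crawler can follow</a>
<form action="/search"><input name="q"></form>
"""

parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # only the plain hyperlink is discovered
```

The crawler sees `/visible-page.html`, but the database-backed results behind the `/search` form are never enumerated, which is why such content goes unindexed.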

Google, the largest search database on the planet, currently has around eight billion web pages indexed. That's a lot of information. But it's nothing compared to what else is out there. Google can only index the visible web, or searchable web. But the invisible web, or deep web, is estimated to be 500 times bigger than the searchable web. The invisible web comprises databases and results of specialty search engines that the popular search engines simply are not able to index.[2]

The visible Web is easy to define. It's made up of HTML Web pages that the search engines have chosen to include in their indices. The Invisible Web is much harder to define and classify for several reasons.
Many Invisible Web sites are made up of straightforward Web pages that search engines could easily crawl and add to their indices, but do not, simply because the engines have decided against including them. Part of the Invisible Web is hidden because search engines have deliberately chosen to exclude some types of Web content. These resources cannot be found using general-purpose search engines because they have been effectively locked out.[3]
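One common mechanism behind this kind of exclusion is the robots.txt convention, which site owners use to tell well-behaved crawlers which paths to skip. As a small illustration (the site and paths are made up), Python's standard `urllib.robotparser` module can show how a crawler decides whether a URL is off-limits:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that locks crawlers out of /private/
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "http://example.com/public/page.html"))    # allowed
print(rp.can_fetch("*", "http://example.com/private/report.html")) # excluded
```

Pages disallowed this way exist and are reachable in a browser, yet they never enter a search engine's index, making them part of the Invisible Web.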

Invisible web resources can be classified into three broad categories:

1) Non-text files such as PDF files, multimedia, graphics, executable files, CGI scripts, software, and some other document files.

2) Information contained in databases, real-time content and dynamically generated content.

3) Disconnected pages, which are pages that exist on the Web but have no other pages linking to them. [4]

As more people gain access to information on the Web and more content is continuously added, it is important to know how to evaluate Web sites to determine if the information is reliable.

Educators also rely on educational web sites to work properly, on a technical level, in the classroom. Misinformation and technical difficulties can cause a great deal of distress not only for the students but for the teachers as well. For this reason, audience, credibility, accuracy, objectivity, coverage, and currency are the major issues educators should focus on when examining the content of educational web sites. Aesthetic and visual appeal, navigation, and accessibility are the major issues educators should focus on when examining the technical aspects of educational web sites. [5]


[1] Deep Web-From Wikipedia, the free encyclopedia
en.wikipedia.org/wiki/Deep_Web

[2] Research Beyond Google
oedb.org/library/college-basics/research-beyond-google

[3] The Invisible Web: Uncovering Sources Search Engines Can't See
www.allbusiness.com/technology/internet-web-design/943472-1.html

[4] The Invisible Web Explained
www.valenciacc.edu/library/east/invisible_explained.cfm

[5] Criteria for evaluating Educational Web Sites
members.fortunecity.com/vqf99/plain1.htm

Related articles selected by Andreea Loffler

Google is Cracking the “Invisible Web”
www.marketingpilgrim.com/2008/04/google-is-cracking-the-invisible-web.html

Five criteria for evaluating Web pages
www.library.cornell.edu/okuref/research/webcrit.html