The Internet and the World Wide Web
1. Finding Secondary Data on the Internet
While gathering secondary data is necessary in almost any research project, it has traditionally been a tedious and boring job. The researcher often had to write to government agencies, trade associations, or other secondary data providers and then wait days or weeks for a reply that might never come. Often, the researcher made one or more trips to the library, only to find out that the needed reports were checked out or missing. However, beginning in the late 1990s the rapid development of the Internet and the World Wide Web helped eliminate the drudgery associated with secondary data. This secondary data source continues to expand, with publicly available documents on the Web doubling every 18 months.
First, a few definitions:
Internet – a worldwide computer
network consisting of smaller, interconnected
networks. A network consists of two or
more computers connected to one another for the purpose of sharing communication
and resources. The Internet links both
public and private computer systems to allow users to access information and
documents from distant sources.
World Wide Web (WWW) (or “Web”) – that portion of the Internet whose servers support a graphical interface retrieval system that organizes information into thousands of interconnected pages or documents called Web pages. Content providers are the parties that provide information on the WWW through Web sites, which consist of one or more Web pages with related information about a particular topic. Marketing content providers include companies, direct marketers, electronic retailers (e-tailers), and other organizations that have their own Web sites (e.g., Dell.com), as well as Web sites that contain secondary data.
There are two general ways
people search for information on the Internet.
First is directed search/prepurchase search (“shopping”) with purchase of
one or more particular products in mind.
Consumers know what they are looking for and usually have some existing
information to rely on (e.g., a producer’s name, brand name, or a set of terms
that describes the product category).
Here, they usually search for key/search terms in search engines as
described below. Second is browsing or casual search, with no particular immediate purchase in mind. Here, the user might not have an immediate
need or might have a less precise view of the information that might be
available. Browsing relies heavily on
hyperlinks between documents, allowing the browser to navigate through
cyberspace in a non-sequential manner.
The Web has been likened to a library
lacking a card catalog—there is no central authority that lists all possible
sites accessible via the Internet. This
results in surfing—gliding in an
unplanned fashion from home page to home page.
There are, however, two organized ways to find information on the
WWW—general-purpose search tools such as browsers and search engines, and
specially designed tools, such as shopping bots:
(1) The uniform resource locator (URL) is the Web site address or domain name, which identifies a particular location (the Web server and the file on that server) of the site where the information you need is located. You can enter the URL for that site in the search window (by clicking “File - Open Window”) of your Web browser, which takes you directly to that site’s home page or start page (the introductory page or opening screen of a Web site). (The original six top-level domains widely used in the U.S. were: .com [commercial or “dot.com”], .edu
[education], .net [network operations], .gov [
(2) Use a search engine, a computerized directory that allows users to search the WWW for information in a systematic way. The user types in keywords. The big four sites are Google, Yahoo!, MSN, and America Online. Others include Alta Vista, Lycos, and Ask Jeeves.
Each search engine contains collections of links to documents throughout
the world, and each uses its own indexing system to help you locate the
information you are looking for. Some
search for titles or headers of documents, while others search words in the
documents, and still others search other indexes or directories. You can also search for directories by typing
in the word “database” in the search box.
All search engines allow you to enter
one or more key words or search terms into the text box. They then return listings of hyperlinks or links (electronic connections from one Web site to another Web
site). Alternatively, you can click on a
list of broad topics (art, business, entertainment, etc.) to go to
subdirectories or else home pages. Most
search engines use a best match search process and present search output in
order ranked by relevance, based on: how many of the search items were found in
the document, how often the search items were found in the document, where in
the document the search items were found (e.g., URL, meta tags, etc.), and
proximity of the terms to one another.
It is not uncommon to find a large number
of hits; if this is the case, the rule of thumb is to scan the first 50 hits,
and if these don’t provide useful information, to consider redesigning the
search strategy. The outcome of a
search (“retrieval set”) usually takes the form of a list of Web pages
representing the records retrieved, ranked in order of their potential
relevance to the query and presenting a certain number (say, ten) at a
time. Each of these incorporates a
hypertext link to the source document.
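The ranking criteria above can be illustrated with a minimal sketch. It is not how any particular search engine works: the function names `score` and `rank`, the weights, and the sample documents are all assumptions chosen for illustration, and only three of the criteria are modeled (how many terms were found, how often, and whether they appear in the title as a stand-in for location).

```python
def score(query_terms, title, body):
    """Toy best-match score: a higher number means a more relevant document."""
    words = body.lower().split()
    title_words = title.lower().split()
    terms = [t.lower() for t in query_terms]
    # How many of the search terms were found anywhere in the document.
    matched = sum(1 for t in terms if t in words or t in title_words)
    # How often the search terms occur in the body.
    frequency = sum(words.count(t) for t in terms)
    # Where the terms were found: a title hit counts extra.
    in_title = sum(1 for t in terms if t in title_words)
    # The weights 10, 1 and 5 are arbitrary assumptions.
    return 10 * matched + frequency + 5 * in_title


def rank(query_terms, documents):
    """Return document titles ordered from best match to worst."""
    scored = [(score(query_terms, title, body), title) for title, body in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Hypothetical documents as (title, body) pairs.
documents = [
    ("Snowboard makers", "snowboard companies sell snowboard gear online"),
    ("Ski resorts", "resorts for ski and snowboard holidays"),
    ("Vodka brands", "a list of vodka producers"),
]
print(rank(["snowboard", "companies"], documents))
```

Proximity of the terms to one another is left out to keep the sketch short; a fuller version would add a term for the distance between matched words.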
There are several types of search engines:
(a) Hierarchical search engines or directories add value through human intervention in the assignment of subject headings to records in databases. In addition, all sites are evaluated prior to inclusion. Such sites only contain submissions from users (they don’t perform a search of the Web); hence, they are not comprehensive, omitting a large portion of the information on the Web. Web site creators may submit their page for inclusion in the evaluation process. The maintenance of such directories is a labor-intensive process; therefore, such search services are selective in the sites that are included. However, such selection reduces the amount of garbage one often encounters in an Internet search.
<Exhibit 17.6> Yahoo! is an example of a search engine built on a hierarchical, subject-oriented guide. All sites have to fit into a certain category/subject heading and subcategories (e.g., Stolichnaya vodka is indexed as Business and Economy/Companies/Drinks/Alcoholic/Vodka). Going to Business and Economy/Companies/Sports/Snowboarding/Board Manufacturers gives almost 60 companies that sell snowboards on the Web. <Exhibit 17.7> Searching is via menus of these subject headings and/or through keyword searching.
(b) Collection search engines. <Exhibit 17.8> Alta Vista is an example of a search engine that uses a spider, an automated program that crawls around the Web and collects information. The advantage of these is that they tend to be very comprehensive. Because there are so many sites, they rank the best matches first. <Exhibit 17.9>
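How a spider moves from page to page can be sketched in a few lines. This is a simplified illustration, not the method of Alta Vista or any real engine: the three-page “Web” is held in memory, the URLs are hypothetical, and the names `LinkExtractor`, `crawl`, and `PAGES` are assumptions introduced here. The spider reads a page, collects its hyperlinks, and queues every link it has not seen before.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl. `fetch` maps a URL to its HTML, or None if unavailable."""
    seen = [start_url]            # a list keeps the order of discovery
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:          # page could not be retrieved; nothing to follow
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.append(link)
                queue.append(link)
    return seen


# A hypothetical three-page "Web" held in memory instead of on real servers.
PAGES = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": '<a href="http://d.example/">D</a>',
}
print(crawl("http://a.example/", PAGES.get))
```

A real spider would also store the text of each page in an index so that later keyword searches can be answered without revisiting the Web.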
(c) Concept search engines. Excite is an example of a concept search engine, which uses a concept, rather than a word or a phrase, as the basis for the search.
<Exhibit 17.10> To narrow the original search,
one clicks on one of the sites found in that search to do another search. The percentage key gives the user an idea of
how close a particular site is to his or her concepts. For example, Ask Jeeves <Exhibit 17.11>
allows users to type in natural-language questions. Concept search engines can be a relatively
efficient and focused way of searching.
The disadvantage is that they aren’t as comprehensive as collection
search engines.
(d) Meta-engines/meta search engines/mega-search engines search multiple search engines simultaneously for words and phrases. They then combine results, remove duplicate entries, and present a single listing. Examples include MetaCrawler, Dogpile, and Debriefing (the latter is maintained by librarians who are constantly refining and upgrading the site). Some of these can be found in the list of search engines when you click on the “search” button of your browser, and others are found by typing “www.searchenginename.com” (e.g., www.dogpile.com; www.debriefing.com). They are a quick way of searching across several search tools, although they might not support some of the more sophisticated search facilities. There are also specialty search engines that limit searches to specific topic areas such as law, business, and medicine, as well as Web community sites such as www.theglobe.com.
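The combine-and-deduplicate step that meta-engines perform can be sketched as follows. This is only an illustration under stated assumptions: the function name `merge_results`, the interleaving rule, and the sample result lists are invented here, and real meta-engines may merge and rank results quite differently.

```python
def merge_results(result_lists):
    """Interleave several ranked URL lists and drop duplicate entries."""
    merged = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    # Take the top hit from each engine, then each engine's second hit, and so on.
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                # Crude normalization so trivially different spellings match.
                url = results[position].lower().rstrip("/")
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged


# Hypothetical result lists from two search engines.
engine_a = ["www.example.com/hotels", "www.example.org/", "www.example.net/sf"]
engine_b = ["www.example.org", "www.example.com/hotels", "www.example.edu/travel"]
print(merge_results([engine_a, engine_b]))
```

Interleaving by rank is one simple way to respect each engine's ordering; it is a design choice for this sketch rather than a description of any named service.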
(e) Robot search engines/search bots. This newest type of search engine acts like meta
search tools and searches many Internet search engines in parallel. They differ from meta
search tools in that they are loaded at the local workstation rather than
operating in client-server mode. Also,
they use robots (“Bots”) or intelligent agents to roam the Internet
in search of information. Once a search
has been performed, the user needs to assign relevance rankings to the items
retrieved. The intelligent agent uses
this information in the next iteration to modify its search operation. <Exhibit 17.12> For example, Travelocity.com finds the
best deals for your traveling needs, while BargainFinder (www.BargainFinder.com) does so for
your music needs. Some Web retailers
have designed their sites to either refuse the robot admission or to confuse
the robot, as they wish to avoid a “cheap” image.
(f) Search engines for specific sites. E-tailers with large catalogs of products, such as Amazon.com, need a search engine to support users in navigating their way through the cyber-store.
Some search engines (e.g., Yahoo! and Lycos) serve as portals (entry/starting points) for Internet exploration. <Exhibit 17.14> America Online is a well-organized Web portal from which a Web surfer can link/jump to many locations highlighted by AOL. Commercial Web sites pay AOL to be featured in this way. Such portals can be vertical (serving one industry or market, such as an ethnic market) or horizontal (serving multiple industries and markets).
Which search engine should you use? The best search engines cover about 30
percent of the estimated pages out there.
A 1999 study found that Northern Light, Snap, and Alta Vista index
significantly more (16%) of the Web than the other popular search engines. The most up-to-date search engines were Alta
Vista, Excite, and Hotbot. It is also a
good idea to use multiple search engines since there is surprisingly little
overlap between the major search engines. The meta-engines described in (d) above do this for you. Niche or “vertical” search engines, such as the specialty search engines noted in (d), only search within a narrow band of interest.
They are sometimes called vortals
(a contraction of “vertical” and “portal”), and they might also offer expert
reviewers and provide the “best” recommended sites in a given area.
(g) Blog search engines such as
Technorati, Feedster, or Blogdigger. If
you’re looking for very current information (such as today’s buzz), these are
useful.
(3) Shopping bots are specialized search bots designed to locate and compare products. They take a query, visit shops that might have the sought product, bring the user
the results, and present them in a consolidated, compact format that
facilitates comparison shopping. Many
also provide access to an order form.
Searching is on the basis of full text
and/or product categories. Some
shopping bots are comprehensive in coverage (e.g., MySimon, NetMarket, and
Planet Retail) while others focus on a specific product range (e.g., BargainBot
for books, Bargain Finder Agent for Music and CDs, Gift finder for gifts, and
Price Scan for computer software and hardware).
Most shopping bots claim to eliminate the
searching necessary to identify the right product at the best price.
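The query, visit, and consolidate cycle of a shopping bot can be sketched in miniature. Everything here is hypothetical: the shops, the prices, and the function name `compare_offers` are invented for illustration, and each “shop” is just an in-memory catalog rather than a real Web store.

```python
def compare_offers(product, shops):
    """Ask every shop for `product` and return (price, shop) pairs, cheapest first."""
    offers = []
    for shop_name, catalog in shops.items():
        price = catalog.get(product)   # None when the shop does not carry the product
        if price is not None:
            offers.append((price, shop_name))
    return sorted(offers)


# Hypothetical shops and prices, invented purely for illustration.
shops = {
    "Shop A": {"DVD player": 129.99, "CD": 12.50},
    "Shop B": {"DVD player": 119.00},
    "Shop C": {"CD": 11.75},
}
for price, shop in compare_offers("DVD player", shops):
    print(f"{shop}: ${price:.2f}")
```

The sorted list is the “consolidated, compact format” in its simplest form: one line per shop, ordered by price.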
The procedure for searching is:
· Use a plus sign (+) in front of a word to indicate that it must appear in each Web page of the query results (e.g., hotels +San +Francisco). Without the plus sign the word isn’t considered mandatory.
· Use a minus sign (-) in front of any word that shouldn’t be included in any Web page in the search results (e.g., cars -Ford).
· Enclose a multiword phrase in quotation marks to tell the search engine to list only sites that contain those words in that exact order (e.g., “
· AND works like the plus sign, indicating that all the words joined by AND must appear in the document (e.g., to find documents that contain the words wizard, oz, and movie, enter: wizard AND Oz AND movie).
· OR joins words, at least one of which must appear in the document (e.g., to find documents that contain the word dog or puppy, type: dog OR puppy). OR is often used to broaden a search (e.g.: (travel OR tourism OR cruises OR cruising OR vacations OR vacationing OR vacationers) AND (Caribbean OR Bermuda OR Jamaica OR Virgin Islands)).
· AND NOT (or simply NOT) is similar to the minus sign and is used to exclude words from the results: words that are likely to match your search requirements but have nothing to do with the search topic (e.g., to find documents that contain the word pets but not the word dogs, enter: pets AND NOT dogs; similarly: Dolphins NOT NFL).
· NEAR should be used when words should be near each other (e.g., Moon NEAR River).
· () Parentheses are used to group portions of Boolean queries together (e.g., to find documents containing the word fruit and either banana or apple, type: fruit AND (banana OR apple)).
· Title search allows you to search for titles of Web documents (e.g., “title:Mars” or “t:Mars” will retrieve all documents with the word “Mars” in the title).
· * Wild card (e.g., eco* will return economy, economics, ecology, etc.)
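To make the plus, minus, quotation-mark, and wild-card rules concrete, here is a minimal sketch of how a page might be tested against such a query. It is an illustration only: the function name `matches` is an assumption, the Boolean keywords (AND, OR, NOT, NEAR) and title search are not modeled, and real search engines differ in the details. Following the rule above, a word without a plus sign is treated as optional, so it never excludes a page.

```python
import re
from fnmatch import fnmatchcase


def matches(query, text):
    """Return True if `text` satisfies the +word, -word, "phrase" and * rules."""
    text = text.lower()
    words = re.findall(r"\w+", text)
    # Split the query into quoted phrases (optionally prefixed) and single tokens.
    for token in re.findall(r'[+-]?"[^"]+"|\S+', query.lower()):
        excluded = token.startswith("-")
        mandatory = token.startswith("+") or token.startswith('"')
        term = token.lstrip("+-").strip('"')
        if " " in term:
            found = term in text                              # crude exact-order phrase check
        elif "*" in term:
            found = any(fnmatchcase(w, term) for w in words)  # wild card such as eco*
        else:
            found = term in words
        if excluded and found:
            return False
        if mandatory and not found:
            return False
    return True


print(matches('+hotels "san francisco"', "Cheap hotels in San Francisco"))
print(matches("cars -ford", "Used cars from Ford dealers"))
print(matches("+eco*", "Lecture notes on ecology"))
```

In a fuller implementation the optional words would still matter, because they would raise a page's position in the relevance ranking.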
Some Hints for Searching:
· Be specific. Typing in “DVD Players Reviews” will give you a better set of results than the more general “DVD Players.”
· Add quotation marks. Keep exact phrases and proper names intact by enclosing them in quotation marks. Use words most likely to be used (e.g., try “John F. Kennedy” and “born” rather than “John F. Kennedy” and “birth date”).
· Use the “Advanced Search” feature. For example, you can restrict the search to certain kinds of documents or exclude pages containing certain words.
3. After typing the search
request, click on the search button.
(The search engine then searches the entire Web or a subset of the Web
to locate sites meeting your search parameters.)
Web sites are also discovered via word-of-mouth communication, as well as by checking the favorite Web sites listed on others’ home pages.
Also, much information, like airline
on-time records, is buried in databases and not in Web pages scoured by search
engines. Access such hidden information
through www.invisible-web.net.
2. Finding Federal Government Data on the Internet
A great source here is the Statistical Universe, created by the Congressional Information Service (CIS). It is available at www.cispubs.com. This is the most comprehensive and fully indexed source for federal statistics online.
Entire reports can be downloaded.
Links to the 70 federal agencies recognized by the Office of Management
and Budget as issuing statistical data can be found at www.fedstats.gov.
3. Internet Discussion Groups and Special Interest Groups
Newsgroups are Internet sites devoted
to a specific topic where people can read and post messages. They are a primary means of communication
between professionals and among members of special interest groups. You can visit any newsgroup supported by your
Internet service provider (ISP). If your ISP doesn’t offer these groups or carry the one you are interested in, you can use one of the publicly available newsgroup servers that does. Both Netscape and Internet Explorer, as well as other browsers, come with newsgroup readers.
Newsgroups are like bulletin boards for a
particular topic. People stop by the
newsgroup to read messages left by other people, post responses to others’
questions, and send rebuttals to comments with which they disagree. There is usually some management to keep the
discussion on topic and to remove offensive material.
Newsgroup messages look like e-mail
messages, containing a subject title, author, and message body. They differ in that they are threaded
discussions—any reply to a previous message will appear linked to that message.
Therefore, you can follow a discussion by starting at the original message and
following the links (threads) to each successive reply.
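The idea of a threaded discussion can be shown with a small sketch. The message fields (`id`, `reply_to`, `author`, `subject`) and the function name `thread_lines` are assumptions made for illustration; real newsreaders derive the reply links from message headers rather than from a field like `reply_to`. Each reply is printed indented beneath the message it answers, so a discussion can be followed from the original message down through each successive reply.

```python
def thread_lines(messages, parent_id=None, depth=0):
    """Return subject lines with each reply indented under its parent message."""
    lines = []
    for msg in messages:
        if msg["reply_to"] == parent_id:
            lines.append("  " * depth + f'{msg["author"]}: {msg["subject"]}')
            # Recurse to list the replies to this message one level deeper.
            lines.extend(thread_lines(messages, msg["id"], depth + 1))
    return lines


# Hypothetical newsgroup messages; reply_to is None for a new topic.
messages = [
    {"id": 1, "reply_to": None, "author": "ann", "subject": "Best snowboard brand?"},
    {"id": 2, "reply_to": 1, "author": "bob", "subject": "Re: Best snowboard brand?"},
    {"id": 3, "reply_to": None, "author": "cy", "subject": "Wax recommendations"},
    {"id": 4, "reply_to": 2, "author": "ann", "subject": "Re: Best snowboard brand?"},
]
print("\n".join(thread_lines(messages)))
```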
Although Stonehill doesn’t have a
newsgroup server, the usual procedure is:
1. Open your newsreader program in your browser. In Navigator click Communicator-Newsgroups, or in Netscape e-mail click on “News.”
2. Search for the newsgroups of interest by keywords or topics.
3. Select the newsgroup of interest.
4. Begin scanning messages. The title of each message generally gives an indication of the subject matter.
Usenet is a collection of discussion groups in
cyberspace. People in a Usenet group can
read messages on a given topic, post new messages, and respond to existing
messages. For advertisers, a Usenet
group can be a highly targeted audience for advertising messages. Marketing researchers can also use Usenet as
a form of unobtrusive observational research.
By visiting them, you can get the latest opinions on products, services,
stores, etc. There are also
company-specific sites, such as U-Hell (for U-Haul), Untied.com (for United
Airlines), and TheWorst.com (for Sprint PCS), as well as more general complaint
sites, such as UgetHeard.com, Bitchaboutit.com, Complain.com, and Epinions.com. Some of these sites actually send the
complaints to the offending companies, who can choose to respond with some sort
of restitution.
Unfortunately, some of the complaints are suspected of being fake.
Chat rooms, either one set up by your company on your Web site or others’ chat rooms, can be monitored for word-of-mouth communication. These can function as virtual focus groups
operating in near-continual session, enabling companies to track consumer buzz
as it develops.