Concept and Diagnostic Research for the Web: Lessons Drawn from Case Studies 1995-1998
by: Cheryl Harris, Ph.D.
Paper presented at: ESOMAR "Worldwide Internet Seminar," Paris, January, 1998
Copyright Cheryl Harris 1998. All rights reserved.
Earlier, Forrester Research projected that expenditures for website development
alone would reach $10 billion annually before the year 2000 (Forrester,
1997). Currently, the cost of building a full website in major metropolitan
areas such as New York City is estimated at $302,550 -- and that
is without many of the added features that are increasingly becoming
standard, such as Java applets, cookies, transaction capabilities,
dynamic page delivery, a robust search engine, or chat forums (NetMarketing,
1997). Although still only a fraction of the overall
advertising and marketing budgets of most advertisers (Jupiter Communications
AdSpend, 1997), developing an online presence already demands significant
investment. Furthermore, there is evidence that a poorly conceptualized
and executed website may do more to harm than help established brand
equity.
These are reasons enough to consider ways in which the research industry
may cooperate in developing appropriate methods for testing website
concepts, content, features and functionality with the target audience(s)
of the site. New media agencies and website developers find they
are being asked to track the performance of their sites after launch,
well beyond the insights provided by traffic data, and are increasingly
interested in ensuring that their work matches audience expectations
prior to and during site development. Based on a large number of
accumulated international case studies in which the websites of
"Fortune 100" firms in various stages of development have been tested
with prospective visitors, this essay will discuss implications
for creating models and protocols for future website tests.
Approaches for Website Testing
A wide range of methods have been applied to
the problem of testing websites. Perhaps because many agencies specializing
in interactive media have strong ties to the software industry,
the "usability" testing method has been popular and widely used.
Usability testing is a means by which software developers watch
how users might interact with their product in a laboratory environment.
Users are given a set of tasks to accomplish and are asked to report
problems they encounter in the software as they work on their tasks.
Typically, users are debriefed after the session and patterns of interaction
are examined across several such sessions. Sometimes usability tests
are conducted in groups, with pairs of peers or friends assigned
to a workstation and commenting on the software in tandem. Despite
the widespread application of usability studies in the software
industry, there appear to be many problems with extending the
methodology to the World Wide Web. First, there are
a number of distinctive differences between a software package and
a website or websites. People generally use a given software package
to accomplish a limited number of fixed tasks, such as word-processing,
statistical analysis, or image manipulation.
The expectations associated with a particular software package tend
to cluster around a predictable set of functions and interface issues.
However, WWW users come to the Web with almost unlimited objectives.
Each user's interaction with a site warrants an independent investigation
of the situational expectations attached to that interaction. For
example, "usability of a site depends on what users are trying to
accomplish. Are they surfing? Doing research? Buying products? Downloading
software?" (Spool, 1997) Furthermore, individual websites are constructed
with differing objectives in mind, which could include branding,
retailing, or public service, among many other potential goals.
This greatly stretches the "usability" methodology, to the point
that in a recent manual entitled "WebSite Usability: A Designer's
Guide" (Spool, 1997) the authors, having applied
usability testing methods to a number of websites, admit that in
fact "websites aren't like software....We assumed that websites
would be just another form of software and could be tested similarly,
but we were wrong." The authors suggest that the accepted rules
for testing software do not seem to apply in website evaluation,
and that few of their hypotheses worked in this new area. In short,
usability testing in its conventional form has many problems in
addressing the Web and further appears to ignore the implications
of more than 50 years of social-scientific and cultural-studies-based
literature in communication and media research.
Usability testing models, for example, are ill-equipped to consider the influence
of competing advertising within a given environment, nor are they able
to take into account the evolving editorial matter which forms the
basis of most sites. Moreover, the experimental laboratory method
used is not ideal for evaluating audience/user behavior, because
it severely decontextualizes user behavior, which has long been
recognized by audience researchers as a problem in evaluating the
interaction between media and audiences. Therefore other methods
of evaluating user-website interaction which allow this interaction
to take place in its naturalistic setting appear to have more promise.
Websites are also being tested in one or more of these ways:
· Surveys are posted on the website and all users are invited to fill out
the survey. An elaboration of this approach has the server
select every nth visitor to the site and serve that visitor a "pop-up"
invitation to fill out a survey (which can link to an internal form
or take the visitor offsite to a vendor's server, then return the
visitor to the original page request). Occasionally, surveys will
be sent out to a registered user base or even randomly, in the hope
that the person on the other end of the e-mail address may have
visited the site in question at some time in the past. Perhaps the
most significant problem in using surveys for website evaluation
is that a comprehensive evaluation of a site seems to be more appropriate
for qualitative approaches, in that the exploratory nature of the
web and the variety of objectives associated with its use cannot
easily be explained by imposing categories on it from above. Lacking
externally and internally valid models to predict web user behavior
at this early stage of the medium, it is difficult to construct
adequate survey instruments.
There is also the difficulty that survey responses obtained in this manner
tend to be polarized in nature -- only the most delighted visitors
or the most dissatisfied seem to respond, thus skewing the data.
· Alternatively, some companies have been developing ways of performing
qualitative website evaluations in an online environment. Initially
these tended to take place in chat-room environments such as on
the IRC (Internet Relay Chat) system, MUDs (Multi-User Dungeons)
or MOOs (Multi-User Dungeon, Object Oriented), but because these
environments have little or no ability to include graphic images
they lacked the synergism of evaluating the site in a structured
fashion while looking at it in real-time. These text-based areas
of the Internet also frequently had security problems. For this
reason, firms like my own have focused on developing secure interviewing
environments that are web-based so that discussion with one or more
visitors may take place along with full multimedia immersion (website,
webpages, audio, video, photos, etc.). Other advantages of web-based
interviewing environments include the ability to observe or participate
in an interview from wherever web access is available, now quite
widespread. In addition, respondents are experiencing a website
as they do "naturally" -- on their own equipment, at their own pace,
and with the bandwidth they would normally use. This decreases the
chance of introducing bias due to superior equipment or connection
speed that a centralized "laboratory" test environment might enjoy.
Sampling for Website Evaluations
As the author (Harris, 1996, 1997,
1998) and other researchers have noted, sampling for online research
is one of the greatest challenges faced in moving the field forward.
Because no master database of online users exists (or is likely
to be available in the foreseeable future), probability or true
random sampling designs cannot be realistically achieved. In addition
to this limitation, Internet users are notoriously intolerant of
unsolicited communications (known as "spam") and so invitations
to participate in surveys or other research projects, even by credible
and well-known research firms, are often met with anger or even
various forms of unpleasant retribution.
Research firms which randomly solicit participation by "broadcast" e-mail
are in danger of having their Internet access pulled by their service
providers, who have agreed to a zero-tolerance policy on spam. This
has led most research organizations that practice online research
to form internal panels, which can be screened, validated and sampled
on demand. Much work needs to be done to advance our knowledge of
panel management for this special population, which is quite transient
and which is also quite capable of cloaking or disguising identity
in ways perhaps unavailable to members of "traditional" mail or
telephone panels. We do not yet know what constitutes panel maturation
effects for online panels, or even what the optimum "black-out"
period might be to avoid panel wear-out for members. It is clear
that online panels can be very large and capable of very fine segmentation
with the right tools. The fully international scope of online panels
and the ease of access to members via e-mail, coupled with the low
cost of maintenance, make for a powerful equation. Recruiting panel
members has been accomplished in a number of ways, some of them rather
costly: buying lists (which still puts one in danger of spamming),
inviting participation through a phone or mail contact, buying online
banner presence that invites clickthrough to a screener, or simply
registering one's site with various search engines/directories and
hoping for traffic. Several major research companies in the U.S.,
such as Simmons Market Research Bureau (SMRB) and National Family
Opinion (NFO) have benefitted by being able to identify subsets
of their large consumer panels who report online usage. These subset
panels have the advantage of already being well-screened and of having
a wealth of household data attached to them, but it is unclear whether
or not there are intervening variables associated with pre-existing
membership in a panel which may make these panelists less representative
of the online population.
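As a rough illustration of what sampling a screened panel "on demand" might involve, the Python sketch below draws a proportionally allocated stratified sample. The panel records, the `usage` field, and the function name `stratified_sample` are hypothetical and introduced only for this example; it is a minimal sketch under those assumptions, not a description of any research firm's panel software.

```python
import random
from collections import defaultdict


def stratified_sample(panel, stratum_key, sample_size, seed=None):
    """Draw a proportionally allocated stratified sample from a panel.

    panel       -- list of dicts, one per panel member (hypothetical layout)
    stratum_key -- field used to define strata, e.g. "usage"
    sample_size -- target number of respondents overall
    """
    rng = random.Random(seed)

    # Group panel members by stratum.
    strata = defaultdict(list)
    for member in panel:
        strata[member[stratum_key]].append(member)

    total = len(panel)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Allocate in proportion to the stratum's share of the panel,
        # keeping at least one member and never more than are available.
        # Because of rounding, the final size can differ slightly from
        # sample_size.
        quota = max(1, round(sample_size * len(members) / total))
        quota = min(quota, len(members))
        # Simple random sampling without replacement within the stratum.
        sample.extend(rng.sample(members, quota))
    return sample
```

A quota design would differ mainly in how each stratum is filled (for example, first-come acceptance until the quota is met rather than random selection within the stratum).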
Generally, we have found quota and stratified sampling schemes to be reasonably
robust for online applications. Although nth selection sampling
as a way of sampling website visitor traffic is promising, it must
also be used with caution. This is so for several reasons. First,
the software used for nth selection is typically based on CGI code
with a JavaScript pop-up at various points in the intercept. Some
web browsers interact improperly with JavaScript or have
problems with CGI calls. Second, more research needs to be done to
determine response-bias effects. For example, what differentiates
the "refusals" from the "cooperatives" in a website intercept attempt?
Some respondents have reported being irritated by the obtrusiveness
of the intercept device or, indeed, seeing it as a violation of their
privacy. As online users are highly sensitive to privacy issues
and are fearful of the ways in which computers can track their behavior
without their knowledge, it is reasonable that response effects
related to the intercept method of sampling could be significant.
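The selection logic behind an nth-visitor intercept can be sketched in a few lines of Python. This is only an illustration of the counting rule: the class name `NthVisitorIntercept` and its parameters are hypothetical, the random starting point is an assumption borrowed from ordinary systematic sampling, and the CGI and JavaScript pop-up machinery described above is not modeled.

```python
import random


class NthVisitorIntercept:
    """Decide which visitors receive a survey invitation (every nth one)."""

    def __init__(self, n, start=None, seed=None):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        # Assumption: pick a random starting visitor in 1..n so the first
        # invitation does not always fall on the same position in the stream.
        rng = random.Random(seed)
        self.start = start if start is not None else rng.randint(1, n)
        self.count = 0  # visitors seen so far

    def should_invite(self):
        """Register one visitor; return True if this visitor is selected."""
        self.count += 1
        return self.count % self.n == self.start % self.n


# Hypothetical usage: invite every 10th visitor, beginning with the 3rd.
intercept = NthVisitorIntercept(n=10, start=3)
invited = [v for v in range(1, 31) if intercept.should_invite()]
```

With these settings the invited visitors are the 3rd, 13th and 23rd. Logging refusals alongside acceptances at this point would give the data needed to study the response-bias questions raised above.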
Toward a Taxonomy of Website Elements
Even a casual web user is
aware of the many different elements which are now deployed in website
design. These include advertising content delivered in many ways
(animated or "flat" banners, the animated kind being far more
common than the now outdated "flat" banner; interstitial ads; dynamically
served ads; ads in multiple page positions; "keyword" ads; etc.),
the "editorial" content of a site, graphics and other multimedia
elements (such as audio, video, or animation delivered by software
such as Shockwave and Flash), and page navigation strategies. Additional
features might include avatar- or text-based chat communities, e-mail,
and shopping baskets or other aids to online commerce.
Because it is commonly accepted that users exhibit very purposeful behavior
when visiting sites -- they want to find what they were seeking
and as quickly as possible -- and often do not go beyond the "splash"
or home-page in deciding whether or not the site will be productive,
designers try to load up the splash-page with as much information
and as many features as possible to entice visitors to stay and
explore. However, this practice has resulted in very cluttered design
solutions in many cases. Developing a comprehensive taxonomy of
elements to be evaluated in a test is a difficult and ever-evolving
task. In our evaluation protocols, we first parse each site's elements
by asking several questions as we analyze a site internally. For
example:
· What are the elements, particularly in the splash page,
that are most likely to contribute to a user's perception of a consistent
brand identity or image? For example, are there patterns in the
color palette or graphics utilized which could be isolated for analysis?
· What strategies are available to encourage the visitor to remain
in the site as long as possible? What elements in the page layouts
influence navigation choices (and possibilities)? Where should we
be most alert for opportunities for the visitor to leave prematurely?
When we interact with users during the test sessions, we ask many
questions about why visitors exhibit each and every navigation behavior
that we see.
We sometimes create a content analytic scheme for comparative studies,
and are careful to apply the same categorical definitions across
pages or across sites. This is helpful in making certain that critical
elements are examined in our discussions with visitors. Other approaches
which have been used for website evaluation include subjecting users
to a limited battery of uses-and-gratifications oriented scales
(Eighmey, 1997) and to measurements which examine behavior based
on the variables of time spent on a page and number of page requests
per visit, when factors such as page background color, image size,
use of JavaScript, presence of frames, and celebrity endorsements
are manipulated (Dreze and Zufryden, 1997). While these approaches do not seem
well-suited to reflecting the comprehensive nature of a user's web
experiences, they do shed light on some of the many factors which
may influence the perceived quality of those experiences.
Interviewing Techniques
In recent months we have experimented widely with using
Projective Interviewing Techniques within a range of website evaluation
tests. Projective interviewing is commonly used in qualitative research,
and has proven valuable in identifying underlying associations with
brands, products, and concepts. The difficulty with applying projective
tests online appears to be the lack of nonverbal supportive data
(for example, the respondent's facial expression, intonation, or
body language), which can contribute greatly to an overall understanding
of the cultural and social assumptions attached to a verbalization.
We are currently studying ways in which we can allow respondents
to re-introduce this nonverbal data within a virtual environment
that is predominantly text-based.
We do this by providing devices such as an extensive customizable library
of emotive icons and vocabulary clusters which express various postures
and other reactions (for example, the respondent may select the
descriptors "nods enthusiastically" or "grins wickedly" from the
library as an attachment to a typed verbalization). This encourages
participants to stay in touch with their physical and emotional
responses as part of their overall contribution to the discussion
and to continually include these in their remarks. We have also
had good results with associative games and exercises in which respondents
are asked to ascribe personality attributes to a website, brand,
or representation of a concept. The standard focus group technique
of asking participants to think of the product as a person, and
to fully describe that person ("what gender would this person be?
What would he/she look like? What kind of work would he/she do?
What would he/she wear? Drive? Think about?") has worked well for
us in online environments, once the nonverbal expressive strategies
are added to the mix. In fact, we have found consistently that online
respondents are less subject to the peer-pressure effect so frequently
observed in focus groups around projective exercises in particular,
probably due to the influence of their apparent anonymity. We acquire
extensive and very detailed data in this way which goes beyond what
we might expect in "offline" groups and without the over-emphasis
on consensus building which seems to be a feature of group discussions.
Projective exercises across an accumulated series of groups or depth
interviews, with 25 data points or more, produce patterns which
are easily recognizable but also enough variance to be certain of
the thoroughness of the interviewing.
Incentivization
Ensuring reliable
cooperation rates in online studies has been an ongoing concern
for researchers. While participants in qualitative studies that
demand a substantial investment of time require per-respondent incentive
fees, just as conventionally executed focus groups and depth interviews
do, we have found that the "magic number" for compensating online
respondents tends to be lower than in offline groups. For example,
a face-to-face focus group in a major urban center in North America
which requires an "average" demographic profile may run as much
as $50-$80 per respondent in incentive fees -- much more for a specialized
or rarified demographic. Online, cooperation rates are acceptably
high with incentive fees of $25-$40, which represents a
considerable saving in overall costs per project. This may be because
the time commitment for online groups is less (typically one hour
as opposed to two in "offline" groups) and there is no travel time,
as respondents may participate from anywhere they have Internet
access. For online survey research, we have observed no appreciable
difference in cooperation rates between per-respondent incentivization
plans and more generalized strategies such as awarding a prize by
a random drawing, then announcing the winner. Because the cost and
administration burden of fulfilling incentive fees for a large survey
sample could be very steep, we have found positioning incentives
as "awards" -- as long as the reward is perceived as a valuable
one for the population in question -- to be a powerful means of
incentivization.
Appropriate Stages for Website Testing
Websites
may benefit from systematic evaluation at multiple stages of their
development. Of course, the protocols utilized must be adjusted
depending on the stage of development at the time of the test, but
there should be awareness of the importance of gathering data capable
of being analyzed longitudinally. Stages which are appropriate for
testing include:
· Concept stage: themes, ideas and topics can be tested with the target
audience(s) in the same environment in which the finished product will
be encountered.
· Storyboard stage: content/copy can be roughed in at this point, or the
layout/graphic design and planned features can be targeted for test.
· "Beta" stages: a website goes through a number of iterations as features
and functions are laid in; editorial content usually continues to change
as the beta evolves.
· Pre-launch: a finalized beta, ready for launch but not yet public.
· At specified points within the first year of launch, or in response to
observed problems, such as underperformance against goals (traffic, sales,
ad clickthrough, etc.).
· Pre-determined re-design periods.
Cross-Methodological Comparisons
A great deal of work remains
to be done in looking at the influence of the interviewing environment
as an independent variable. While this work is well-advanced in
making comparisons between such techniques as mail, panel, and telephone
surveys, little is yet known about how the feature of interviewing
online may impact respondent-interviewer communication behaviors
and strategies. A few side-by-side studies are being reported now,
such as Alecia Helton's analysis of Texas Instruments' experience
in conducting online and telephone surveys. We are currently analyzing
data from a recent series of online focus groups done for our client
HBO during the same week that they performed a set of "offline"
focus groups on the same topic, with the same recruitment criteria.
Early results suggest that there are significant differences in
the expression of responses, but few appreciable differences in
the conclusions. The training and expertise of the interviewer in
online studies is probably a critical factor in this equation, and
as there are few "experts" in online interviewing and no available
formal training, it remains to be seen how methodological comparisons
may be stabilized across practitioners so that reasonable conclusions
may be drawn from this data. We plan to release our results from
this qualitative cross-methodological comparison this spring.
CONCLUSIONS/SUMMARY
Within the past 12 months, the discipline of online interviewing, and particularly
the practices associated with website evaluation, have come a long way.
The challenges in further advancing the field have become clearer,
although much work needs to be done in better understanding such
aspects as sampling, managing panels, and applying accepted models
from the history of marketing and advertising research to an online
environment. Researchers interested in participating in online research
must be willing to stay constantly alert to changes not only in
the online user population but also in the cycles of new media industries
-- as the business models evolve, so will the relevant criteria
for doing effective website evaluation. Additionally, there will
be increased pressure to invest heavily and continually in new hardware
and software that improves online interviewing performance. Around
the corner, for example, could be multi-point videoconferencing
systems that bring respondents and interviewers together in a true
virtual space. While at first glance this relieves us of struggling
with such issues as the artificial introduction of nonverbal data,
it also suggests a range of new problems in adapting our research
models to fit the mediation of advanced technological systems. For
researchers willing to accept the challenge of continual change,
this vision of the future appears utopian. In a rapidly evolving
environment such as the one the Internet brings to a researcher's agenda,
change and demand for adaptation appear to be the only sure bets.
References
Dreze, X. and Zufryden, F. (1997). Testing Website Design and Promotional Content. Journal of Advertising Research, March/April, p. 77-91.
Eighmey, J. (1997). Profiling User Responses to Commercial Websites. Journal of Advertising Research, May/June, p. 59-66.
Forrester Research, Inc. (1997). Marketing on the WWW Conference, New York City, May 21.
Gaal, O. (1997). WebTrack AdSpend: A Monthly Data Report. Jupiter Communications.
Harris, C. (1996). An Internet Education: A Guide to Doing Research on the Internet. Wadsworth/ITP.
Harris, C. (1997). Theorizing Interactivity: Models and Cases in Online Research. Marketing and Research Today, ESOMAR, November.
Harris, C. (1998, forthcoming). Strategic Interactive Communications.
NetMarketing (1997). July. (Http://www.netb2b.com/cgi-bin/cgi_wpi_archive/wpi/96/09/01/article.1)
Spool, J. (1997). WebSite Usability: A Designer's Guide. User Interface Engineering.
About the Author
Cheryl Harris, Ph.D., is an experienced e-business executive
and entrepreneur, as well as a respected educator. A former
professor at California State University and Parsons School
of Design, New York, a published author and frequent international
public speaker, she is well-known as one of the leaders in user
experience and usability research. In 1996 she founded Northstar
Interactive, an online research and consulting firm, and led
the firm to its successful acquisition in February, 2000. Northstar
developed web-based software and usability tools and consulted
on strategy + design issues for such clients as Procter & Gamble,
Motorola, Sprint, IBM, Netscape, Sony, AT&T, Time Warner, Roadrunner,
Ogilvy & Mather, Grey, Modemmedia, Monsterboard, Mastercard,
Citibank, eBay, Office.com, Insweb, Ziff Davis, Conde Nast,
NBC, HBO, Discovery, and CNBC. She was also SVP, Interactive
Strategy at Datek Online where her redesign of the online brokerage's
site resulted in a doubling of customer accounts in less than
four months. The new site was recognized or received top awards
from Money magazine, TheStreet.com, Gomez Advisors, PC Computing,
Red Herring, and several others. She is on the boards of several
institutions, including the Lower Manhattan Cultural Council,
the University of Massachusetts IT initiative, WNET reelnewyork,
and is a juror for several digital media festivals. Her publications
include three books, among them An Internet Education (International Thomson
Press, 1996) and Theorizing Fandom (Hampton Press, 1998), as well
as numerous articles. She received her Ph.D. from the University
of Massachusetts-Amherst in 1992.