Published in ACM interactions magazine, 1999, v.6, n.2, p.32-35.
Gary Perlman
director@hcibib.org
http://www.acm.org/perlman/
The idea for a free-access online bibliography on human-computer interaction,
which resulted in the HCI Bibliography, is over ten years old.
Although it started slowly, the HCI Bibliography
has grown to over 18,000 entries, most with abstracts,
with over 4000 links to full text.
Now, with its own web site at
www.hcibib.org
and its own search service,
the HCI Bibliography serves as a central repository
for HCI information on the Web
(with entries for about 800 Internet resources)
and off
(with entries for about 400 books, over 15 conferences,
and over 10 major journals).
This article (1) summarizes the history of the HCI Bibliography,
(2) describes its current holdings, web site, and search service,
and (3) considers how and why to offer a free service.
Human-computer interaction,
Bibliography,
Bibliographic information retrieval,
World-Wide Web,
Full text retrieval,
Expert system,
Search assistance,
Query language usability,
Compulsive gathering of information
The HCI Bibliography grew out of my 1988 experiences while
I wrote a curriculum module on User Interface Development
for the Software Engineering Institute
(Perlman, 1989).
I was delighted to have work-study students type in
the bibliographic information for about 200 references;
I especially liked having the abstracts and/or table of
contents also online.
I could search the file, reorder records, import them
into bibliography management tools, etc.
I thought, "If a couple of students can put online hundreds of
records in a few weeks, what could hundreds or thousands of people
do? In a short time, their efforts could be merged into a bibliography
to be used by thousands. Maybe authors could donate records
of their own publications!"
Of course, that was a naive view. Authors wrote long
messages explaining why they could or would not provide
an abstracted entry. People donated bibliographic data
that had several errors per abstract. The SIGCHI EC
expressed a lack of interest in helping because of the likelihood
of failure of such a project. Fortunately, work-study
students at The Ohio State University signed up to do data entry,
and some people on the Net (as we called the Internet then)
were willing to validate entries. Also, publishers were willing
to give permissions to have their materials online, free of charge.
In 1991, after two years of getting started, the first paper
on the HCI Bibliography (or HCIBIB, as I call it) appeared in
the SIGCHI Bulletin (Perlman, 1991), boasting of over 1000 entries
and promising "Eventually, all of HCI will be
online and freely accessible around the world."
Once the project had started, the SIGCHI EC became a consistent supporter
and sponsor of the project.
During the formation of the HCI Bibliography,
other projects "competed".
There was a book from the ACM Press (ACM, 1990)
with a general title:
Resources in Human-Computer Interaction,
but which was actually a printout of a query done on ACM publications.
Although it had several indexes, I could not help but think
that any printed index would be a relic of outdated ideas.
Around the same time, I received in the mail a printed
bibliography on hypertext, ordered by author and nicely bound as a report.
I could not help but think how ironic it was that a bibliography
on hypertext in particular would be (1) on paper and (2) organized in only one way
(and the way that was least useful, except perhaps for the authors).
I was convinced that online information was the only long-term option,
and that a format that identified the different parts of entries would
allow a variety of search and display options.
The UNIX Refer format was chosen because it was simple enough
to explain to non-experts.
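To illustrate why a format that labels the parts of each entry supports varied search and display, here is a minimal Python sketch of reading Refer-style records. It assumes the common Refer conventions (each field starts with `%` and a one-letter tag such as `%A` author, `%T` title, `%D` date, `%X` abstract or annotation, and a blank line separates records). The sample data and the function name `parse_refer` are invented for illustration and are not taken from the HCIBIB files.

```python
from collections import defaultdict

# Hypothetical sample data; field values are invented for illustration.
SAMPLE = """\
%A A. Author
%A B. Author
%T An Example Title
%D 1990
%X First line of abstract
continued on a second line.

%A C. Author
%T Another Title
%D 1991
"""


def parse_refer(text):
    """Split Refer-style text into records mapping a tag to a list of values."""
    records = []
    current = defaultdict(list)
    last_tag = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record (if one is open).
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            last_tag = None
        elif line.startswith("%") and len(line) >= 2:
            # New field: the character after '%' is the tag.
            last_tag = line[1]
            current[last_tag].append(line[2:].strip())
        elif last_tag is not None:
            # Continuation line: append to the most recent field value.
            current[last_tag][-1] += " " + line.strip()
    if current:
        records.append(dict(current))
    return records


records = parse_refer(SAMPLE)
# Once fields are labeled, reordering and selective display are one-liners.
by_date = sorted(records, key=lambda r: r["D"][0], reverse=True)
titles = [r["T"][0] for r in by_date]
```

Because repeated tags (such as several `%A` lines) are kept as lists, the same parsed records can be sorted by date, listed by author, or searched by field without retyping anything.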
Another project, HILITES (Shackel et al., 1992), had broader coverage
and more features, but was costly to maintain and therefore costly to provide.
I surveyed several hundred "registered" HCIBIB users and concluded that
HILITES was beyond the financial reach of most people in HCI,
and that, being provided only on CD-ROM, it did not serve the needs of many.
To simplify coverage of the major sources of HCI publications,
an early decision was to cover whole journal volumes and conference proceedings
that were substantially if not primarily on HCI.
Each of these modules was kept in one file
(or several files with related names for large conferences).
During the early 1990s, the backlog of modules
(journal volumes and conference proceedings) was added to the HCIBIB database,
going as far back as the first volume (1969) of the
International Journal of Man-Machine Studies
(renamed International Journal of Human-Computer Studies in 1994).
In more recent years, OCR scanning of entries has proven
more accurate than volunteers typing,
especially when supplemented by hundreds of automated checks.
Although recognized as the primary source of HCI bibliographic information,
the HCIBIB was a database and not a search service.
A variety of search services via email and later via the Web were provided,
and these appeared to be very popular, if judged only on the
number of requests I received about these unaffiliated services.
None of these services were authorized, and they typically
fell behind in their coverage, in some cases having less than half
the released records.
In 1997, the HCIBIB moved to its own domain, hcibib.org,
offering Web and FTP access.
In April, 1998, its search service started.
As of September, 1998, the HCI Bibliography:
- covers over 15 major HCI conferences
(www.hcibib.org/confer.html)
- covers over 10 major HCI journals
(www.hcibib.org/journal.html)
- has over 18,100 entries (about 400 files, almost 19 Mbytes) including:
  - about 400 entries on books,
    many with tables of contents;
  - over 4000 links to full text online,
    most requiring a subscription;
  - about 800 entries on internet resources,
    categorized to create the
    link indexes on the SIGCHI Web site
    and the link index for ACM SIGACCESS (formerly SIGCAPH);
    all these pages have links to forms to suggest new resources.
It has been a principle of the HCI Bibliography
that currency of coverage is not as important as affordability,
correctness, portability, etc.
Once online in a portable format,
materials will remain online indefinitely.
The HCI Bibliography usually has had a backlog of materials to bring online,
and once online, a backlog of materials to validate.
As of September, 1998, both backlogs are relatively low.
The HCI Bibliography web site is at:
http://www.hcibib.org/
The HCIBIB web site was redesigned in April of 1998
and between then and September of 1998
had over 17,500 visitors (over 100 per day).
There are several pages on how the project is run
(e.g., publisher permissions, data collection and validation, support)
and pages about what's new in the database and the search service.
There are pages to browse the collection
by publication type, publication date, when released, etc.
There is a list of the most frequent authors (those with 10 or more
authored entries in the HCI Bibliography), with links to
retrieve all the publications by each author.
The HCIBIB search service started in April of 1998
and between then and September 1998 has processed
over 30,000 searches (about 6000/month).
Monthly counts show the service growing from an initial 150/day
to about 300/day during that period.
On September 3, 1998, it handled over 1000 searches
for the first time.
The HCI Bibliography search service is based on the glimpse search engine
(glimpse.cs.arizona.edu),
a free-for-non-profit tool that runs on the server for HCIBIB.ORG.
Ironically, the search system for the HCI Bibliography has serious
usability problems.
Compound those with the generally unplanned nature
of searches for a free web service, and it is clear (from the server logs)
that many searches miss a lot of what is desired.
The search service provides extensive advice about how to improve
a search, using knowledge of:
- commonly misspelled author names in HCI
- common (non-discriminating) terms in HCI
- differences between British and American spelling
- methods to broaden or narrow a search
and by providing relative search term frequencies in the database
and frequencies of terms in results.
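As a rough illustration of how relative term frequencies can flag non-discriminating terms, the Python sketch below computes the fraction of records that contain each search term. The function name, the sample records, and the use of case-insensitive substring matching are assumptions made for the example; the article does not say how the HCIBIB service computes its figures.

```python
def term_frequencies(terms, records):
    """Return, for each term, the fraction of records that contain it.

    Matching is a case-insensitive substring test (an assumption).
    """
    if not records:
        return {term: 0.0 for term in terms}
    lowered = [record.lower() for record in records]
    return {
        term: sum(term.lower() in record for record in lowered) / len(lowered)
        for term in terms
    }


# Hypothetical record texts, invented for illustration.
sample_records = [
    "User interface design guidelines",
    "Hypertext navigation and the user interface",
    "Evaluating interface prototypes",
    "Speech recognition errors",
]
freqs = term_frequencies(["interface", "hypertext"], sample_records)
```

A term found in most records, like "interface" in this sample, does little to narrow a search, while a rarer term such as "hypertext" discriminates well.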
The best searches for various topics
are maintained in the HCIBIB database as internet resources.
These can be modified with terms to further restrict
a search.
For example, the following finds over 1400 records on hypertext OR hypermedia:
{hypertext,hypermedia}
(comma means OR, and braces imply grouping).
The search could be modified to find over 60 records on books:
{hypertext,hypermedia};isbn
(semi-colon means AND)
or almost 500 records with links to full text:
{hypertext,hypermedia};http
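The query notation above (comma for OR, semi-colon for AND, braces for grouping) can be sketched as a small evaluator in Python. This is only an illustration of the notation, not the glimpse implementation: it assumes case-insensitive substring matching and evaluates ungrouped operators left to right, which may differ from the real engine. The function name `matches` and the sample records are invented.

```python
def matches(query, text):
    """Return True if `text` satisfies `query`.

    Terms are matched as case-insensitive substrings; ',' is OR,
    ';' is AND, and braces group a sub-query. Operators outside
    braces are applied left to right (an assumption).
    """
    text = text.lower()
    pos = 0

    def parse_expr():
        nonlocal pos
        result = parse_atom()
        while pos < len(query) and query[pos] in ",;":
            op = query[pos]
            pos += 1
            right = parse_atom()
            result = (result or right) if op == "," else (result and right)
        return result

    def parse_atom():
        nonlocal pos
        if pos < len(query) and query[pos] == "{":
            pos += 1
            result = parse_expr()
            if pos >= len(query) or query[pos] != "}":
                raise ValueError("missing closing brace")
            pos += 1
            return result
        start = pos
        while pos < len(query) and query[pos] not in ",;{}":
            pos += 1
        term = query[start:pos].strip().lower()
        if not term:
            raise ValueError("empty search term")
        return term in text

    result = parse_expr()
    if pos != len(query):
        raise ValueError("unexpected character at position %d" % pos)
    return result


# Hypothetical record texts, invented for illustration.
sample = [
    "Hypertext in Context. ISBN 0-000-00000-0",
    "Hypermedia navigation http://example.org/paper",
    "Speech interfaces",
]
books = [r for r in sample if matches("{hypertext,hypermedia};isbn", r)]
```

Appending `;isbn` or `;http` works as a filter because those strings appear only in records that carry a book number or a link.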
There are search options to control whether the search is case-sensitive,
or whether whole words must be matched.
There is an option for an approximate match
that will allow one error per search term.
These options can have large effects: a whole word search for AI
gets 100 records; within-word, it matches most of the records
in the database.
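The effect of these options can be sketched in Python. The example below is a simplified illustration, not the glimpse implementation: it uses regular-expression word boundaries for whole-word matching, and it treats "one error" as a single insertion, deletion, or substitution when comparing a term against each word of the text. All function names and sample strings are invented for the example.

```python
import re


def contains(term, text, whole_word=False, case_sensitive=False):
    """Test whether `term` occurs in `text` under the given options."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.escape(term)
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    return re.search(pattern, text, flags) is not None


def within_one_error(a, b):
    """True if `a` becomes `b` with at most one insertion, deletion, or substitution."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # at most one substituted character
    return a[i:] == b[i + 1:]  # one extra character in `b`


def approx_word_match(term, text):
    """True if some word of `text` is within one error of `term` (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return any(within_one_error(term.lower(), word) for word in words)


# Within-word matching finds "AI" inside "main"; whole-word matching does not.
loose = contains("AI", "Explaining the main ideas")
strict = contains("AI", "Explaining the main ideas", whole_word=True)
```

This shows why a short term such as AI matches so many records within words, and how a one-error allowance lets a misspelled name such as "Schneiderman" still find "Shneiderman".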
Results can be viewed in HTML or raw Refer format,
in brief or detailed views,
and search terms can be highlighted in the text.
Records contain bookmarks that are actually
links to search for a record's identifier;
these can be saved for future reuse.
Book numbers (ISBNs) are displayed as links to
amazon.com,
from which any royalties are donated to the Central Ohio
local ACM SIGCHI chapter,
BuckCHI.
I think Voltaire wrote, "The best is the enemy of the good."
So, irony aside, the glimpse search engine lets people search
the HCI Bibliography from the convenience of their browser.
Over time, a more usable front end may be provided for the queries.
It might make a good project for a user interface course.
I am occasionally asked how the HCI Bibliography Project
is managed (e.g., how to get permissions for materials,
how to get them online, how to validate them).
These procedures (and their motivations) are documented
(online, of course),
but they have often had the effect of discouraging people
from creating a bibliography service for a field that could use it.
It turns out that high-quality bibliographic data is expensive,
at least 10 minutes per record when all is counted, and often more.
Given that there are too many publications for most people to browse,
good bibliographic records provide a reasonable point of access
(especially when titles, abstracts, and keywords are well done by authors,
which is unfortunately infrequent).
I'm often asked why I put hundreds of hours per year into
the HCI Bibliography.
Besides the obvious compulsive disorder the work satisfies,
I've found that I personally get a lot out of doing work that has
lasting value (once online, always online),
and which is used by other people.
It has also been a great source of data
(and now a platform)
for exploring uses of hypertext for doing research
(especially with a search service on the web).
More recently, it has been a reason for me to learn more
about how to provide web-based services,
which I can apply in other contexts.
And, occasionally, people thank you.
- ACM Press (1990)
Resources in Human-Computer Interaction.
ACM: New York.
- Perlman, G. (1989)
User Interface Development.
Curriculum Module CM-17.
Software Engineering Institute: Pittsburgh, PA.
- Perlman, G. (1991)
"The HCI Bibliography Project".
SIGCHI Bulletin, 23:3, 15-20.
- Shackel, et al. (1992)
"HILITES -- The Information Service for the World HCI Community."
SIGCHI Bulletin, 24:3, 40-49.