This is the start of the webfilter project on SourceForge. Please feel free to contact me (efm at sourceforge.net) if you would like to be added to the project developers.

Webfilter SourceForge Projects Page

Please sign up for the webfilter-discuss mailing list

  Design for a Web Filtering Service

  Phil Agre
  http://dlis.gseis.ucla.edu/pagre/

  Version of 4 November 2001.
  2600 words.


For a few years now I've been using this mailing list to filter the
Web.  I ask people to send me URL's on certain topics.  Then I look
at the pages, keep the good ones, and assemble packages of URL's with
titles and short commentaries.  The results are sent to the mailing
list, kept in a Web-based archive of the list, and included in a
single big Web page of all the URL's I've sent out -- the latter
mostly to help me avoid duplicates.  I have sent out many thousands
of URL's under this system: about 1500 on the recent US elections,
about 4000 about the current war, and perhaps 6000 on other topics.
Complete lists of the URL's can be found through the RRE home page:

  http://dlis.gseis.ucla.edu/people/pagre/rre.html

These 10,000+ URL's have been submitted by hundreds of people, and
the great strength of Web filtering is that it draws on the diversity
of the participants.  I would never have heard of most of those URL's
on my own.

Community Web filtering seems like a good idea, and it's time to
explore automated tools to support it.  In this article I will suggest
a design for a Web-based filtering tool.  I cannot participate in
building such a tool, but I would be happy to try out any prototypes
that others might construct.  I have established a discussion list for
people who might be interested in working on a tool:

  http://groups.yahoo.com/group/webfilter

This list is open and unmoderated.  If something more sophisticated
is needed, I am hoping that people who join the list can self-organize.
I believe that such a tool would be an excellent public service, not
just for this list but for many others as well, and I hope that some
public-spirited subscribers will be interested in taking initiative
to build it.  I would be happy to publicize their contributions, or
else help them preserve their anonymity, whichever they would prefer.

Here, then, is my proposed design.  I am sure that people who design
Web-based services for a living can do better, but I also hope that
any designers will listen to my rationales, which are based on years
of experience running a community Web filtering service by hand.

The "webfilter", as I'll call it, is a cross between a discussion
list, a weblog, and a bookmark file.  It is not just a weblog, since
it includes numerous functionalities to deal with long lists of URL's.
Nor is it just a discussion list, since the goal is to produce a
reasonably clean and orderly presentation of the URL's.  Nor is it
just a bookmark file, because of its community nature.

The webfilter lives on the Web.  The key idea is to require people who
submit URL's to impose a minimal degree of order on their submissions.
Right now, I get free-form text messages from submitters, and I have
to fish through these messages by hand to recover a useable URL.  Even
though most submitters are well-intentioned, it takes a lot of work
to process these text messages.  The messages are so diverse that it
would be impossible to write an automated tool to parse them.  That's
why we need a more structured tool on the Web.  The tool should be
friendly, simple, and efficient, but I don't think that will be hard.

The webfilter code should be open-source -- I'm not interested in a
proprietary system.  Each site that runs the webfilter code will be
called a "webfilter" (by analogy with "weblog").  Each webfilter will
provide one or more "services".  For example, on my own list I have
effectively been providing three services: the election, the war,
and miscellaneous (mostly politics and social aspects of technology).
Lots of people would like me to separate the political service from
the non-political services, and the webfilter could support many
such divisions.

Services come in two types: edited and unedited (by analogy with
moderated and unmoderated mailing lists).  Each service has an owner,
and some services have editors.  The owner and editor need not be the
same person.  Each will have a password.
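
As a rough illustration of this vocabulary, the relationships among
a webfilter, its services, and their owners and editors could be
modeled as below.  This is only a sketch in Python; the class and field
names are hypothetical, and password handling is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Service:
    """One service on a webfilter site (hypothetical model)."""
    name: str
    owner: str                      # login name of the owner
    editor: Optional[str] = None    # None means the service is unedited
    categories: List[str] = field(default_factory=lambda: ["other"])

    @property
    def is_edited(self) -> bool:
        return self.editor is not None


@dataclass
class Webfilter:
    """A site running the webfilter code, offering one or more services."""
    site: str
    services: Dict[str, Service] = field(default_factory=dict)

    def add_service(self, service: Service) -> None:
        if service.name in self.services:
            raise ValueError("service already exists: " + service.name)
        self.services[service.name] = service
```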

Each webfilter service on a given site has five modes: Submit, Edit,
Revise, Configure, and View.  I'll explain each mode in turn, stopping
to explain the reasons for each design choice.

(1) Submit mode

The owner creates the service using the Configure mode and then
advertises a Web page where people can submit URL's to it.  No login
is required.  The Submit page is very simple.  It has three boxes, a
menu, and a button:

 * URL - The box should be large enough to handle the cumbersome,
   multiline URL's generated by some online publications, such as
   the Globe and Mail.  Required.

 * Title - We should encourage people to extract the real title of
   the material on the page and insert it in this box.  The title
   of the Web page itself may not be useful.  Optional.

 * Commentary - Space for several lines of commentary.  I typically
   provide one line of commentary at most, but many people prefer
   to include several lines, and many readers prefer more clues about
   whether they should click on a link.  Optional.

 * Category - Large lists of URL's are overwhelming unless they are
   broken down into categories.  Imposing the categories takes work,
   and we should shift most of this work onto the submitting users.
   The list owner should establish the categories, and should have
   an interface for editing them.  The categories should then appear
   to submitters as a (potentially two-level) pop-up menu.  This
   is optional, simply because the default category will be "other".
   Still, submitters should be politely encouraged to provide a more
   specific category.

 * A button called "submit".
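
The form fields above might be validated along the following lines.
This is a minimal Python sketch, not a specification: the names
Submission and clean_submission are hypothetical, and it assumes that
a multiline URL is a single URL that wrapped when it was pasted, so
whitespace is simply removed to rejoin it.

```python
from dataclasses import dataclass
from typing import Sequence

DEFAULT_CATEGORY = "other"


@dataclass
class Submission:
    url: str
    title: str = ""
    commentary: str = ""
    category: str = DEFAULT_CATEGORY


def clean_submission(url: str, title: str = "", commentary: str = "",
                     category: str = "",
                     categories: Sequence[str] = ()) -> Submission:
    # Assumption: a multiline URL is one URL that wrapped when pasted,
    # so all whitespace (including newlines) is removed to rejoin it.
    joined = "".join(url.split())
    if not joined:
        raise ValueError("a URL is required")
    # Title and commentary are optional, so blank values are allowed.
    # An empty or unknown category falls back to the default.
    if category not in categories:
        category = DEFAULT_CATEGORY
    return Submission(joined, title.strip(), commentary.strip(), category)
```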

The "submit" button should rapidly bring up a confirmation page.
The confirmation page will include the submitted page (i.e., the page
whose URL has been entered) in a frame, and above the frame the URL,
title, and commentary should be neatly presented.  If the same URL has
already been published in the same service, then an appropriate notice
should appear, together with a link to the archive entry for that
issue.  (Users should not be prevented from resubmitting URL's that
have already been published on the service, but the editor should
have a configuration option to automatically throw such submissions in
the trash.)  The confirmation page should have two buttons: "confirm"
and "edit".  The "confirm" button sends the page to the input queue,
says "thank you", and gives the user a blank Submit page.  The "edit"
button returns to the Submit page with the user's URL, title, and
commentary still filled in.
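
The handling of already-published URL's could be sketched as below.
The function name, the trash_duplicates option, and the idea of an
index mapping each published URL to its issue number are all
assumptions made for illustration.  A non-empty issue number in the
result is the cue for showing the notice on the confirmation page.

```python
from typing import Dict, Optional, Tuple


def route_confirmed(url: str, published: Dict[str, int],
                    trash_duplicates: bool = False
                    ) -> Tuple[str, Optional[int]]:
    """Decide where a confirmed submission goes.

    `published` maps each already-published URL to the number of the
    issue it appeared in (a hypothetical archive index).
    Returns (destination, issue_number_or_None).
    """
    issue = published.get(url)
    if issue is not None and trash_duplicates:
        return ("trash", issue)     # editor chose to discard resubmissions
    return ("queue", issue)         # resubmissions are otherwise allowed
```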

We need a confirmation page because people often accidentally submit
URL's that don't work.  The counterargument is that the confirmation
protocol imposes overhead on the majority of users that outweighs the
hassle of badly formed submissions.

When a page is sent to the input queue for a service, the webfilter
should check for duplications.  If several people have submitted the
same URL with different commentaries then the webfilter needs to do
something reasonable, which I'll discuss under Edit mode.
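
One possible form of the duplicate check is to collapse queued
submissions that share a URL while keeping every distinct title and
commentary.  The Python sketch below assumes that submissions are
plain dictionaries and that two URL's are duplicates only when they
match exactly; a real tool might want a looser comparison.

```python
from typing import Dict, List


def group_duplicates(queue: List[dict]) -> List[dict]:
    """Collapse queued submissions that share a URL into one entry each.

    Entries keep first-seen order; distinct titles and commentaries are
    collected so the editor can choose among them later.
    """
    groups: Dict[str, dict] = {}
    for sub in queue:
        entry = groups.setdefault(
            sub["url"],
            {"url": sub["url"], "titles": [], "commentaries": []})
        for key, bucket in (("title", "titles"),
                            ("commentary", "commentaries")):
            text = sub.get(key, "").strip()
            if text and text not in entry[bucket]:
                entry[bucket].append(text)
    return list(groups.values())
```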

(2) Edit mode

Webfilter services, as I say, can be either edited or unedited.
Once it has been set up, an unedited service is entirely automated.
An edited service has an editor, who is a user with a login name
and a password.  (In principle it could have several editors, but
I will keep it simple.)  Each service has a separate URL for the
editor; this URL is presumably not advertised, though no harm would
result if it was.  (This is better than having a single login page
for everyone, which makes the editor enter the name of the service
every time and clutters the interface for non-editors.  Both types
of users need to be able to bookmark the page for their respective
mode for a given service.)

Having logged in, the editor is offered a link to Configure mode,
explained in a moment.  But in most cases the goal is editing, and it
is crucial for the Edit interface to be efficient.  The editor's job
is to rake through the URL's that have been submitted by subscribers
to the service.  The editor should presumably be shown a list of
these URL's, perhaps just the titles with hyperlinks, together with
statistics about how many URL's are in the queue, when the last batch
of URL's was released, the total number of URL's published to date,
and so on.  (The last figure is purely for the editor's curiosity.)

The editor's goal is to assemble an "issue" of the webfilter service.
Each issue consists of a title (i.e., the Subject: line of an email),
a prefatory text, lists of URL's under successive categories with
titles and commentaries, and perhaps a concluding text.  The editor
can work on an issue incrementally, and need not publish it until it
is ready.  So the service will always have a partially assembled issue
stored in its database.  The editor can only work on a single issue
of a given service at a time.  If the editor wants to fork off several
issues, that probably means it's time to break the service into
several services with distinct identities.

The crucial part comes when the editor settles down to filtering the
submitted pages.  Having clicked a button in the Edit mode called
"filter", the editor should be presented with a series of framed
pages.  One frame (at the bottom) will have the submitted page,
and the other (at the top) will have the URL, title, and commentary
(all in their own boxes) and the categorization (with the same pop-up
menu that the submitters see).  Four buttons are also provided on
the top frame: "accept" (include this page in the issue), "reject"
(throw this page in the trash), "postpone" (hold this page back for
potential inclusion later on, either in this issue or the next), and
"stop" (make no decision on this page, and instead return to the main
Edit mode page).
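
The four buttons amount to a small loop over the queue.  The sketch
below, in Python, stands in for the framed interface with a decide
callback that returns the name of the button pressed; the function
name and the four-way return value are illustrative assumptions.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

ACTIONS = ("accept", "reject", "postpone", "stop")


def filter_queue(queue: Iterable[dict], decide: Callable[[dict], str]
                 ) -> Tuple[List[dict], List[dict], List[dict], List[dict]]:
    """Run the editor through the queue one page at a time.

    Returns (accepted, trash, postponed, remaining).
    """
    pending = deque(queue)
    accepted: List[dict] = []
    trash: List[dict] = []
    postponed: List[dict] = []
    bins = {"accept": accepted, "reject": trash, "postpone": postponed}
    while pending:
        action = decide(pending[0])
        if action not in ACTIONS:
            raise ValueError("unknown action: %r" % (action,))
        if action == "stop":
            break                   # no decision; page stays in the queue
        bins[action].append(pending.popleft())
    return accepted, trash, postponed, list(pending)
```

Pages left in the last list stay in the queue for the editor's next
session.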

The general idea is that the editor can change any of the entries
that the submitting user has made.  The editor can change the URL
(for example, removing the junk after the "?"), the title (for example,
substituting a paraphrase or a descriptive rant for a title that
may not be self-explanatory), the commentary (for example, deleting
the commentary altogether or editing it down to something simpler),
and the categorization (for example, supplying a category when the
submitting user has kept the default).  A particular problem arises,
as I mentioned, when several users have submitted the same URL with
different commentaries (or different titles, for that matter).  I
don't know what the right answer is here.  Perhaps the editor should
simply get all of the commentaries (or titles) in a single box, and
should be able to delete or edit the whole lot of them at will.
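
Two of these editing aids are easy to sketch in Python.  The helper
names are hypothetical.  Note that stripping everything after the "?"
should stay a one-click option for the editor rather than an automatic
step, since some sites need the query string to find the page at all.

```python
from typing import Iterable
from urllib.parse import urlsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment, i.e. everything from '?' on."""
    return urlsplit(url)._replace(query="", fragment="").geturl()


def merge_for_editing(texts: Iterable[str]) -> str:
    """Put several submitters' commentaries (or titles) into one box.

    Exact repeats are dropped; the rest are separated by blank lines so
    the editor can delete or rewrite them freely.
    """
    kept = []
    for text in texts:
        text = text.strip()
        if text and text not in kept:
            kept.append(text)
    return "\n\n".join(kept)
```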

Once the editor hits an "accept", "reject", or "postpone" button, it
is crucial that a new framed page appear as quickly as possible.  It
should be possible for an editor to crank through dozens of submitted
pages every day, making rapid decisions on each one.  Once the queue
has been exhausted or the editor has hit the "stop" button, the Edit
mode page should come back.  In addition to the features that I've
mentioned, that page should also have tools for publishing an issue
consisting of the URL's that have already been accepted.  I would
suggest that the editor be able to supply (in the Configure mode) a
"boilerplate" text that goes at the top of each issue, and then before
publishing should have a chance to edit that boilerplate text, for
example by adding extra comments.  (Perhaps the interface for editing
the boilerplate text should go in the Revise mode.)

The Edit mode page should have, inter alia, a "revise" button and
a "publish" button.  I'll explain the Revise mode in a moment.

When an issue is published, several things should happen.  An email
version of the issue should be sent to everyone who has subscribed
to the service, the URL's in the issue should be included in one giant
historical file for the service (for quick reference as to what has
already been published), and the URL's should also be entered into a
database, indexed by the category that they have been published under.
This database is what the View mode looks at.  Finally, the editor
should be returned to the Edit page, which should clearly reflect
the successful publishing of the new issue, along with whatever URL's
still remain in the queue.
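
The publishing steps can be sketched as follows.  This Python example
keeps everything in memory and takes the mail-sending function as a
parameter, so it says nothing about how mail is actually delivered;
the class name, the entry fields, and the flat (ungrouped) message
body are simplifying assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ServiceArchive:
    """In-memory stand-in for a service's history file and database."""

    def __init__(self) -> None:
        self.history: List[str] = []          # every URL ever published
        self.by_category: Dict[str, List[dict]] = defaultdict(list)
        self.issue_count = 0

    def publish(self, subject: str, preface: str, entries: List[dict],
                subscribers: List[str],
                send: Callable[[str, str, str], None]) -> int:
        self.issue_count += 1
        # Build a plain-text body: preface, then title/commentary/URL.
        lines = [preface, ""]
        for e in entries:
            lines += [e["title"], e.get("commentary", ""),
                      "  " + e["url"], ""]
        body = "\n".join(lines)
        # 1. Email the issue to every subscriber.
        for address in subscribers:
            send(address, subject, body)
        # 2. Add the URLs to the history; 3. index them by category.
        for e in entries:
            self.history.append(e["url"])
            self.by_category[e["category"]].append(
                dict(e, issue=self.issue_count))
        return self.issue_count
```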

(3) Revise mode

When "revise" is pressed in Edit mode, the editor is taken to a page
that resembles in spirit a Web browser's bookmark editor.  This is
probably the most difficult mode to program.  A mock-up of the issue
should appear, and the editor should have point-and-click commands
to rearrange the links, delete them, edit the titles and commentaries
again, and so on.  The mock-up should gather the URL's under each
category in the order they were submitted, and the categories
themselves should appear in the same order as they do in the pop-up
menu.  The order of the URL's within each category of the issue
is quite important, and the editor's tools for moving URL's around
within a category (or, I suppose, between categories) should afford
rapid wholesale rearrangement.  It is possible to make this interface
infinitely complex, for example by providing features to break
categories down into new subcategories or whatever, but this sort
of thing is not crucial.  In running webfilter services by hand, I
have sometimes used standardized categories and sometimes allowed
categories to emerge spontaneously within each issue.  But it is not
crucial to be able to improvise new categories, so long as it is easy
to edit the category menu in the Configure mode.
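
The default ordering of the mock-up, and the simplest rearrangement
command, could look like this in Python.  The function names are
hypothetical, and the sketch assumes that the accepted entries arrive
in submission order and that every entry's category comes from the
menu.

```python
from typing import List, Tuple


def build_mockup(accepted: List[dict], menu: List[str]
                 ) -> List[Tuple[str, List[dict]]]:
    """Group accepted entries by category, in menu order.

    `accepted` is assumed to be in submission order already, so entries
    keep that order within each category.  Empty categories are omitted.
    """
    mockup = []
    for category in menu:
        entries = [e for e in accepted if e["category"] == category]
        if entries:
            mockup.append((category, entries))
    return mockup


def move_entry(entries: List[dict], src: int, dst: int) -> None:
    """Move one entry within a category's list, in place."""
    entries.insert(dst, entries.pop(src))
```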

Incremental changes in the Revise page should be permanently stored
in the mocked-up issue.  The Revise page should have a "return" button
that returns to the Edit mode page without publishing the issue, as
well as a "publish" button.

(4) Configure mode

Along the way, I have listed some of the configuration options that
the editor should have.  It is easy to multiply such options, and you
can probably imagine them as well as I can.  It would be nice for each
service to have its own visual identity, selected from some options.
The Configure mode should presumably be used when a webfilter service
is first established, but it should hopefully not be needed very
often afterward.  I would guess that the most commonly used Configure
function will be adding new categories to the category menu.

(5) View mode

The owner also publicizes a URL for the service's View mode.  (For the
sake of cleanliness this should be a different page from the service's
Submit mode, though obviously the two pages should be linked to each
other.)  If you wanted to get fancy then you could offer users the
option of logging in and configuring the way they view a service.
But for most purposes a simple interface should suffice: links to
the recent issues in reverse chronological order.  It would also be
nice to offer links to reverse chronological lists of the links under
particular categories: the user would select these lists using the
same pop-up menu that they use to submit links.  Each list within a
given category would, of course, consist of sequential entries: title,
commentary, and URL.  The URL should be hyperlinked to the page it
names.  It would be nice if the server could periodically check all of
the pages to make sure they still exist, labeling links that have gone
bad so as to save users the trouble of trying to follow them.
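
A periodic link check might be sketched as below, using Python's
standard urllib.  The function names and the "dead" flag are
assumptions.  Treating any status of 400 or above as dead is a crude
rule: some servers refuse HEAD requests for pages that are in fact
still there, so a real checker would probably retry with GET before
labeling a link.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for a HEAD request, or None if unreachable."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code            # server answered with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None                # DNS failure, timeout, malformed URL


def label_dead_links(entries: List[dict],
                     status_of: Callable[[str], Optional[int]] = http_status
                     ) -> None:
    """Set entry["dead"] to True when the page no longer seems to exist."""
    for entry in entries:
        status = status_of(entry["url"])
        entry["dead"] = status is None or status >= 400
```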

The View mode page should also have a simple interface for subscribing
to the service by e-mail.  Just type in your address and hit the
"subscribe" button.  The server should obtain a confirmation by return
e-mail before adding the address to the list.
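
The confirmation step can be sketched with a random token that is
mailed to the address and must be presented before the address joins
the list.  In this Python sketch the class name is hypothetical and
the send function is passed in, so mail delivery itself is left open.

```python
import secrets
from typing import Callable, Dict, Set


class SubscriptionList:
    """Subscribers to one service, with confirmation by return e-mail."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self.send = send                    # send(address, token)
        self.pending: Dict[str, str] = {}   # token -> unconfirmed address
        self.confirmed: Set[str] = set()

    def request(self, address: str) -> str:
        """Record a subscription request and mail out a token."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = address
        self.send(address, token)
        return token

    def confirm(self, token: str) -> bool:
        """Add the address to the list if the token is one we issued."""
        address = self.pending.pop(token, None)
        if address is None:
            return False
        self.confirmed.add(address)
        return True
```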

That's it.