[Citizendium-l] Why we should fork all at once

Larry Sanger sanger-lists at citizendium.org
Fri Sep 29 01:22:02 EDT 2006


By now it must be obvious to everyone that the choice whether to start with
a blank wiki, or with a mirror of Wikipedia, or with something in between,
is a crucial choice; it will have deep and lasting impact on the shape of
the project.  If that isn't obvious to you now, I hope it will be by the
time you get to the end of this post.

We started planning out a Digital Universe Encyclopedia in about February of
last year.  For many months, the Digital Universe Encyclopedia was going to
be a piecemeal fork of Wikipedia.  But by December, we had changed our
minds, and it wasn't going to be a fork at all.  Now I have changed my mind
again, and think that an all-at-once fork is desirable.  Having
lived on all sides of this issue myself, I can tell you that piecemeal
forking is not nearly as obvious a choice as some--Kim van der Linde has
been among the most outspoken here--might make it seem.  But let's see if I
can actually convince you of this.  Please hear me out.

A good way to begin an analysis of this issue is to explain what the
*operational* difference between piecemeal and all-at-once forking would be.
Haukur Þorgeirsson very valuably went into this by asking what the
consequences of all-at-once forking would be, but there is an even more
general question that's worth examining: how, operationally or procedurally,
would the different policies work differently?  I'm not sure we've given
that question enough thought.  So let's examine it.

======================================
How a piecemeal fork would get started
======================================

Let's begin with the simplest case of piecemeal forking, leaving aside the
option of forking all subjects at once.  (No one has yet explained *how*
that would be achieved, but never mind.)  A piecemeal fork would start with
a blank wiki.  How would we begin?  Presumably, once the policy pages were
prepared, we would begin with some top-level, very general topics, such as
"matter" and "philosophy" and "literature," and people would copy those
articles over from Wikipedia, *or* simply start new articles from scratch.
(I assume that the Citizendium would follow Wikipedia's policy that new
pages should not be started unless they are linked-to from some
already-existing page.)

Suppose then that it's an hour after the launch and people have copied over
a dozen articles and are working on them.  Some of the articles *aren't*
copied from Wikipedia.  The articles have plenty of red links, which other
people can feel free to follow.  They are presented as blank wiki pages, and
people can then, if they want, try to find the corresponding Wikipedia
article.  Eventually, a tool might be written so that, if there is a
corresponding Wikipedia article, it is displayed and the author may then
import it with the click of a button.  There probably won't be such a tool
at launch.

Moreover, radical revision or replacement of articles will mean that huge
swathes of Wikipedia content might never be noticed--because the links from
the original articles are deleted.  There are ways of solving this problem,
however.  And this is going to be a problem whether we fork all at once or
piecemeal.  To ensure that there are no orphans, I suppose we'll have to
keep a record, at the bottom of each article, of the links that we delete.
With the all-at-once approach we can actually search for orphans, though.
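
To make "searching for orphans" concrete, here is a minimal sketch in Python.
It assumes, purely for illustration, that the wiki's link structure is
available as a mapping from each page title to the set of titles it links to;
the function name, the sample pages, and the notion of "root" pages are all
hypothetical, not part of any existing tool.

```python
from typing import Dict, Set


def find_orphans(links: Dict[str, Set[str]], roots: Set[str]) -> Set[str]:
    """Return pages that no other page links to, ignoring designated roots."""
    linked_to: Set[str] = set()
    for source, targets in links.items():
        # A page that only links to itself still counts as an orphan.
        linked_to.update(t for t in targets if t != source)
    return {page for page in links if page not in linked_to and page not in roots}


# Hypothetical miniature wiki: page title -> titles it links to.
wiki = {
    "Main Page": {"Matter"},
    "Matter": {"Atom", "Energy"},
    "Atom": {"Matter"},
    "Energy": set(),
    "Phlogiston": {"Matter"},  # its inbound link was deleted in a rewrite
}

orphans = find_orphans(wiki, roots={"Main Page"})
print(sorted(orphans))
```

With a full copy of the corpus this kind of check can be run over every page
at once; with a piecemeal import, pages that were never copied over simply do
not appear in the mapping, so the deleted links have to be recorded by hand.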

=========================================
How an all-at-once fork would get started
=========================================

An all-at-once fork would not require that people begin with the most
general articles and "drill down" from there.  They could go immediately to
whatever topic struck their fancy, and work on the articles on that topic.

But then, as several people pointed out, this would cause the problem that
there would be little islands of quality in a sea of mediocrity, and no way
to know what people have worked on.  But, of course, this need not be so.
We can have people category-tag all the articles that they have worked on.
(More on this below.)  Then it will be possible to display all the articles
that have been changed, by displaying that category's page.  For CZ that
category page would be a source of endless fascination, no doubt.  (There
are, of course, other ways to solve the same problem, but this seems like a
fairly elegant way.)

Wouldn't there then be enormous amounts of work for CZ to do, in cleaning up
1.4 million articles?

============================================
How serious are we about revising Wikipedia?
============================================

Indeed, it seems that some of the arguments for piecemeal forking that
impress people are variants on the following: all-at-once forking would be a
huge amount of work.  It would be like cleaning out the Augean Stables.  It
would be really hard, requiring that we do things like check every image,
and it would be a big turn-off for a lot of potential editors.

I'd like to point out that this argument "proves too much," as philosophers
are fond of saying.  If it is a good argument, then what it shows is not
that we should fork in a piecemeal way, but that we should not fork *at
all*.  After all, just look at the above comparison of how piecemeal and
all-at-once forking would get started: insofar as there is *forking* going
on at all, it involves an enormous amount of drudgework, e.g., keeping
records of links that were deleted in a CZ version of an article so as not
to create orphans.  It doesn't matter whether it's done gradually or all at
once.

So, maybe you didn't realize it, but you might not be in favor of forking
Wikipedia at all.  If you take the view that we can occasionally make use of
Wikipedia articles, but otherwise we're starting from scratch, then I would
say that you *don't* favor forking Wikipedia.  As Darren Duncan put it, we
should use Wikipedia as a source, one source among many.  (I think Darren
does not mean a "source" in terms of a source one *cites*, but a source in
the sense of being a stream of content--and not a stream that results in the
overwhelming majority of content, either.)

But I stipulate--here's one of those lines I draw in the sand--that we *are*
forking Wikipedia.  You should probably bow out of the project now if you
are not interested in forking Wikipedia.  Again, in that case, I respect
your choice, and I would direct you to the Encyclopedia of Earth and the set
of encyclopedias that the Digital Universe is planning, which I also helped
plan out last year.  They *won't* be a fork of Wikipedia, unless they use a
lot of content that started with the Citizendium.  But you're going to have
to wait (I can't tell you how long) if you're not an expert in a field
connected to the study of the environment, since that's all that the
Encyclopedia of Earth covers right now.

But the very *purpose* of the Citizendium is to take the content of
Wikipedia and whip it into shape--and, in time, we can only hope, create a
truly reliable resource that is worthy of the description "encyclopedia."

===================
Why fork Wikipedia?
===================

While I do insist on forking, it's pretty important, for proper perspective,
that we think about why we should fork Wikipedia at all.

If you think about it for half a second, you'll see that it's at least a
pretty good idea.  As to me, I think it's such a fantastic idea that I
basically resigned from my job with the DUF in order to pursue it.
(*That's* not an argument, but it does indicate the power with which the
following argument strikes me!)

Wikipedia is a prime target for forking for a combination of reasons:

(1) There are boatloads of academics who know about Wikipedia now, are upset
about it because their students (and the general public, and worse, their
colleagues) are constantly getting misinformation from it, and they are
motivated to do something about it.  They're motivated not just because they
want their students to believe true things, but because they view themselves
as stewards of their own fields.  "This upstart amateur-written self-styled
encyclopedia has encroached on what the public believes about *my* area of
expertise!  How dare it!"  That's the thought in the back of their minds (in
the back of your minds, many of you, I think).  This is why *many* academics
and professionals over the last two or three years have written me, saying
that they'd like to get something similar but better started, or asking how
I can help mediate their disputes with Wikipedians, and so forth.

(2) Wikipedia has 1.4 million not-too-bad articles (in English).  Somehow,
for some of you, this has become a reason *not to* fork.  I would like to
remind you that it is actually a really good reason *to* fork.  Why?
Because the very size of the resource means that there is something truly
substantial to begin with.  The sheer size of the thing will help us to get
large numbers of people involved in improving it (this is an argument I'll
elaborate more below), and the result will be an impressive piece of work
much more quickly than if we were to start over, from scratch.

(3) Its size, coupled with its not-that-terrible quality, also means that,
like it or not, Wikipedia is the only game in town.  If we were to try to
start over from scratch, as with "Scholarpedia," there would always be
academics sighing, "Well, that's a worthy effort, but Wikipedia is what
people are using.  Why can't we try to improve *that*?"

Thus the attractiveness of a fork to a motivated community, i.e., all of
you, and a bunch more people who haven't heard of us yet.  In short, there
are huge numbers of well-qualified, smart people who believe a viable way to
start something that can displace Wikipedia as the go-to source of
encyclopedic information is by improving Wikipedia itself.  In other words,
if you can't beat 'em, join 'em...sort of.  A fork is a handy way of joining
'em (their content) without *really* joining 'em (their community).  We
descend upon their content, red pens in hand, and start *our own* new
community.

The existence of a motivated community is only part of the argument,
however.  The other part of the argument is that the content is, in fact,
worth saving.  If I believed that all of the articles in Wikipedia were in
as bad a shape as Edward Buckner describes the philosophy articles (only 5%
of the philosophy articles being worth preserving at all), then clearly,
Wikipedia wouldn't be worth forking.  But I don't believe that about most
Wikipedia articles.  I think most of them can be saved.

So if we want to be very serious about *revising Wikipedia*, if that is our
near-term goal--which I stipulate that it is--then there is a
straightforward question that I don't see citizendium-l listmembers asking
themselves enough.  To wit, what is the most efficient way *to revise
Wikipedia*?  Clearly, the most efficient way to revise Wikipedia is to fork
it all at once, not piecemeal.

==================================================
An all-at-once fork means CZ will be as bad as WP!
==================================================

Perhaps the most efficient way, you might say, but not the best way.

Kim van der Linde and Ori Redler among others point out that if we fork
Wikipedia all at once, CZ will be just as bad as Wikipedia.  As Ori puts it,
"No one will be able to tell the difference, and no one will have a reason
to prefer CZ."  Well, that's pretty obvious.  If you fork Wikipedia, you'll
*start out* with something that is just as bad as Wikipedia.

I see no good reason to draw the inferences that they want to draw from this
obvious fact, however.  The fact that CZ will have, at the beginning and no
doubt for many months, a whole lot of very bad articles, need not and
probably will not be regarded by the public as a sign of *endorsement* of
poor quality by us at CZ.  After all, if the CZ is understood precisely the
way it is now being billed in the press, and in the way I have described it,
then it will be understood as a "progressive" fork of Wikipedia, which means
that it will steadily *but not instantly* become better than Wikipedia.

We can, moreover, easily mitigate a lot of confusion on this point.  We can
clearly label articles that have not been touched by CZ: "This is a
Wikipedia-sourced article.  It may not be reliable.  But *you* can help to
improve it now!"  The CZ-edited articles can be labelled as such: "This
article started life on Wikipedia but it has been revised by Citizendium
authors and editors.  Join in the fun!"  Furthermore, we can easily
highlight the CZ-improved versions with the appropriate category tags.  So
there is no reason for our work to be "lost in a sea of mediocrity."
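
As a rough illustration of how the labels and category tags could fit
together, here is a small Python sketch.  The label wording is taken from the
paragraph above; everything else (the Article record, the cz_edits counter,
the discipline field, and both function names) is an assumption made for the
example, not a description of how the wiki software actually works.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

WP_LABEL = ("This is a Wikipedia-sourced article.  It may not be reliable.  "
            "But *you* can help to improve it now!")
CZ_LABEL = ("This article started life on Wikipedia but it has been revised by "
            "Citizendium authors and editors.  Join in the fun!")


@dataclass
class Article:
    title: str
    discipline: str    # hypothetical top-level category tag
    cz_edits: int = 0  # hypothetical count of edits made on CZ after import


def label_for(article: Article) -> str:
    """Pick the notice to show at the top of an article."""
    return CZ_LABEL if article.cz_edits > 0 else WP_LABEL


def changed_by_discipline(articles: List[Article]) -> Dict[str, List[str]]:
    """Build the 'CZ-changed articles' listing, grouped by discipline."""
    index: Dict[str, List[str]] = defaultdict(list)
    for article in articles:
        if article.cz_edits > 0:
            index[article.discipline].append(article.title)
    return {discipline: sorted(titles) for discipline, titles in index.items()}


articles = [
    Article("Gastritis", "Gastroenterology", cz_edits=12),
    Article("Peptic ulcer", "Gastroenterology"),
    Article("Epistemology", "Philosophy", cz_edits=3),
]
changed = changed_by_discipline(articles)
for discipline, titles in sorted(changed.items()):
    print(discipline, titles)
```

Untouched imports never show up in the listing, so a reader browsing a
discipline's page sees only the work that CZ contributors have actually done.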

Finally, another point you'll note, which I've been quite careful about
specifying, is that the result will not be called *by us* an encyclopedia,
at least not until the community has decided to call it that.

If, by contrast with all this, you conceive of the *short-term* goal of CZ
to be the creation of a near-perfect encyclopedia, you ought to realize that
that goal is not in the slightest compatible with the efficient forking of
Wikipedia.  I think it is more realistic to take the creation of a
near-perfect encyclopedia as a long-term goal, one that will require that we
go through a longish period of, indeed, cleaning out the Augean Stables.

===================================
Perfectionists and wikis do not mix
===================================

The strength that the foregoing argument appears to have will depend, I
think, on the degree to which a person is a perfectionist.  Perfectionists
demand the ideal and they want it *now*; and anything less is failure.

Wikis, by their very nature, present a text that is in flux.  If a wiki is
reasonably open--as CZ's wiki most certainly will be--then it will be open
to decay as well as improvement.  Even with editors able to declare certain
article-specific decisions and rules on the discussion page, authors will
still be able to sign in and sully otherwise good work.

Don't like the sound of that?  Then you don't want a wiki, period.  But you
don't get to help yourself to the remarkable *benefits* of wiki-style open
collaboration, either.  And you certainly shouldn't be involved with the
Citizendium.  Exactly like Wikipedia, the Citizendium won't be able to
*prevent* bad edits from being made.  But CZ *will* be able to fix bad work
more effectively, with less fuss.

I have deliberately not taken a stand on whether we will have "published" or
"stable" versions of articles.  The reason for this is not that I am opposed
to having stable versions, but instead that I have a higher priority: to
build the community.

To me, it is a much higher priority that we build a vibrant expert-led
community, with *thousands* of the most qualified people in the world
working on the project on a daily basis, than that we produce a set of
articles that a much smaller set of people feels comfortable calling
"finished."  Give me the larger, more dynamic community, and, when it puts
its mind to the task, producing "stable" versions of articles will be easy.

"Fine," you say, "but 'the most qualified people in the world' will not feel
motivated to work on the mess that is the Wikipedia corpus.  Like it or not,
they *are* perfectionists, and they *need* to feel that they are working on
stable versions."

If you are inclined to say this, then you have misidentified the future core
of the contributors to the Citizendium.  The "stick-in-the-muds," as one
blogger called them (should have been "sticks-in-the-mud"), will never
really *get* wikis, not until they wake up and discover that the world of
information has changed all around them and they're late to the party.
There's no point in trying to persuade them and I don't propose to try.

I am after the small but growing minority of academics, and other
intellectuals, who are already primed to "get" radical collaboration,
openness, and *gradual* progress toward perfection.  There are, in terms of
sheer numbers, *more* than enough very well-qualified people who are thus
primed to staff CZ.  It's just that Wikipedia is not an inviting venue for
them, where the Citizendium will be.  I specifically want people who are
comfortable with the notion of solid content that has flaws and gradually
gets better, and who are willing to wait for its improvement precisely
because they understand that the collaborative dynamic is enormously
powerful.

========================================
What an all-at-once fork would look like
========================================

Clearly, those advocating piecemeal forking do not have the mental picture I
have, when I think of a thriving all-at-once fork of WP.  So let me paint
the picture for you.

It's opening day for CZ, and the whole world is invited to join--if they can
agree to our ground rules.  Press releases have resulted in major coverage
from many widely-read sources.  (Coverage = authors and editors.)  It's
Slashdot times ten.  In addition, we have two or three people spending hours
and hours posting announcements to academic mailing lists.  The people who
should know about CZ are finding out about it.  On the day of launch, we
have over 1,000 people ready to get to work, and a large portion of them are
professors, graduate students, research scientists, legal scholars,
technical thinkers, and assorted other intellectuals.  The project is
advertised as the content of Wikipedia *placed in the hands of experts*, and
this new and different community is becoming acutely self-aware.

On the first day, people are naturally attracted to their areas of interest.
There are tens of thousands of edits, more than the typical day on
Wikipedia, and many hundreds of articles are edited and created.  In the
first week, many of the most common destination pages are completely
transformed.  (In the first month, many thousands; by the end of the first
year, it's rare to find an article from Wikipedia that hasn't been reworked
beautifully.)

But in the early days, to help keep track of what articles have been
changed, new top-level discipline categories are created.  We begin with a
top-level set of about two dozen disciplines, followed by hundreds of
subdiscipline categories, followed by thousands of specialized topic
categories.  We leave the category scheme there, and the purpose of it is
*only* to help contributors to find articles that have been changed, not to
create the One Best Taxonomy.

These category pages for "CZ-changed articles" become remarkable recruitment
tools in the early months of the project.  For instance, gastroenterologists
can go to the "gastroenterology" page and learn that CZ is producing
enormous numbers of remarkably good articles at unbelievable rates.  For
every specialization, the story is the same: someone posts a link to the CZ
category page to the departmental mailing list, or a professional society's
bulletin board; people see dozens of really good articles that began life on
Wikipedia; and suddenly, the leading edge of entire professions "get it."
Wikipedia *can* be transformed under expert guidance into something that is
actually authoritative.

When that happens, it won't be just CZ that benefits.  The entire world of
publishing and education will change.  Quite possibly overnight.  It will
look for all the world like a revolution.  Just you wait.

===============================================================
Why the dynamic would be less interesting with a piecemeal fork
===============================================================

I think there's an excellent chance that all this will happen, if we fork
all at once.  But if we do a piecemeal fork, it will take much longer to
produce this degree of interest.  Clearly, it will be more *work* and not as
much *fun* to go and grab articles from Wikipedia and then work on them in
CZ.  That hurdle may actually cut out a large portion of participation that
would come to an all-at-once, WP-on-demand fork.  Moreover, if the piecemeal
forkers have their way, it will mean that CZ is devoted to high quality
above a dynamically growing community; so it will be at least a rough
expectation that we are supposed to polish imported articles before grabbing
new ones, and there will be high standards about what articles can be
imported at all.

The result is that after a year, whereas an all-at-once CZ might have 300K
editor-approved articles, a piecemeal-forked CZ might have 30K.  Whatever
the exact numbers, the easy availability of WP articles to edit on CZ will
likely result in an order of magnitude more work being done.  But, you might
say, the 30+K articles in a piecemeal-forked CZ will be more reliable on
average than the 1.4 million articles in the all-at-once-forked CZ.  While
there might be 300K editor-approved articles in CZ after one year, there
would be 1100K unapproved articles.  The approval rate in the piecemeal fork
might be 90%, much better than an approval rate of roughly 21%.
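
The arithmetic behind this comparison is easy to check.  The short Python
sketch below takes the round figures from the scenario at face value (1.4
million imported articles, 300K approved in the all-at-once case, 30K
approved at a 90% rate in the piecemeal case); the variable names are just
for illustration, and the figures themselves are guesses, not data.

```python
# Round figures from the hypothetical one-year scenario.
total_wp_articles = 1_400_000
all_at_once_approved = 300_000
piecemeal_approved = 30_000
piecemeal_rate = 0.90

# All-at-once fork: everything is imported, only part is approved.
all_at_once_unapproved = total_wp_articles - all_at_once_approved
all_at_once_rate = all_at_once_approved / total_wp_articles

# Piecemeal fork: the corpus is only what was imported by hand.
piecemeal_total = piecemeal_approved / piecemeal_rate

print(f"all-at-once: {all_at_once_unapproved:,} unapproved, "
      f"approval rate {all_at_once_rate:.1%}")
print(f"piecemeal:   about {round(piecemeal_total):,} articles in total, "
      f"approval rate {piecemeal_rate:.0%}")
print(f"approved articles, all-at-once vs. piecemeal: "
      f"{all_at_once_approved // piecemeal_approved}x")
```

On these figures the all-at-once approval rate comes out at a little over one
fifth of the corpus, while the absolute number of approved articles is ten
times larger.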

Personally, I'd rather humor the 1100K unapproved articles along with my
300K approved articles, than have only 30K approved articles that make up
90% of the entire CZ corpus.  Besides, in the long run, the *size* of the
expert community involved in the all-at-once scenario would no doubt lead to
a higher-quality encyclopedia.

Here's another way to explain the difference in dynamic: *which game* do you
want to play?  Do you want to play the game of hunting through Wikipedia for
true gems (that are interesting to you), making the necessary edits in CZ to
make sure they're not orphans, uploading them, and then working on them?  Or
do you want simply to click through the entire Wikipedia corpus, spot an
article you want to work on (good or bad), and proceed to make it better
than Wikipedians would ever allow it to be?

Moreover, when you look through CZ, do you want to see only articles that
people *decided* were *good enough* for CZ, and then uploaded and edited in
the thought that those edits would make the articles better than the WP
copy?  Or, instead, would you like also to see WP articles right there in
the context of the CZ-edited articles, in order to make a quick decision
about whether you want to try your hand at changing a WP version right away?

--Larry



