very rare events: how low can you go?

Howard Shapiro hms at
Tue Jan 30 20:56:50 EST 2001

Jim Houston wrote-

>Those of us doing CD34 enumeration are routinely asked to give results
>sometimes as low as .1%.  In fact normal adults have a measurable percentage
>as low as .02%.  When I first started CD34 enumeration 5 years ago I was
>told you had to have at least 50 of these positive events to be significant.
>Needless to say we acquire a lot of data to get these numbers.
>20 years ago I believed the error  by flow to be around 2-5%.  If you run
>the same tube 10 times you will get some variance in the percentage you are
>looking for, particularly in the levels of .1%.  The problem occurring is
>the significance of an increase from a measured .2% to a .4% level.  Is this
>increase real or is it a factor of the instrument.  When this number is used
>to calculate Absolute numbers in a leukopheresis product then it can be
>Now I have been requested to give numbers lower than .01%.  How to make this
>accurate?  Good question.  I assume that most populations have some inherent
>properties to them.  They scatter light in discrete patterns and have some
>sort of phenotype unique to them.  The more parameters I use the better the
>confidence level.  If I set all analysis markers by isotype controls then
>that could make my decision easier, but these give erroneous results if you
>look at the data carefully.  If you collect 500,000 cells and then see a
>population of 16 are these real????
>In practice the low %'s <1% are at best sometimes subjective to the
>operator's experience.
>A good question is not only the ability by flow to give these low %'s but
>how reproducible is it.  Some labs are running their samples in doublets or
>triplets then averaging.  This will drive the cost up a bit.
>Are there any statisticians out there who can answer some of these

As I pointed out in an earlier posting on rare event analysis, it is
possible, under the right circumstances, to detect one cell in 10 million.

The question Jim has asked, though, refers to the accuracy and precision of
estimating counts of rare events, in this case, the number and/or
percentage of CD34+ cells.

When anything is being counted, Poisson statistics come into play; if you
count n of anything, the standard deviation will be the square root of
n.  The coefficient of variation (CV), in per cent, will be 100 divided by
the square root of n.  This assumes that the counting process itself is
perfect; the point is that if you count 25 objects, you'll get a standard
deviation of five objects, and a CV of 100/5, or 20 per cent; if you count
100 objects, the standard deviation will be 10, and the CV will be 10 per
cent, and it is easy to see that getting precision to 1 per cent requires
that you count 10,000 objects.  If these objects are rare cell types
present at a frequency of one per million cells in the sample, you have to
analyze 10 billion cells to find the 10,000 cells you're interested in.
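
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the ideal Poisson case (a perfect counting process); the function names and the "one in N" way of stating the frequency are my own choices for the example.

```python
import math

def poisson_cv_percent(n_counted):
    """CV, in per cent, of a count of n events under ideal Poisson statistics."""
    return 100.0 / math.sqrt(n_counted)

def events_needed(target_cv_percent):
    """Events of interest that must be counted to reach the target CV."""
    return math.ceil((100.0 / target_cv_percent) ** 2)

def total_cells_needed(target_cv_percent, one_in):
    """Total cells to analyze if the cells of interest occur once per `one_in` cells."""
    return events_needed(target_cv_percent) * one_in

print(poisson_cv_percent(25))            # 25 objects counted
print(poisson_cv_percent(100))           # 100 objects counted
print(events_needed(1))                  # events required for 1 per cent precision
print(total_cells_needed(1, 1_000_000))  # rare cells at one per million
```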

The same statistics apply to counting everything from photons and
photoelectrons (peaks from dimly fluorescent cells have bigger CV's than
peaks from brightly fluorescent cells because fewer photons are collected
from the dim cells) to votes (if 3,000,000 votes are counted, one expects a
standard deviation of 1,732 votes, or roughly 6 parts in 10,000, meaning
that if the process is supposedly only 99.9% reliable, or accurate (10
parts in 10,000), as it was widely stated to be, neither Bush nor Gore has
a strong claim to having won the Florida Presidential vote).

We have a little more control over cell counting than over vote
counting.  If you count enough cells, you can accurately discriminate
between, say, .01% and .02%.  If you only count 10,000 cells total, you'd
expect to find one cell (and a CV of 100%) in the sample with .01% and 2
cells (CV of 70.7%) in the sample with .02%; so 10,000 cells total is too
small a sample to let you discriminate.  If you count 1,000,000 cells
total, you end up with 100 cells in the .01% sample (CV 10%) and 200 cells
(7.1% CV) in the .02% sample, and this difference will be statistically significant.
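
A rough way to check the comparison above in Python is sketched here. The z-score (difference of the two counts divided by the square root of their sum) is an assumption added for illustration, not something prescribed in this posting; it treats the two counts as independent Poisson values and uses a normal approximation that is poor for counts as small as 1 and 2.

```python
import math

def expected_count(total_cells, percent):
    """Expected number of cells of interest at a given percentage."""
    return total_cells * percent / 100.0

def cv_percent(n):
    """Poisson CV, in per cent, for a count of n."""
    return 100.0 / math.sqrt(n)

def z_score(n_low, n_high):
    """Approximate z for the difference of two independent Poisson counts.

    The variance of the difference is taken as n_low + n_high
    (normal approximation; an illustrative assumption).
    """
    return (n_high - n_low) / math.sqrt(n_low + n_high)

for total in (10_000, 1_000_000):
    low = expected_count(total, 0.01)
    high = expected_count(total, 0.02)
    print(f"{total:>9,} cells: {low:.0f} vs {high:.0f} events, "
          f"CV {cv_percent(low):.1f}% vs {cv_percent(high):.1f}%, "
          f"z = {z_score(low, high):.2f}")
```

With 10,000 cells total the z value is about 0.6, well inside the noise; with 1,000,000 cells it is about 5.8.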

The best way to do counts - although almost nobody does them this way - is
to always count the same number of cells of interest, which gives you equal
precision no matter what the value is.  Normally, we do absolute counts by
analyzing a fixed volume of blood (or other sample) and percentage counts
by analyzing a fixed number of cells.  The alternative is to decide on the
level of precision you want - suppose it is 5%.  Then you have to count 400
cells (the square root of 400 is 20, and 100/20 = 5).  What you do is
measure the volume of sample (in the case of absolute counts), or the total
number of cells (in the case of percentage counts), which has to be
analyzed to yield 400 of the cells of interest.  If the cells of interest
are at .01%, you'll have to count 4,000,000 cells total to find your 400
cells of interest; if they are at 1%, you'll only have to count 40,000
cells, but, instead of the .01% value being much less precise than the 1%
value, both will have the same 5% precision.  The down side of this is that
it requires some reprogramming of the apparatus, and possibly uses more
reagent, but, if you want good numbers, there is simply no alternative.
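
The fixed-count approach described above might look like the following Python sketch. It is not real instrument code: `acquire_until` is a hypothetical name, and the event stream is a made-up iterable of True/False flags standing in for "this event fell in the gate of interest".

```python
import math

def acquire_until(events, target_cv_percent):
    """Consume events until enough cells of interest are counted.

    `events` is any iterable of booleans (True = cell of interest).
    Returns (positives, total). If the stream ends early, the counts
    reached so far are returned and the precision will be worse than
    the target.
    """
    needed = math.ceil((100.0 / target_cv_percent) ** 2)
    positives = 0
    total = 0
    for is_positive in events:
        total += 1
        if is_positive:
            positives += 1
            if positives >= needed:
                break
    return positives, total

# Artificial stream with exactly one cell of interest per 100 cells (1%).
stream = (i % 100 == 99 for i in range(1_000_000))
positives, total = acquire_until(stream, target_cv_percent=5)
print(positives, total, 100.0 * positives / total)
```

The stopping rule depends only on the number of cells of interest, so a .01% sample and a 1% sample both stop at 400 such cells; only the total analyzed differs.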
