Reply by April 19, 2005
It is not necessarily true that you are resetting the simulation state
with each block in the block-serial approach. For example, suppose you
are using the MATLAB function rand or randn for noise/data generation.
These generators can be seeded to run from different points over their
(very long) period, giving the same statistics for a large number of
small blocks as for a small number of large blocks. Have a look at the
documentation for randn.
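
For what it's worth, the same trick carries over to other environments.
Below is a minimal NumPy sketch of the idea (an analogue of the MATLAB
approach, not code from any reference; the seed and block dimensions
are arbitrary):

  import numpy as np

  # A single generator object persists across blocks, so each block
  # continues the stream where the previous one stopped -- the state
  # is never "reset".
  rng = np.random.default_rng(seed=12345)

  n_blocks, block_size = 10, 1000
  for _ in range(n_blocks):
      noise = rng.standard_normal(block_size)  # fresh, non-repeating draws
      # ... process one block ...

  # The pitfall is re-seeding inside the loop, which would replay the
  # SAME noise sequence for every block:
  #   bad = np.random.default_rng(seed=12345)  # identical stream each time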

Reply by MT April 18, 2005
Thanks a lot for the suggestions!

It makes sense that the filter order and the channel coherence time /
Doppler spread (or rather, its inverse) would affect the block size
selection. Still, in cases where coding is involved, with its
intentional redundancy/correlation, some added conservatism, i.e.
larger block sizes, is perhaps warranted.

And on a somewhat related note, pp. 164-169 of Steven Kay's "Estimation
Theory" investigate the effect of a finite data record on maximum
likelihood estimation. The conclusion seems to be that the precise data
record length required for guaranteed asymptotic optimality is not
generally known in all cases. Fortunately, however, it is found in
practice (through simulations) that the data length does not have to be
excessive for good estimates. Still, the given simulation examples deal
primarily with quantities on the order of 1e-1.

Anyway, unless analytical lower bounds can be established, I still feel
suspicious of extremely low BERs obtained with small block sizes. After
all, with a BER as low as, say, 1e-6, it is quite unlikely that a small
block would contain any bit error at all. And each new block is like
"resetting" the simulation state. Wouldn't the many zero-error blocks
then tend to "skew" the average BER towards something even lower than
the true BER value?
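
Then again, as a sanity check on that worry: if the blocks really are
independent and of equal size, the average of the per-block BERs is
algebraically identical to total errors over total bits, so the many
zero-error blocks cannot bias the mean; the real issue is the variance
and the independence assumption itself. A quick NumPy sketch of my own
(the parameters are arbitrary):

  import numpy as np

  rng = np.random.default_rng(0)
  p, n, K = 1e-3, 100, 5000     # true BER, bits per block, number of blocks

  errors = rng.binomial(n=n, p=p, size=K)  # independent errors per block
  ber_avg = np.mean(errors / n)            # average of per-block BERs
  ber_pooled = errors.sum() / (K * n)      # total errors / total bits

  # Identical by algebra: (1/K) * sum(e_k / n) == sum(e_k) / (K * n)
  print(ber_avg, ber_pooled)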

Reply by April 18, 2005
The block-serial approach hinges on block independence. So the answer
to your question, "what is the minimum block size?", is simply the size
that achieves a good approximation to block independence.

There are two factors determining the correlation between blocks. The
first is the dispersive nature of the communications channel, which
will spread information between blocks. So the block size should be at
least an order of magnitude larger than the channel coherence time
(impulse response duration).

Secondly, the channel noise may be coloured rather than white. A
coloured noise generator can be modelled as the output of an
appropriate ARMA (auto-regressive moving-average) filter driven by a
white input. The extent of the noise autocorrelation over time is given
by the impulse response duration of the ARMA filter. Again, the block
size should be orders of magnitude larger than this.
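
As an illustrative sketch of such a generator, here is an AR(1)
colouring filter in NumPy/SciPy (the simplest ARMA case; the pole value
0.95 is just an assumption for the example):

  import numpy as np
  from scipy import signal

  rng = np.random.default_rng(1)
  white = rng.standard_normal(100_000)

  # AR(1) colouring: y[n] = a*y[n-1] + x[n]. The impulse response decays
  # as a**n, so the noise decorrelates over roughly -1/ln(a) samples.
  a = 0.95
  coloured = signal.lfilter([1.0], [1.0, -a], white)

  print("approx. correlation length: %.0f samples" % (-1.0 / np.log(a)))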

If you have a good model for your channel and noise environment, then
it should be a straightforward process to pick your block size. As an
example, suppose you are expecting a bit error rate of 10^-3. You would
measure 10 errors by simulating approximately 10,000 samples. Now your
question was: should that be 10*1000, 20*500, 50*200, etc., in the
block-serial approach? If your channel has an impulse response duration
of about 100 samples (say, a DSL channel), then your block size should
be large (10*1000), whereas if the channel is a multipath wireless
channel with arrivals over 10 samples, you can use a much shorter block.
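
To make that arithmetic concrete, here is a toy sizing helper in Python
(the function, the 10-error target and the 10x margin are illustrative
choices of mine, not a standard recipe):

  def sizing(expected_ber, target_errors=10, impulse_len=100, margin=10):
      # Rough block-serial sizing: total samples ~ target_errors/expected_ber,
      # with each block 'margin' times the channel impulse response duration.
      total_samples = int(round(target_errors / expected_ber))
      block_size = impulse_len * margin
      n_blocks = max(1, total_samples // block_size)
      return n_blocks, block_size

  # The DSL-like case above: BER 10^-3, ~100-sample impulse response
  print(sizing(1e-3, impulse_len=100))  # -> (10, 1000): 10 blocks of 1000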

Porterboy

Reply by MT April 17, 2005
The relatively recent book "Principles of Communication Systems
Simulation with Wireless Applications" by Tranter, Shanmugan, Rappaport
& Kosbar talks about a Monte Carlo simulation technique called
"block-serial": see page 380, and also the code listing on page 747,
with the "block size" NIterate = 1e3.

This technique is, of course, not new. To summarize: to
simulate/estimate the bit-error rate (BER), data blocks (e.g. of 1000
symbols) are processed iteratively until some total of N symbols is
reached. Then:
 BER = (total number of errors) / (total number of bits)

Hence, it can also be viewed as averaging over many realizations (each
block = 1 realization): block 1 has an associated BER_1, block 2 has
BER_2, etc. If K blocks are processed, then:
 BER_avg = (BER_1 + BER_2 + ... + BER_K) / K
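
For concreteness, a minimal sketch of such a block-serial loop, written
here in Python for BPSK over memoryless AWGN (my own illustration; the
Eb/N0 value and block dimensions are arbitrary):

  import numpy as np

  def block_serial_ber(ebn0_db=7.0, block_size=1000, n_blocks=100, seed=0):
      # Block-serial BER estimate for BPSK over AWGN (illustrative only).
      rng = np.random.default_rng(seed)
      sigma = np.sqrt(0.5 / 10 ** (ebn0_db / 10.0))  # noise std for Eb/N0
      total_errors = 0
      for _ in range(n_blocks):
          bits = rng.integers(0, 2, block_size)
          tx = 1.0 - 2.0 * bits               # BPSK map: 0 -> +1, 1 -> -1
          rx = tx + sigma * rng.standard_normal(block_size)
          total_errors += np.count_nonzero((rx < 0) != bits)
      return total_errors / (n_blocks * block_size)

  print(block_serial_ber())  # theory: Q(sqrt(2*Eb/N0)) ~ 7.7e-4 at 7 dB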

Note that this technique is actually "required" in some cases (i.e.,
one cannot run a single realization with N total symbols), notably when
fading is considered. In such cases, various realizations must be
simulated to account for "random" channels: each realization = a
different channel condition.

Now, the total number of symbols N (or the equivalent number of bits)
required for a certain BER resolution (e.g. BER=1e-6) is quite well
known. However, my question is: does the same N apply when we're
segmenting the data symbols into "blocks" as described?

Most notably, if the data block size is 1000 (and assuming binary
symbols), then the BER resolution for each block is 1e-3. So what is
the "reliable" BER resolution that one can expect when averaging over
many blocks? And what reasons/theorems can be used to justify such
claims?

Of course, in the ideal case (independent trials, ergodicity
assumptions, etc.), there should not be much difference between
block-serial and single-block simulations. However, can the same be
said for "typical" simulations, e.g. with channel fading or coding
techniques, all introducing some amount of correlation? Could averaging
over (possibly dependent) data blocks then result in biased and/or
inconsistent BER estimates?

The literature seems to agree that, to get a reliable BER estimate,
some minimum number of raw independent bit errors (e.g. 10 or 100)
should be observed in the simulation. Can this criterion still be
relied upon in the block-serial approach? To exaggerate, let's suppose
we use short blocks of, say, 20 data symbols, and then repeat as many
times as necessary until 50 bit errors are reported. And suppose that
the BER is reported as 1e-5. Will the BER obtained in this case still
be "reliable"? How would this compare to a different simulation using
blocks of, say, 5000 data symbols?

So I guess the MAIN QUESTION is the following:

 "WHAT SHOULD THE MINIMUM BLOCK-SIZE BE, TO RELIABLY OBTAIN 
  A CERTAIN BER-RESOLUTION?"

For example, to reliably estimate BER=1e-4, should data blocks have at
least 1e4 bits?

Note that the cited book by Tranter et al. contains detailed
discussions on picking the total number of symbols N to satisfy a
certain BER resolution; however, much less is said on how to choose the
individual block size (1000 symbols/block is assumed in the given
examples).

And last but not least, if the block size is an important factor,
would this mean a practical problem in BER estimation for short-burst
systems, e.g. GSM, where the block size is by design limited to short
bursts?

Whew, I did not intend to make this a long post, but giving more
information is perhaps always better than giving too little :-)

Also, if this topic has been discussed before, I'd appreciate URL
pointers, or article citations.

Thanks all for your attention and time :-)

--
MT