[Dump] Some thoughts on experiments (was: WiFi experiments during HAR2009)

Sat May 30 01:26:27 CEST 2009

Some thoughts on how to compare routing protocols in practical tests.

A few weeks ago, after I came back from WBM, Aaron asked me how I'd
compare routing protocols. Having thought about it for a while now,
these are the ideas I've come up with:

Obviously people have different goals when designing a routing
protocol - otherwise there wouldn't be so many to choose from.
Experience shows, however, that the special features protocol designers
are interested in turn out to be largely irrelevant in practice.
As somebody whose only uplink to the internet depends on the
Vienna Funkfeuer mesh, I don't care much whether an implementation
meets some special design goal (as is tested at the OLSR interop
conferences), but mostly whether it "gets the ball across the net"
at all. That is the point of view taken in the discussion below.

1) What to test
Often when discussing practical tests people claim: "We need to test
how the entire system plays together - the protocol, the network stack,
ad-hoc mode, ...". That would be a tedious task: ad-hoc mode wifi drivers
are generally buggy (partially on purpose), and there are many
incompatibilities, most of them rather subtle, that show up only every
now and then and make interpreting any results a challenge in itself.

We are far from getting the individual components right, so I'm not
very interested in results about the whole system. Let's therefore focus
our tests on the routing protocol (implementations) themselves.

2) How to make results for different protocols comparable
At WBM (Wireless Battle Mesh) each test node consisted of three
routers, each running one of the protocols on its own wifi channel.
I see several issues with this setup:
 * A positional difference of only a few centimeters is enough to
   (potentially) change the radio characteristics of the setup.
   The wavelength at 2400 MHz is about 12.5 cm, so a path difference of
   half a wavelength (about 6 cm) can turn constructive interference
   into destructive interference.
 * Different channels might be subject to very different external noise
   conditions.
 * Using identical hardware on all three routers of a node doesn't
   guarantee that the same wifi issues (stuck beacons, etc.) happen
   at the same time on all routers.
 * This kind of setup wastes a lot of hardware that could be used to
   build a much bigger mesh.
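To make the first point concrete: the free-space wavelength is c/f, and two copies of a signal whose paths differ by half a wavelength arrive 180 degrees out of phase. A few lines of Python reproduce that arithmetic (2412, 2437 and 2462 MHz are the centre frequencies of channels 1, 6 and 11):

```python
# Free-space wavelength and the path difference that flips interference.
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def wavelength_cm(freq_mhz: float) -> float:
    """Free-space wavelength in centimetres for a frequency given in MHz."""
    return SPEED_OF_LIGHT / (freq_mhz * 1e6) * 100.0


def flip_distance_cm(freq_mhz: float) -> float:
    """Path difference (half a wavelength) that shifts two signals by 180 degrees."""
    return wavelength_cm(freq_mhz) / 2.0


for freq in (2412, 2437, 2462):
    print(f"{freq} MHz: wavelength {wavelength_cm(freq):.1f} cm, "
          f"half wavelength {flip_distance_cm(freq):.1f} cm")
```

Across the whole 2.4 GHz band the critical distance stays at roughly 6 cm, which is well within the spacing of three routers stacked on one test node.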

Instead, I propose testing one protocol after another, repeating the
whole cycle several times to get enough data for some statistical
analysis, which makes the results easier to interpret.
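A minimal sketch of such a test plan, assuming each protocol gets one run per round and that every run yields a single number (routing overhead, fraction of reachable nodes, ...). Shuffling the order within each round is my own assumption, meant to keep slow drifts such as time of day from always hitting the same protocol; the protocol names are placeholders.

```python
import random
import statistics


def make_schedule(protocols, rounds, seed=0):
    """Return a list of (round, protocol) test runs.

    Every protocol runs once per round; the order within each round is
    shuffled so that slow drifts (time of day, changing external noise)
    don't always favour the same protocol. A fixed seed keeps the
    schedule reproducible.
    """
    rng = random.Random(seed)
    schedule = []
    for r in range(rounds):
        order = list(protocols)
        rng.shuffle(order)
        schedule.extend((r, proto) for proto in order)
    return schedule


def summarize(results):
    """Map protocol -> (mean, sample standard deviation, number of runs).

    `results` maps each protocol to the list of values measured in its runs.
    """
    summary = {}
    for proto, values in results.items():
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[proto] = (statistics.mean(values), sd, len(values))
    return summary


if __name__ == "__main__":
    # Placeholder protocol names; substitute the daemons actually under test.
    for run in make_schedule(["proto_a", "proto_b", "proto_c"], rounds=3, seed=1):
        print(run)
```

Mean and standard deviation are only the simplest summary; with enough rounds one could also compare the per-run distributions directly.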

To do this efficiently, we would need a setup that makes it easy to
switch the routing protocol of the entire mesh.
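One possible shape for such a setup, as a sketch only: a controller that builds `ssh` command lines to stop the old daemon on every node before starting the new one everywhere. It assumes the nodes are reachable over a separate wired management network (otherwise stopping the routing daemon would cut off the ssh session) and that passwordless ssh is configured; the init-script paths, protocol names and node names are all hypothetical.

```python
import shlex
import subprocess

# Hypothetical start/stop commands for each protocol under test; replace
# them with whatever the node firmware actually uses.
PROTOCOL_COMMANDS = {
    "proto_a": {"start": "/etc/init.d/proto_a start", "stop": "/etc/init.d/proto_a stop"},
    "proto_b": {"start": "/etc/init.d/proto_b start", "stop": "/etc/init.d/proto_b stop"},
}


def switch_plan(nodes, old, new, commands=PROTOCOL_COMMANDS):
    """Build ssh command lines that switch every node from `old` to `new`.

    All nodes stop the old daemon before any node starts the new one, so two
    protocols never write to the routing tables at the same time. `old` may
    be None for the very first run.
    """
    plan = []
    if old is not None:
        plan += [["ssh", node, commands[old]["stop"]] for node in nodes]
    plan += [["ssh", node, commands[new]["start"]] for node in nodes]
    return plan


def run_plan(plan, dry_run=True):
    """Print (dry run) or execute each command, stopping at the first failure."""
    for argv in plan:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_plan(switch_plan(["node1", "node2"], "proto_a", "proto_b"))
```

Running with `dry_run=True` first is a cheap way to check the plan before touching a live mesh.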

3) "getting the ball across the net" tests
There are many requirements for good routing, but the details vary
depending on the application. Do you want low latency? High throughput?
That's the business of metrics, so I will ignore it here. From the
routing protocol's point of view there are two things I'm interested in:

a) How much airtime does the routing overhead cost?
Well, determining the actual airtime would be difficult, but at least
getting the total amount of routing traffic is fairly easy: on Linux,
just use the "tc" utility to set up a separate queue for routing traffic
and collect the statistics of this queue at the end of each test run.
The peak traffic during a mobility event would be interesting too,
so it may be worth collecting statistics at short intervals.
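A sketch of the collection side, assuming the routing traffic has already been classified into its own tc class (for example an HTB class `1:10` with a filter on the routing protocol's UDP port, such as 698 for OLSR). The class id is hypothetical, and the parser relies on the usual `tc -s class show` layout, which may differ between iproute2 versions.

```python
import subprocess


def read_tc_class_stats(iface):
    """Run `tc -s class show dev IFACE` and return its text output."""
    result = subprocess.run(["tc", "-s", "class", "show", "dev", iface],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_tc_class_stats(text):
    """Parse `tc -s class show` output into {classid: (bytes_sent, packets_sent)}.

    Relies on the usual layout: a "class <kind> <classid> ..." line followed
    by a " Sent <bytes> bytes <pkts> pkt (...)" line.
    """
    stats = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "class" and len(fields) >= 3:
            current = fields[2]  # e.g. "1:10" in "class htb 1:10 parent 1:1 ..."
        elif fields[0] == "Sent" and current is not None:
            stats[current] = (int(fields[1]), int(fields[3]))
            current = None
    return stats


def peak_rate(samples):
    """Highest average rate (bytes/s) between consecutive (time_s, bytes) samples.

    `samples` are cumulative byte counters read at short intervals, e.g. the
    routing class's byte count polled every second. Counter resets are not
    handled.
    """
    best = 0.0
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        if t1 > t0:
            best = max(best, (b1 - b0) / (t1 - t0))
    return best
```

In a test run one would poll `read_tc_class_stats("wlan0")` (interface name assumed) every second, keep `(time.monotonic(), stats["1:10"][0])`, and feed the list to `peak_rate` afterwards; the last sample gives the total.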

b) Is the routing correct/reliable?
 * How many destinations have reachable routes? This can be
   determined by a script on the nodes themselves that periodically
   polls the OS routing table. No test traffic is needed.
 * How many nodes are actually reachable (i.e. not blocked by loops)?
   This could be implemented as a simple ping test from some nodes
   to all others. However, it might be preferable to use UDP test
   traffic and a small program that detects and counts the received
   test packets.
   Whichever test is used, care should be taken not to overload the
   network and to avoid losing test packets to interference as far
   as possible. Basically that means:
   - use small packets
   - send packets at a low rate only
   - set the TTL of the packets to just above the number of nodes in the
     network, to avoid polluting the spectrum too much in case of loops
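The UDP variant with the three precautions above might look roughly like this. The probe format (a sequence number plus a send timestamp, 12 bytes of payload) and the port number are my own assumptions. The TTL is set just above the node count: a loop-free path in a mesh of N nodes has at most N - 1 hops, so that is always enough, while a packet caught in a loop dies quickly.

```python
import socket
import struct
import time

# Hypothetical probe format: 4-byte sequence number + 8-byte send time (ns).
PROBE = struct.Struct("!IQ")
PROBE_PORT = 40000  # hypothetical port reserved for test traffic


def probe_ttl(num_nodes):
    """TTL just above the node count, capped at the IPv4 maximum of 255."""
    return min(num_nodes + 1, 255)


def send_probes(dests, count, interval, ttl, port=PROBE_PORT):
    """Send `count` rounds of one small probe to each destination,
    pausing `interval` seconds between rounds to keep the rate low."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        for seq in range(count):
            for dest in dests:
                sock.sendto(PROBE.pack(seq, time.time_ns()), (dest, port))
            time.sleep(interval)


def receive_probes(sock, idle_timeout=1.0):
    """Collect probes until nothing arrives for `idle_timeout` seconds.

    Returns {sender_ip: set_of_sequence_numbers}.
    """
    sock.settimeout(idle_timeout)
    received = {}
    while True:
        try:
            data, (addr, _port) = sock.recvfrom(64)
        except socket.timeout:
            break
        if len(data) != PROBE.size:
            continue  # not one of our probes
        seq, _sent_ns = PROBE.unpack(data)
        received.setdefault(addr, set()).add(seq)
    return received
```

Each receiving node would bind a UDP socket to `("0.0.0.0", PROBE_PORT)` and call `receive_probes`; the delivery ratio per sender is then the number of distinct sequence numbers divided by `count`, and the embedded timestamps allow a rough latency estimate if the node clocks are synchronised.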

4) Setting up the test environment
The performance of any routing protocol will vary a lot depending on
the topology of the test setup. Perhaps we could hold a challenge:
design a topology on which you believe your pet routing protocol
will beat its competitors.

When designing test setups we should keep in mind that any serious
community mesh uses a mixture of wired and wireless links on many
different wifi channels. Some nodes could be used to generate
radio noise instead of participating in the mesh, or we could vary
the transmit power. Both methods cause mobility events that are
easy to script, which makes them as reproducible as possible.
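A minimal sketch of such a script, assuming a few dedicated nodes whose (hypothetical) noise generators can be switched on and off remotely. Gaps between events are drawn from an exponential distribution, which is just one convenient choice; the point is that a fixed seed yields the identical event sequence for every protocol run. Swapping the action for a transmit-power setting would work the same way.

```python
import random


def make_event_script(noise_nodes, duration_s, mean_gap_s, seed):
    """Generate a reproducible list of (time_s, node, action) events.

    Each event toggles a hypothetical noise generator on one of the
    `noise_nodes`; gaps between events are exponentially distributed with
    mean `mean_gap_s`. The same seed always yields the same script, so every
    protocol can be tested against identical conditions.
    """
    rng = random.Random(seed)
    active = {node: False for node in noise_nodes}
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(1.0 / mean_gap_s)
        if t >= duration_s:
            break
        node = rng.choice(noise_nodes)
        active[node] = not active[node]
        events.append((round(t, 1), node, "noise_on" if active[node] else "noise_off"))
    return events


if __name__ == "__main__":
    for event in make_event_script(["node7", "node9"], duration_s=600, mean_gap_s=60, seed=2009):
        print(event)
```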

Harald