Thursday, January 08, 2009

Dating phylogenetic trees of fossils

A problem I have encountered repeatedly in my PhD is that the way palaeontologists date their phylogenetic trees means lots of branches have a duration of zero million years. In fact, at every single bifurcation at least one branch is zero million years in length. This is somewhat hidden graphically, as published trees drawn against stratigraphy (like this one) usually include some additional default length so that individual branching events can easily be seen. In reality, with branch lengths appropriately scaled, they look more like this:

Phylogeny with branch lengths scaled to time and taxon names removed

So why does this matter? Well, the primary reason (as far as I am concerned) is that this screws up the standard calculation of evolutionary rates. (A rate is some amount of change divided by time; with a denominator of zero the result is infinity.) A secondary problem is that zero-length branches simply aren't realistic.
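To make the rate problem concrete, here is a minimal Python sketch. The function name and the numbers are made up for illustration and are not taken from any real dataset; it simply treats a rate as character changes divided by branch duration.

```python
import math

def branch_rate(changes, duration_myr):
    """Character changes per million years on a single branch.

    Returns infinity when changes sit on a zero-length branch, and NaN
    when there are no changes and no time (0/0 is undefined).
    """
    if duration_myr == 0:
        return math.inf if changes > 0 else math.nan
    return changes / duration_myr

# Hypothetical branches: (character changes, duration in million years)
branches = {"A": (3, 6.0), "B": (2, 0.0), "C": (0, 0.0)}
rates = {name: branch_rate(c, t) for name, (c, t) in branches.items()}
```

Only branch A gets a usable rate here; B comes out infinite and C undefined, which is why the zero-length branches have to be dealt with before any rate comparison.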

How to get around this? Well, early authors (notably Karl Derstler and Peter Forey) independently went for the simplest option: simply add something to the divisor in each case (1 million years, 2 million years, etc.). This may be fine in some cases, but for large phylogenies these additions accumulate from branch to branch and can end up pushing branching events really far back in time. (In one example I worked on, lungfish appeared back in the Precambrian.) Not so good.
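A rough Python sketch of why the padding adds up. The path, the number of branches and the padding value are all hypothetical, and the function is my own illustration rather than anything from the original authors.

```python
def padded_node_ages(tip_age_ma, branch_lengths_myr, padding_myr):
    """Walk from a tip up to the root, adding padding to every branch.

    branch_lengths_myr lists the original branch durations from the tip
    upwards. Returns the age (Ma) of each successive ancestral node.
    """
    ages = []
    age = tip_age_ma
    for length in branch_lengths_myr:
        age += length + padding_myr
        ages.append(age)
    return ages

# Hypothetical worst case: a tip at 400 Ma with 80 zero-length branches
# between it and the root, each padded by 2 million years.
ages = padded_node_ages(400.0, [0.0] * 80, 2.0)
root_age = ages[-1]  # 400 + 80 * 2 = 560 Ma
```

With these made-up numbers the root ends up 160 million years older than the oldest fossil, which is older than the base of the Cambrian (roughly 540 Ma).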

The best method I am aware of was developed in a paper by my colleague Marcello Ruta and co-authors. They argued the best approach was for zero-length branches to 'share' some time with a preceding branch of positive length. Furthermore, they argued that the proportion of sharing should be linked to the number of character changes on each branch. This essentially assumes an underlying model of equal rates of character change and hence is biased towards what would normally be the null hypothesis.

We adopted the Ruta et al. approach in our Science paper, but with a slight modification. As we were implementing it manually, we (by which I mean Steve) used a simpler approach whereby the shared time was split equally.
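The two flavours of sharing can be sketched in Python as follows. This is only my reading of the description above, not the published algorithm or the R code mentioned below: it assumes one positive-length branch followed by a run of zero-length branches, and the function name, the fallback for zero total changes and the numbers are all assumptions.

```python
def share_time(total_myr, n_branches, changes=None):
    """Split total_myr across n_branches.

    changes=None gives equal shares. Otherwise each branch gets time in
    proportion to its number of character changes. Note that a branch
    with zero changes would still receive zero time in this sketch.
    """
    if changes is None:
        return [total_myr / n_branches] * n_branches
    if len(changes) != n_branches:
        raise ValueError("need one change count per branch")
    total_changes = sum(changes)
    if total_changes == 0:
        # Assumption: with no changes anywhere, fall back to equal shares.
        return [total_myr / n_branches] * n_branches
    return [total_myr * c / total_changes for c in changes]

# Hypothetical run: a 9 million year branch followed by two zero-length ones.
equal = share_time(9.0, 3)                      # [3.0, 3.0, 3.0]
proportional = share_time(9.0, 3, [4, 1, 1])    # [6.0, 1.5, 1.5]
```

In both cases the total time is unchanged, so no node older than the start of the positive-length branch is moved; only the nodes within the run are re-dated.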

Recently I have returned to this problem and have now constructed R code to automate the process. Here is what the above tree looks like using the equal sharing method:

And here is what it looks like using the Ruta et al. method:

For the rate calculations (and the group) you will have to wait for the paper.


Manabu "Mambo-Bob" Sakamoto said...

Why are some branch lengths "0"? Is it simply because they are less than the minimum unit, i.e. 100 million years?

Malacoda said...

No, it is because an internal node is normally dated as being as old as its oldest descendant. Therefore, at every split (bifurcation), at least one branch will be zero million years in length because the node and the descendant are dated as being the same age, e.g. 65 million years - 65 million years = 0 million years.
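The dating rule in this reply can be sketched in a few lines of Python. The nested-tuple tree, the taxon names and the ages are hypothetical and only there to illustrate the arithmetic.

```python
def node_age(tree, tip_ages_ma):
    """Age of a node = age of its oldest descendant tip (largest Ma value)."""
    if isinstance(tree, str):
        return tip_ages_ma[tree]
    return max(node_age(child, tip_ages_ma) for child in tree)

def branch_lengths(tree, tip_ages_ma):
    """Return (child label, length in Myr) for every branch below tree."""
    if isinstance(tree, str):
        return []
    parent_age = node_age(tree, tip_ages_ma)
    result = []
    for child in tree:
        result.append((str(child), parent_age - node_age(child, tip_ages_ma)))
        result.extend(branch_lengths(child, tip_ages_ma))
    return result

# Hypothetical first appearances in millions of years ago (Ma).
tip_ages_ma = {"A": 65.0, "B": 60.0, "C": 55.0}
tree = (("A", "B"), "C")
lengths = dict(branch_lengths(tree, tip_ages_ma))
```

Both nodes inherit the 65 Ma age of taxon A, so the branch leading to A and the internal branch leading to the (A, B) node each come out as 65 - 65 = 0 million years, while B and C get 5 and 10 million years respectively.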
