Genomic Structure and Evolution

The local density of single nucleotide polymorphisms (SNPs) within the human genome, as well as that of genes, appears to cluster in accord with the variance-to-mean power law and the Tweedie compound Poisson–gamma distribution. In the case of SNPs, their observed density reflects the assessment techniques, the availability of genomic sequences for analysis, and the nucleotide heterozygosity. The first two factors reflect ascertainment errors inherent to the collection methods; the third reflects an intrinsic property of the genome.

In the coalescent model of population genetics, each genetic locus has its own unique history. Within the evolution of a population of a given species, some genetic loci could presumably be traced back to a relatively recent common ancestor, whereas other loci might have more ancient genealogies. More ancient genomic segments would have had more time to accumulate SNPs and to experience recombination. R. R. Hudson has proposed a model in which recombination could cause variation in the time to the most recent common ancestor for different genomic segments. A high recombination rate could cause a chromosome to contain a large number of small segments with less correlated genealogies.

Assuming a constant background rate of mutation, the number of SNPs per genomic segment would accumulate in proportion to the time to the most recent common ancestor. Current population genetic theory would indicate that these times would be gamma distributed, on average. The Tweedie compound Poisson–gamma distribution would suggest a model whereby the SNP map would consist of multiple small genomic segments, with the mean number of SNPs per segment being gamma distributed, as per Hudson’s model.
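The compound Poisson–gamma construction described above can be illustrated with a small simulation. This is a minimal sketch and not a method taken from the source: it assumes that each genomic bin contains a Poisson-distributed number of segments and that each segment contributes an independent gamma-distributed amount. The function names and the parameters `lam`, `shape` and `scale` are hypothetical, chosen only for illustration.

```python
import random
import statistics


def poisson_draw(rng, lam):
    """Count arrivals of a rate-lam Poisson process within one unit of time."""
    count = 0
    t = rng.expovariate(lam)
    while t < 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


def simulate_bin(rng, lam, shape, scale):
    """Sum a Poisson number of gamma-distributed segment contributions."""
    n_segments = poisson_draw(rng, lam)  # assumed: Poisson number of segments
    return sum(rng.gammavariate(shape, scale) for _ in range(n_segments))


def simulate_bins(n_bins, lam, shape, scale, seed=0):
    """Simulate independent bins with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [simulate_bin(rng, lam, shape, scale) for _ in range(n_bins)]


def theoretical_moments(lam, shape, scale):
    """Mean and variance of a Poisson sum of Gamma(shape, scale) variables."""
    mean = lam * shape * scale
    variance = lam * shape * (shape + 1) * scale ** 2
    return mean, variance


if __name__ == "__main__":
    values = simulate_bins(20000, lam=3.0, shape=2.0, scale=1.5, seed=1)
    print("simulated:", statistics.fmean(values), statistics.variance(values))
    print("theory:   ", theoretical_moments(3.0, 2.0, 1.5))
```

Bins that happen to contain no segments take the value zero exactly, which gives the distribution its characteristic point mass at zero alongside a continuous positive part.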

The distribution of genes within the human genome also demonstrated a variance-to-mean power law when the method of expanding bins was used to determine the corresponding variances and means. Similarly, the number of genes per enumerative bin was found to obey a Tweedie compound Poisson–gamma distribution. This probability distribution was deemed compatible with two different biological models. In the first, the microarrangement model, the number of genes per unit genomic length would be determined by the sum of a random number of smaller genomic segments derived by random breakage and reconstruction of protochromosomes. These smaller segments would be assumed to carry, on average, a gamma distributed number of genes.

In the alternative gene cluster model, genes would be distributed randomly within the protochromosomes. Over long evolutionary timescales, tandem duplications, mutations, insertions, deletions and rearrangements would occur that could affect the genes through a stochastic birth, death and immigration process to yield the Tweedie compound Poisson–gamma distribution.

Both of these mechanisms would implicate neutral evolutionary processes that would result in regional clustering of genes.
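The method of expanding bins mentioned above can be sketched as follows. This is an illustrative reading of the procedure, not the published implementation: counts recorded in small equal-sized bins are summed into progressively larger non-overlapping bins, the mean and variance are computed at each bin size, and the power-law exponent is estimated as the slope of log variance against log mean. The function names, the use of the sample variance and the handling of an incomplete trailing bin are assumptions made for this example.

```python
import math
import statistics


def expanding_bin_moments(counts, bin_sizes):
    """Return (bin_size, mean, variance) of summed counts for each bin size."""
    results = []
    for size in bin_sizes:
        n_full = len(counts) // size  # assumed: drop an incomplete trailing bin
        if n_full < 2:
            continue  # the sample variance needs at least two bins
        sums = [sum(counts[i * size:(i + 1) * size]) for i in range(n_full)]
        results.append((size, statistics.fmean(sums), statistics.variance(sums)))
    return results


def power_law_exponent(moments):
    """Least-squares slope of log(variance) against log(mean)."""
    usable = [(m, v) for _, m, v in moments if m > 0 and v > 0]
    xs = [math.log(m) for m, _ in usable]
    ys = [math.log(v) for _, v in usable]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical gene counts per small bin, for illustration only.
    counts = [0, 3, 1, 0, 0, 7, 2, 0, 1, 0, 0, 5, 4, 0, 0, 1]
    moments = expanding_bin_moments(counts, bin_sizes=[1, 2, 4])
    print(moments)
    print(power_law_exponent(moments))
```

For counts that are independent from bin to bin, the variance of the summed counts grows in proportion to the mean, so the fitted slope is expected to lie near 1. A compound Poisson–gamma distribution corresponds to an exponent between 1 and 2.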

