Sunday, 29 June 2014

Random bases and Infinite Monkey Theorem

An email conversation from work....

A: We needed a big file. Do you have some test data?

B: cat /dev/urandom > random_data.txt

A: You know about the Infinite Monkey Theorem, right? A monkey typing forever will eventually produce the works of William Shakespeare? I think you've just automated that monkey.

But I was thinking of something with somewhat smaller entropy... The tragedy would be if we couldn’t notice a difference...

B: OK, what about:

perl -e "while(1) { print(('G','A','T','C')[rand 4]); }" > random_bases.fasta

Or to be mean:


perl -e "while(1) { print(('G','A','T','C')[rand 4] x rand 10); }" > ion_torrent.fasta
 
(Ed: for those who don't know Perl, the output looks like this)
CCCCTTTTTTTTTTTTTGGGTTGGGGGGGGTTTTTTTTTT
TTTTCCCCCCCCTTTTTTTAAAAGGGGGGGAAGGGGGTTT
TTTTAAAAAAAGGGAAAAATTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTGGAAAAAAGGGGGGAAAAAAAAATTTTTTT
CCGGCCCCTTTTTCCCCCCCCCGGGGGGGGGTTTTTTAAA

A: Actually, it looks like we need a corollary, the Infinite Monkey Genome Theorem: if you remove all the keys from the monkey's keyboard except 'G', 'A', 'T' and 'C', it will eventually type out the complete genome of William Shakespeare.
