I've finished importing the usenet archive for oldusenet. The fun part was parsing the dates to put the posts in order.
No date format was really required on usenet, and so a wide variety of formats were used. Some posts didn't have a Date, but a guess could be made from their Message-ID. Some posts had absurd dates (e.g., 1969, 1995); others had dates that were correct in every way... except the year was left out (oops). One early post had a date of "_____".
Still, this excerpt of my code managed to parse the rest and so gives a fairly complete picture of how messy dates can possibly be. Read and weep.
p anyzone "%d %b %y %T" "15 Jun 88 02:27:41 GMT"
, p anyzone "%a, %d %b %y %T" "Thu, 22 Jun 89 20:02:03 GMT"
, p anyzone "%a, %d-%b-%y %T" "Thu, 15-Jun-89 18:01:56 EDT"
, p anyzone "%d %b %y %T" "8 Jan 90 14:07:27 -0400"
, p anyzone "%d %b %y %H:%M" "4 Oct 89 19:56 GMT"
, p anyzone "%a, %d %b %y %H:%M" "Thu, 23 May 91 02:13 PDT"
, p anyzone "%a, %d %b %Y %T" "Thu, 23 May 1991 07:07:00 -0400"
, p anyzone "%a, %d %b %Y %H:%M" "Sat, 18 May 1991 17:28 CDT"
, p anyzone "%d %b %Y %T" "11 Apr 1991 12:02:01 GMT"
, p anyzone "%d-%b-%y %H:%M" "24-Mar-90 14:22 CST"
, p anyzone "%d %b %y, %T" "22 May 91, 16:31:37 EST"
, p anyzone "%d %b %Y %H:%M" "30 June 1991 17:15 -0400"
, p anyzone "%a, %d %b T %T" "Fri, 8 Feb T 09:49:39 EST"
-- special cases
, p (tzconst est) "%a %b %d %T EST %Y" "Tue Jan 11 12:44:36 EST 1983"
, p (tzconst est) "%a %b %d %T EST %y" "Tue Jan 11 12:44:36 EST 83"
, p (tzconst edt) "%a %b %d %T EDT %Y" "Tue Jan 11 12:44:36 EDT 1983"
, p (tzconst edt) "%a %b %d %T EDT %y" "Tue Jan 11 12:44:36 EDT 83"
, p (tzconst utc) "%a %b %d %T GMT %Y" "Thu Nov 1 23:14:37 GMT 1990"
, p (tzconst pdt) "%d %b %y %T -7" "11 Jun 91 15:41:21 -7"
-- dates with no timezone specified are guessed
, p nozone "%d %b %y %T" "9 Jan 90 09:33:59"
, p nozone "%d %b %Y %T" "10 APR 1990 05:25:28"
, p nozone "%a %b %d %T %Y" "Fri Feb 6 00:19:47 1981"
, p nozone "%a %b %d %T %y" "Fri Feb 6 00:19:47 81"
, p nozone "%Y-%m-%d %T" "1981-11-12 18:31:01"
, p nozone "%y-%m-%d %T" "81-11-12 18:31:01"
, p nozone "%a, %d %b %y %T" "Sat, 13 Apr 91 08:37:57"
, p nozone "%a, %d %b %Y %T" "Sun, 16 Jun 1991 13:23:02"
, p nozone "%d %b, %Y %T" "1 May, 1991 00:00:00"
, p nozone "%d %b %y %H:%M" "8 Jan 88 18:03"
, p nozone "%a, %d %b %y %H:%M" "Wed, 29 May 91 17:14"
, p nozone "1 %b %d %T %Y" "1 Jan 08 20:59:08 1991"
-- this has to come near the end, as it matches greedily
, g nozone "%a %b %d %T %Y (" "Wed Oct 27 17:02:46 1982 (Tuesday)"
, g nozone "%a, %d %b %y %T +" "Tue, 21 May 91 16:46:01 +22323328"
-- extract date from message-id headers
-- (used for messages with no Date field)
, g nozone "<%Y%b%d.%H%M%S." "<1989Jul6.214048.28313@jarvis.csri.toronto.edu>"
(Parsing the often ambiguous, malformed, etc timezones was fun all its own too, of course.)
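For readers who want to try the "ordered list of formats, greedy ones last" idea, here is a minimal sketch built on the `time` package. It is not the code the excerpt comes from: `strict`, `prefix`, `parsers` and `parseDate` are hypothetical names, the timezone handling is reduced to whatever `%Z` accepts, and only a few of the formats are included.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)
import Data.Time (UTCTime, defaultTimeLocale, parseTimeM, readSTime)

-- Hypothetical strict parser: the whole string must match the format.
strict :: String -> String -> Maybe UTCTime
strict fmt = parseTimeM True defaultTimeLocale fmt

-- Hypothetical greedy parser: only a prefix of the string must match;
-- whatever follows the matched prefix is ignored.
prefix :: String -> String -> Maybe UTCTime
prefix fmt s = fst <$> listToMaybe (readSTime True defaultTimeLocale fmt s)

-- Formats are tried in order, so the prefix-matching ones go last.
parsers :: [String -> Maybe UTCTime]
parsers =
  [ strict "%d %b %y %T %Z"
  , strict "%a, %d %b %Y %T %Z"
  , strict "%a %b %d %T %Y"
  , prefix "%a %b %d %T %Y ("
  ]

-- Result of the first parser that succeeds, if any.
parseDate :: String -> Maybe UTCTime
parseDate s = listToMaybe (mapMaybe ($ s) parsers)

main :: IO ()
main = mapM_ (print . parseDate)
  [ "15 Jun 88 02:27:41 GMT"
  , "Wed Oct 27 17:02:46 1982 (Tuesday)"
  , "_____"
  ]
```

Keeping each format as a function from `String` to `Maybe UTCTime` means a new oddball format is one more list entry, and the ordering of the list is the only place where precedence is decided.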
... but I can give a clue about the bad dates. A date of 1969 sometimes resulted from software that translated a missing date into 0; the 0 would turn into Jan 1, 1970, and anything west of Greenwich (e.g. anywhere in the Americas) would make this a few hours earlier, so Dec. 31, 1969. The 1995 may have been a hack to beat the "Expires:" mechanism; early versions of the news software would leave posts with dates in the future around forever.
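The 1969 arithmetic is easy to check. The sketch below assumes the missing date was stored as Unix time 0 and rendered in a fixed UTC-5 zone; `epochIn` is a hypothetical helper name, and the `time` package is used only for illustration.

```haskell
import Data.Time (LocalTime, hoursToTimeZone, utcToLocalTime)
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)

-- Unix time 0 as seen on a wall clock at the given whole-hour UTC offset.
epochIn :: Int -> LocalTime
epochIn offsetHours =
  utcToLocalTime (hoursToTimeZone offsetHours) (posixSecondsToUTCTime 0)

main :: IO ()
main = do
  print (posixSecondsToUTCTime 0)  -- 1970-01-01 00:00:00 UTC
  print (epochIn (-5))             -- 1969-12-31 19:00:00
```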
My first contribution to a widely used free software package was a port of 2.11 B news to an obscure Unix variant. I remember being really paranoid, so I made sure that if my platform weren't selected, not one byte of preprocessor output would change, so no one could blame me for breaking the world.