I've finished importing the usenet archive for oldusenet. The fun part was parsing the dates to put the posts in order.

No date format was really required on usenet, so a wide variety of formats were used. Some posts had no Date header at all, but a guess could be made from their Message-ID. Some posts had absurd dates (e.g., 1969 or 1995); others had dates that were correct in every way... except the year was left out (oops). One early post had a date of "_____".

Still, this excerpt of my code managed to parse all the rest, so it gives a fairly complete picture of just how messy dates can get. Read and weep.

  p anyzone "%d %b %y %T"         "15 Jun 88 02:27:41 GMT"
, p anyzone "%a, %d %b %y %T"     "Thu, 22 Jun 89 20:02:03 GMT"
, p anyzone "%a, %d-%b-%y %T"     "Thu, 15-Jun-89 18:01:56 EDT"
, p anyzone "%d %b %y %T"         "8 Jan 90 14:07:27 -0400"
, p anyzone "%d %b %y %H:%M"      "4 Oct 89 19:56 GMT"
, p anyzone "%a, %d %b %y %H:%M"  "Thu, 23 May 91 02:13 PDT"
, p anyzone "%a, %d %b %Y %T"     "Thu, 23 May 1991 07:07:00 -0400"
, p anyzone "%a, %d %b %Y %H:%M"  "Sat, 18 May 1991 17:28 CDT"
, p anyzone "%d %b %Y %T"         "11 Apr 1991 12:02:01 GMT"
, p anyzone "%d-%b-%y %H:%M"      "24-Mar-90 14:22 CST"
, p anyzone "%d %b %y, %T"        "22 May 91, 16:31:37 EST"
, p anyzone "%d %b %Y %H:%M"      "30 June 1991 17:15 -0400"
, p anyzone "%a, %d %b T  %T"     "Fri, 8 Feb T  09:49:39 EST"

-- special cases
, p (tzconst est) "%a %b %d %T EST %Y"  "Tue Jan 11 12:44:36 EST 1983"
, p (tzconst est) "%a %b %d %T EST %y"  "Tue Jan 11 12:44:36 EST 83"
, p (tzconst edt) "%a %b %d %T EDT %Y"  "Tue Jan 11 12:44:36 EDT 1983"
, p (tzconst edt) "%a %b %d %T EDT %y"  "Tue Jan 11 12:44:36 EDT 83"
, p (tzconst utc) "%a %b %d %T GMT %Y"  "Thu Nov  1 23:14:37 GMT 1990"
, p (tzconst pdt) "%d %b %y %T -7"      "11 Jun 91 15:41:21 -7"

-- dates with no timezone specified are guessed
, p nozone "%d %b %y %T"         "9 Jan 90 09:33:59"
, p nozone "%d %b %Y %T"         "10 APR 1990 05:25:28"
, p nozone "%a %b %d %T %Y"      "Fri Feb  6 00:19:47 1981"
, p nozone "%a %b %d %T %y"      "Fri Feb  6 00:19:47 81"
, p nozone "%Y-%m-%d %T"         "1981-11-12 18:31:01"
, p nozone "%y-%m-%d %T"         "81-11-12 18:31:01"
, p nozone "%a, %d %b %y %T"     "Sat, 13 Apr 91 08:37:57"
, p nozone "%a, %d %b %Y %T"     "Sun, 16 Jun 1991 13:23:02"
, p nozone "%d %b, %Y %T"        "1 May, 1991 00:00:00"
, p nozone "%d %b %y %H:%M"      "8 Jan 88 18:03"
, p nozone "%a, %d %b %y %H:%M"  "Wed, 29 May 91 17:14"
, p nozone "1 %b %d %T %Y"       "1 Jan 08 20:59:08 1991"

-- this has to come near the end, as it matches greedily
, g nozone "%a %b %d %T %Y ("    "Wed Oct 27 17:02:46 1982 (Tuesday)"
, g nozone "%a, %d %b %y %T +"   "Tue, 21 May 91 16:46:01 +22323328"

-- extract date from message-id headers
-- (used for messages with no Date field)
, g nozone "<%Y%b%d.%H%M%S."        "<1989Jul6.214048.28313@jarvis.csri.toronto.edu>"
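The excerpt reads as an ordered list of parsers: each entry is tried in turn and the first one that fits wins, which is why the greedy entries have to come near the end. Here is a minimal sketch of that idea with Haskell's `time` library, under some assumptions: `exact` and `prefix` are hypothetical stand-ins for `p` and `g` (taken to mean "the whole string must match" and "only the start must match"), timezones are left out by parsing into `LocalTime`, and only a handful of the formats are included. The sample strings use two-digit days because `time`'s `%d` may insist on two digits when parsing.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)
import Data.Time

-- Hypothetical stand-in for the excerpt's `p`: the whole string
-- (apart from surrounding whitespace) must match the format.
exact :: String -> String -> Maybe LocalTime
exact fmt = parseTimeM True defaultTimeLocale fmt

-- Hypothetical stand-in for `g`: only the start of the string has to
-- match; trailing text such as "(Tuesday)" is left over and ignored.
prefix :: String -> String -> Maybe LocalTime
prefix fmt s = fst <$> listToMaybe (readSTime True defaultTimeLocale fmt s)

-- Try the parsers in order and keep the first success.
parseDate :: String -> Maybe LocalTime
parseDate s = listToMaybe (mapMaybe ($ s) parsers)
  where
    parsers =
      [ exact "%d %b %y %T"
      , exact "%a, %d %b %y %T"  -- before %Y, which may also accept "91"
      , exact "%a, %d %b %Y %T"
      , exact "%Y-%m-%d %T"
      , prefix "%a %b %d %T %Y ("   -- permissive parsers go last
      , prefix "%a, %d %b %y %T +"
      ]

main :: IO ()
main = mapM_ (print . parseDate)
  [ "15 Jun 88 02:27:41"
  , "Sat, 13 Apr 91 08:37:57"
  , "Sun, 16 Jun 1991 13:23:02"
  , "Wed Oct 27 17:02:46 1982 (Tuesday)"
  , "_____"
  ]
```

The ordering does real work. Putting a `%y` format ahead of its `%Y` twin mirrors the excerpt and, assuming `%Y` accepts short years, keeps "91" from being read as the first century; putting the prefix parsers last keeps them from claiming strings that an exact format would handle.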

(Parsing the timezones, which were often ambiguous or malformed, was its own kind of fun too, of course.)
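Zone handling can be sketched in the same style. This is an assumption-laden sketch, not the code behind the excerpt: `zoneTable`, `zoneOffset` and `toUTC` are hypothetical names, the table covers only GMT and the North American abbreviations that appear above, and anything it does not recognize (a malformed offset like `+22323328`, or the bare `-7` that the excerpt special-cases) yields `Nothing` so the caller can fall back to a guess. Abbreviations really are ambiguous (EST, for instance, has also been used for Australian Eastern Standard Time), so a real importer needs some policy for picking a meaning.

```haskell
import Data.Char (isDigit)
import Data.Time

-- Hypothetical table of zone abbreviations, as offsets from UTC in
-- minutes. It simply assumes the North American meanings.
zoneTable :: [(String, Int)]
zoneTable =
  [ ("GMT", 0), ("UT", 0)
  , ("EST", -300), ("EDT", -240)
  , ("CST", -360), ("CDT", -300)
  , ("MST", -420), ("MDT", -360)
  , ("PST", -480), ("PDT", -420)
  ]

-- Offset in minutes for a numeric "+HHMM"/"-HHMM" zone or a known
-- abbreviation; Nothing for anything else, e.g. "+22323328" or "-7".
zoneOffset :: String -> Maybe Int
zoneOffset (sign : ds)
  | sign `elem` "+-" && length ds == 4 && all isDigit ds =
      let (hh, mm) = splitAt 2 ds
          mins = read hh * 60 + read mm
      in Just (if sign == '-' then negate mins else mins)
zoneOffset z = lookup z zoneTable

-- Convert a zone-less local time to UTC, or Nothing if the zone is
-- unknown, so the caller can fall back to a guess.
toUTC :: String -> LocalTime -> Maybe UTCTime
toUTC zone lt = do
  mins <- zoneOffset zone
  pure (localTimeToUTC (minutesToTimeZone mins) lt)

main :: IO ()
main = do
  let t = LocalTime (fromGregorian 1983 1 11) (TimeOfDay 12 44 36)
  print (toUTC "EST" t)
  print (toUTC "-0400" t)
  print (zoneOffset "+22323328")
```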

those were different times

... but I can give a clue about the bad dates. A date of 1969 sometimes resulted from software that stored a missing date as 0: that is midnight on Jan 1, 1970 (UTC), and anywhere west of Greenwich (e.g., anywhere in the Americas) the local time is a few hours earlier, so it shows up as Dec. 31, 1969. The 1995 may have been a hack to beat the "Expires:" mechanism; early versions of the news software would keep articles dated in the future around forever.
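The arithmetic behind those 1969 dates is easy to check. A minimal sketch with Haskell's `time` library (the helper `localAt` is hypothetical, and this only reproduces the conversion, not any particular news software): a timestamp of 0 shown at Greenwich is midnight on January 1, 1970, but at any negative offset it becomes an evening on December 31, 1969.

```haskell
import Data.Time
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)

-- A missing timestamp stored as 0, i.e. the Unix epoch:
-- midnight UTC on January 1, 1970.
epoch :: UTCTime
epoch = posixSecondsToUTCTime 0

-- The same instant as wall-clock time at a given offset from UTC,
-- in minutes (negative means west of Greenwich).
localAt :: Int -> LocalTime
localAt mins = utcToLocalTime (minutesToTimeZone mins) epoch

main :: IO ()
main = do
  print (localAt 0)      -- Greenwich
  print (localAt (-300)) -- US Eastern Standard Time (UTC-5)
  print (localAt (-480)) -- US Pacific Standard Time (UTC-8)
```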

My first contribution to a widely used free software package was a port of 2.11 B news to an obscure Unix variant. I remember being really paranoid, so I made sure that if my platform weren't selected, not one byte of preprocessor output would change; that way no one could blame me for breaking the world.

Comment by Joe
The 1995 post was a rather well-done and scarily prescient joke about, essentially, MP3s and the Apple store, posted in 1981.
Comment by joey