An archive of all Debian packages produced in the past 5 years:
6.5 terabytes

The first Wikipedia database dump in 4.5 years:
5.6 terabytes (32 gigabytes compressed)

My reflex on seeing both of these was to think about putting them into git repositories.

Injecting source packages into git repositories is easy with git-import-dsc (which can also use pristine-tar to make the original tarballs accessible with minimal overhead). I hope the admins find the time and space to do that, because being able to easily run git annotate across 5 years of history for any package in the archive would be very useful.

Producing a usable git repository from the Wikipedia dump would probably involve writing a custom git fast-import frontend that processes the huge XML dump, chunking it into individual files and into commits changing those files. Frankly, I prefer wikis that store their data in git natively in that format and don't have multi-month dump procedures. ;)
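Such a frontend could be quite small. Here is a minimal sketch, assuming a MediaWiki-style export format (`<page>`/`<title>`/`<revision>`/`<text>` elements); the one-file-per-page chunking scheme and the committer identity are my own invention, and a real importer would also have to carry over timestamps and author names from the dump:

```python
import xml.etree.ElementTree as ET
from io import BytesIO

def fastimport_stream(xml_bytes):
    """Yield a `git fast-import` stream: one commit per <revision>,
    each touching a file named after its page's <title>."""
    mark = 0
    # iterparse lets us stream a dump far larger than RAM
    for _, elem in ET.iterparse(BytesIO(xml_bytes)):
        if elem.tag != 'page':
            continue
        # hypothetical naming scheme: page title, spaces underscored
        fname = (elem.findtext('title') or 'untitled').replace(' ', '_')
        for rev in elem.iter('revision'):
            text = (rev.findtext('text') or '').encode('utf-8')
            mark += 1
            # blob holding this revision's wikitext
            yield b'blob\nmark :%d\ndata %d\n' % (mark, len(text)) \
                  + text + b'\n'
            # commit pointing the page's file at that blob
            yield (b'commit refs/heads/master\n'
                   b'committer Dump Importer <dump@example.com> 0 +0000\n'
                   b'data 0\n'
                   b'M 100644 :%d %s\n\n' % (mark, fname.encode('utf-8')))
        elem.clear()  # keep memory bounded on a multi-terabyte dump
```

The output would be piped straight into `git fast-import` inside a fresh repository, which builds the history without ever checking out a working tree.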

How big would the git repos be? My SWAG is well under 1 terabyte for the Debian packages, and between 30 and 300 gigabytes for the Wikipedia data.