Day 1 (4 hours)
12:15-4:30 pm
with lunch and coffee breaks
intro (20 minutes)
bus factor
introductions
what's your git-annex skill level?
your Haskell skill level?
Haskell for Readers
http://joeyh.name/talks/git-annex-developer-workshop/haskell-for-readers
schedule overview
building git-annex from source (10 minutes)
git clone git://git-annex.branchable.com/ git-annex
sudo apt-get install haskell-stack (or install stack from http://haskellstack.org/)
stack build
(older OS: stack build --stack-yaml stack-lts-9.9.yaml)
git-annex core concepts and types (60 minutes)
Key
Types/Key.hs (aka Types.Key)
key/value storage
not encryption key
annex symlinks and links
show examples
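As a starting point for the examples, here is a minimal sketch of a key and of the path an annex symlink points at. The `Key` record below is a simplified stand-in, not the real type from Types/Key.hs, and the field and function names are assumptions made for illustration. The two hash directory names are derived from the key by git-annex itself; here they are simply passed in.

```haskell
import Data.List (intercalate)

-- Simplified stand-in for the Key type (assumption: the real type in
-- Types/Key.hs has more fields and a different representation).
data Key = Key
    { keyBackend :: String        -- e.g. "SHA256E"
    , keySize    :: Maybe Integer -- size in bytes, when known
    , keyName    :: String        -- usually a hash of the content
    } deriving (Eq, Show)

-- Serialize as BACKEND-sSIZE--NAME, the general shape of the names
-- seen in annex symlink targets.
serializeKey :: Key -> String
serializeKey k =
    intercalate "-" (keyBackend k : sizeField) ++ "--" ++ keyName k
  where
    sizeField = maybe [] (\n -> ['s' : show n]) (keySize k)

-- Path an annex symlink points at, relative to the top of the
-- repository. The hash directory names are passed in rather than
-- computed (hypothetical simplification).
annexObjectPath :: (String, String) -> Key -> FilePath
annexObjectPath (d1, d2) k =
    intercalate "/" [".git", "annex", "objects", d1, d2, f, f]
  where
    f = serializeKey k
```

The key names a value in key/value storage; nothing in it is secret, which is the point of "not encryption key".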
UUID
Types.UUID
unique identifier for git repository / special remote
does a normal git repository have an identifier?
no
any clone is much like any other
git repositories may contain unique data
but then no other repo knows about it
why does git-annex need a repository identifier?
each clone may contain the contents of a different set of files
git-annex needs to know where the contents of each file are located
NoUUID
ugly and hides problems
thought exercise
imagine getting rid of NoUUID constructor
use Maybe UUID instead
eliminate the Maybe where possible
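One possible shape for the thought exercise, as a sketch only. `UUIDOld`, `toUUID`, `fromOld`, `describeRepo` and `knownRepos` are hypothetical names, not git-annex functions. The idea is that absence shows up as `Maybe` only at the boundary where a UUID is read, and code past that boundary takes a plain `UUID`.

```haskell
import Data.Maybe (mapMaybe)

-- Sentinel style described in the outline (renamed here to avoid a clash).
data UUIDOld = NoUUID | UUIDOld String
    deriving (Eq, Show)

-- Alternative for the exercise: a UUID value always holds an identifier.
newtype UUID = UUID String
    deriving (Eq, Show)

-- Absence is visible in the type where a UUID is read from a string.
toUUID :: String -> Maybe UUID
toUUID "" = Nothing
toUUID s  = Just (UUID s)

-- Bridge from the sentinel style to the Maybe style.
fromOld :: UUIDOld -> Maybe UUID
fromOld NoUUID      = Nothing
fromOld (UUIDOld s) = toUUID s

-- Past the boundary, functions take a plain UUID and need no NoUUID case.
describeRepo :: UUID -> String
describeRepo (UUID s) = "repository " ++ s

-- The Maybe is eliminated once, when collecting the valid identifiers.
knownRepos :: [String] -> [UUID]
knownRepos = mapMaybe toUUID
```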
Remote
Types.Remote
git remote
special remote
important fields of the Remote data type
uuid
cost
storeKey
retrieveKeyFile
removeKey
checkPresent
compare Remote with external special remote protocol
http://git-annex.branchable.com/design/external_special_remote_protocol/
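To make the listed fields concrete, here is a heavily simplified sketch of a record of actions with those names, plus a toy in-memory remote. The types are assumptions: the real fields in Types.Remote run in git-annex's own monad and take more arguments, and `memoryRemote` is a hypothetical helper that does not exist in git-annex.

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef')

type Key  = String  -- stand-in for the real Key type
type UUID = String  -- stand-in for the real UUID type

-- A remote as a record of actions (simplified signatures, in plain IO).
data Remote = Remote
    { uuid            :: UUID
    , cost            :: Int
    , storeKey        :: Key -> String -> IO Bool
    , retrieveKeyFile :: Key -> IO (Maybe String)
    , removeKey       :: Key -> IO Bool
    , checkPresent    :: Key -> IO Bool
    }

-- Toy remote that keeps content in memory as an association list.
memoryRemote :: UUID -> Int -> IO Remote
memoryRemote u c = do
    store <- newIORef ([] :: [(Key, String)])
    return Remote
        { uuid = u
        , cost = c
        , storeKey = \k v -> do
            -- replace any existing value stored under the same key
            modifyIORef' store (\m -> (k, v) : filter ((/= k) . fst) m)
            return True
        , retrieveKeyFile = \k -> lookup k <$> readIORef store
        , removeKey = \k -> do
            modifyIORef' store (filter ((/= k) . fst))
            return True
        , checkPresent = \k -> any ((== k) . fst) <$> readIORef store
        }
```

When comparing with the external special remote protocol, these actions roughly correspond to its TRANSFER STORE, TRANSFER RETRIEVE, REMOVE, CHECKPRESENT and GETCOST requests.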
interlude: how we use git-annex (30-60 minutes)
me
I built it for my own personal use
glimpse inside some of my git-annex repos and workflows
how datalad uses git-annex
over to yarik and michael
how others here use git-annex
git-annex core concepts continued (30 minutes)
recap
Key
UUID
Remote
git-annex branch
http://git-annex.branchable.com/internals/
union merging
CRDTs & vector clocks
Annex.VectorClock
example
location tracking
Logs.Presence.Pure
LogLine
LogStatus
logParser
buildLog
exercise
design a new git-annex branch file format
how does this interact with union merging?
how are old values removed from the file?
space efficiency
repeated uuids, timestamps, etc
git gc
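As a reference point for the exercise, here is a sketch of a presence log in the spirit of Logs.Presence.Pure. It assumes one line per change of the form "TIMESTAMPs STATUS UUID", with status 1 for present, 0 for missing and X for dead. The timestamp is a plain Double standing in for a vector clock, and `parseLog`, `compactLog` and the simplified `buildLog` are illustrative, not the real implementations.

```haskell
import Data.List (sortOn)
import Data.Maybe (mapMaybe)
import Numeric (showFFloat)

data LogStatus = InfoPresent | InfoMissing | InfoDead
    deriving (Eq, Show)

data LogLine = LogLine
    { date   :: Double    -- simplified stand-in for a vector clock
    , status :: LogStatus
    , info   :: String    -- the repository UUID in a location log
    } deriving (Eq, Show)

-- Parse a whole log, skipping lines that do not parse.
parseLog :: String -> [LogLine]
parseLog = mapMaybe parseLine . lines

parseLine :: String -> Maybe LogLine
parseLine l = case words l of
    [t, s, u] -> LogLine <$> parseTime t <*> parseStatus s <*> Just u
    _         -> Nothing
  where
    -- drop the trailing "s" unit before reading the number
    parseTime t = case reads (takeWhile (/= 's') t) of
        [(d, "")] -> Just d
        _         -> Nothing
    parseStatus "1" = Just InfoPresent
    parseStatus "0" = Just InfoMissing
    parseStatus "X" = Just InfoDead
    parseStatus _   = Nothing

buildLog :: [LogLine] -> String
buildLog = unlines . map buildLine
  where
    buildLine (LogLine d s u) =
        unwords [showFFloat Nothing d "s", statusChar s, u]
    statusChar InfoPresent = "1"
    statusChar InfoMissing = "0"
    statusChar InfoDead    = "X"

-- Keep only the newest line for each UUID, oldest first. Running this
-- over the concatenated lines of two versions of a file gives the same
-- result in either order, which is what makes a line-based union merge
-- safe for this format.
compactLog :: [LogLine] -> [LogLine]
compactLog = foldr keepNewest [] . sortOn date
  where
    keepNewest l acc
        | any ((== info l) . info) acc = acc
        | otherwise = l : acc
```

A new file format for the exercise needs an equivalent of `compactLog`: some rule that picks a winner among old and new values after a merge has put both in the file.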
what core git-annex concepts *don't* have types?
git-annex branch files
git-annex object files
would adding types for these improve the code?
case study: adding a new, major feature to git-annex (60 minutes)
git-annex import tree
http://git-annex.branchable.com/design/importing_trees_from_special_remotes/
high level design
dual of export tree
after importing, exporting the same tree is a no-op
after exporting, importing yields the same tree
lists content of special remote
downloads new content from special remote
(necessary to generate keys?)
builds a git tree of its contents
potential for export tree conflict
important sticking point in design
mitigations
race safety via good ContentIdentifier
S3 versioning
UI
analogy to git fetch / git push
import tree and git fetch both update remote tracking branch
refs/remotes/$name/master
git push and export tree also update remote tracking branch
to reflect their changes to the remote
import/export special remote becomes similar to a git working tree
without .git
but with files that may be modified there and later imported
api design
ImportActions
data types
ImportLocation
ContentIdentifier
storage
git-annex branch
Logs.ContentIdentifier
mappings between ContentIdentifier and Key
local sqlite database
Database.ContentIdentifier
ImportableContents
RemoteTrackingBranch
ImportTreeConfig
ImportCommitConfig
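The role of the mapping between ContentIdentifier and Key can be sketched as follows. This is a guess at the shape of the idea, not the actual git-annex API: the newtypes are simplified stand-ins, and `CidMap` and `planImport` are hypothetical. Given a listing of the special remote, files whose content identifier is already known can reuse the recorded key, and only the rest need to be downloaded to generate keys.

```haskell
import Data.Either (partitionEithers)

-- Simplified stand-ins for the data types named in the outline.
newtype ImportLocation    = ImportLocation FilePath deriving (Eq, Show)
newtype ContentIdentifier = ContentIdentifier String deriving (Eq, Show)
newtype Key               = Key String deriving (Eq, Show)

-- Hypothetical: the known mappings, as they might be read from the
-- git-annex branch or the local database.
type CidMap = [(ContentIdentifier, Key)]

-- Split a listing of the remote into files that must be downloaded
-- (first component) and files whose key is already known (second).
planImport
    :: CidMap
    -> [(ImportLocation, ContentIdentifier)]
    -> ([(ImportLocation, ContentIdentifier)], [(ImportLocation, Key)])
planImport known = partitionEithers . map classify
  where
    classify (loc, cid) = case lookup cid known of
        Just k  -> Right (loc, k)
        Nothing -> Left (loc, cid)
```

A content identifier that changes whenever the file changes is what lets this step skip downloads safely.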
option parsing
Parser ImportOptions
added RemoteImportOptions
optparse-applicative
planning for tomorrow (10 minutes)
start thinking about a feature you'd like to see in git-annex
or a part of git-annex implementation you want to explore
to discuss tomorrow morning
Day 2 (4 hours)
9 am-1 pm with coffee and lunch breaks
git-annex implementation details (60 minutes)
not as core as Key, UUID, Remote, but all over the code
Git
Repo
Ref
Branch
Sha
Tag
exercise
Branch, Sha, Tag are all type aliases for Ref
not type safe!
no distinction at all between Commit, Tree, Object
split into separate types for type safety
perhaps Ref Tag, Ref Commit, Ref Tree, Ref Object
and Sha Tag, Sha Commit, Sha Tree, Sha Object
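One way to approach the exercise is with phantom type parameters, sketched below. None of this is existing git-annex code; `commitTree`, `shaString`, `branchRef` and `tagRef` are hypothetical helpers that only show how the tags restrict what can be passed where.

```haskell
-- Empty types used only as tags; they have no values.
data Commit
data Tree
data Tag

-- The parameter records what kind of object the ref or sha names.
newtype Ref a = Ref String deriving (Eq, Show)
newtype Sha a = Sha String deriving (Eq, Show)

-- Only a commit can be asked for its tree; passing a Sha Tree here
-- is rejected by the type checker.
commitTree :: [(Sha Commit, Sha Tree)] -> Sha Commit -> Maybe (Sha Tree)
commitTree = flip lookup

-- Works for any kind of sha, e.g. when building a git command line.
shaString :: Sha a -> String
shaString (Sha s) = s

branchRef :: String -> Ref Commit
branchRef name = Ref ("refs/heads/" ++ name)

tagRef :: String -> Ref Tag
tagRef name = Ref ("refs/tags/" ++ name)
```

The tags cost nothing at runtime, but every place that currently mixes kinds of refs would need an explicit conversion, which is where the exercise gets interesting.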
why is the Git interface in git-annex at all?
GitConfig
exercise
add a new git config value to it
Annex
monad
"global" state
gitRepo
getGitConfig
remoteList
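The idea of a monad carrying "global" state can be sketched with a hand-rolled reader over a mutable variable. This is an illustration under assumptions, not the real definition: `AnnexState` here has two made-up fields, and `gitRepo` and `remoteList` return simple stand-in values rather than the real Repo and Remote types.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, readMVar, modifyMVar_)

-- Made-up state with stand-in fields.
data AnnexState = AnnexState
    { repoPath    :: FilePath  -- stand-in for a Git Repo
    , remoteNames :: [String]  -- stand-in for a list of Remotes
    }

-- Every action gets access to the same mutable state.
newtype Annex a = Annex { runAnnex :: MVar AnnexState -> IO a }

instance Functor Annex where
    fmap f (Annex g) = Annex (fmap f . g)

instance Applicative Annex where
    pure x = Annex (\_ -> pure x)
    Annex f <*> Annex g = Annex (\st -> f st <*> g st)

instance Monad Annex where
    Annex g >>= f = Annex (\st -> g st >>= \x -> runAnnex (f x) st)

-- Read part of the state.
getState :: (AnnexState -> a) -> Annex a
getState sel = Annex (fmap sel . readMVar)

-- Modify the state in place.
changeState :: (AnnexState -> AnnexState) -> Annex ()
changeState f = Annex (\st -> modifyMVar_ st (pure . f))

gitRepo :: Annex FilePath
gitRepo = getState repoPath

remoteList :: Annex [String]
remoteList = getState remoteNames

-- Run an action starting from an initial state.
eval :: AnnexState -> Annex a -> IO a
eval s a = newMVar s >>= runAnnex a
```

Code written against `gitRepo` and `remoteList` never passes the state around by hand, which is why these accessors show up all over the code.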
Types.Command
discussion: designing new git-annex features (120 minutes)
discuss participants' feature ideas and think about designs