*** DRAFT, TO-BE-MAYBE-REMOVED ***

I'd like for deb2bzr to have the following property:

- deb2bzr will produce identical branches when run over the very same set of .dsc files

(I'll leave out the paragraph giving a rationale for this wish, because even if it's of no use, I'd still want to implement it in order to learn a bit about these concepts: one item somewhere in my TODO file is implementing that same property in the tailor backend for bzr.)

To implement that property, one has to care about deterministic file-ids and revision-ids. My initial thoughts on the matter were to create unique but deterministic revision-ids from four pieces of information:

1. e-mail address in the changelog
2. timestamp in the changelog
3. a hash of ''.join([md5sum(f) for f in sorted(files_in_dsc)])
4. a hash of the parent revision-id

(1) and (2) are there only for aesthetics, so that my rev-ids look somewhat similar to the ones bzr creates; (3) is a hash of the contents, and it's the true source of uniqueness; and (4) is there to prevent runs with e.g. missing history bits (IOW, missing some versions of the package) from getting the same revision-id.

For file-ids, I had a vague idea of using 2-4.

* * *

After a quite productive discussion on 2006-08-05 with John Arbash Meinel in #bzr, some things changed. We can summarize the situation described above with:

19:02 I want same file-ids/rev-ids _for runs with the exact same set of information_

However, a source package is not a delta against the previous version, but a full snapshot. So suppose you have two deb2bzr runs, each of them over:

(a) foo_1.2-1, foo_1.2-2, foo_1.2-3, foo_2.1-1, foo_2.1-2
(b) foo_1.2-1, foo_1.2-2, foo_2.1-2

With the scheme above, the branches would share neither file-ids nor revision-ids after the second revision, yet they would have the very same contents at the end.
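To make the four-part scheme and the divergence between runs (a) and (b) concrete, here is a minimal Python sketch. It is not deb2bzr code: the use of sha1 for the outer hashes, the "-" separator, hashing an empty string for the first revision's parent, and the sample data (`make_version`, the e-mail address, the timestamps, the file contents) are all assumptions made for illustration.

```python
import hashlib


def content_hash(files):
    """Item (3): hash of the concatenated md5sums, in sorted file-name order.

    `files` maps file name -> raw bytes (a stand-in for the files in a .dsc).
    """
    md5sums = [hashlib.md5(files[name]).hexdigest() for name in sorted(files)]
    return hashlib.sha1("".join(md5sums).encode("ascii")).hexdigest()


def revision_id(email, timestamp, files, parent_id):
    """Combine items (1)-(4) into a deterministic revision-id string."""
    # Item (4): hash of the parent revision-id. The first revision has no
    # parent, so an empty string is hashed instead (an assumption here).
    parent_hash = hashlib.sha1((parent_id or "").encode("utf-8")).hexdigest()
    return "-".join([email, timestamp, content_hash(files), parent_hash])


def import_run(versions):
    """Return the revision-ids produced by importing `versions` in order."""
    rev_ids = []
    parent = None
    for version in versions:
        parent = revision_id(version["email"], version["timestamp"],
                             version["files"], parent)
        rev_ids.append(parent)
    return rev_ids


def make_version(name, timestamp):
    # Hypothetical sample data: one file whose contents depend on the version.
    return {"email": "maint@example.org", "timestamp": timestamp,
            "files": {"debian/changelog": name.encode("ascii")}}


all_versions = [make_version(n, t) for n, t in [
    ("foo_1.2-1", "20060101000000"), ("foo_1.2-2", "20060201000000"),
    ("foo_1.2-3", "20060301000000"), ("foo_2.1-1", "20060401000000"),
    ("foo_2.1-2", "20060501000000")]]

run_a = import_run(all_versions)                                         # run (a)
run_b = import_run([all_versions[0], all_versions[1], all_versions[4]])  # run (b)

print(run_a[:2] == run_b[:2])   # the first two revisions coincide
print(run_a[-1] == run_b[-1])   # the last ones differ: their parents differ
```

In this sketch the final revisions of both runs carry the same content hash (item 3), but the parent hash (item 4) differs, which is exactly why the two branches stop sharing revision-ids even though their final trees are identical.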
With this in mind, I asked John whether it'd be possible to improve convergence between branches when some of them possibly have some bits of history missing. (Gentle reminder: please don't think about the real-world use of this, and think of it more as a learning exercise.)

After several minutes of discussion, I learnt that, since deb2bzr wants no support for renames, something could indeed be done, involving ghost revisions. And in this new scheme, file-ids should not encode info about when they were introduced, etc.; a simple escaped-path mapping would be enough instead.

Later that day, Robert Collins pointed out:

02:36 if you use the escaped path you will appear to have the same file as other packages
02:36 this will lead to rather pathological behaviour
02:36 hm, gotcha

* * *

2006-08-06