Debian packaging consists of a directory (debian/) containing a number of "hard-coded" filenames such as debian/control, debian/changelog, debian/copyright. In addition to these, many packages will also use a number of optional files that are named via a pattern such as debian/{{PACKAGE}}.install.
At a high level the patterns looks deceptively simple. However, if you start working on trying to automatically classify files in debian/ (which could be helpful to tell the user they have a typo in the filename), you will quickly realize these patterns are not machine friendly at all for this purpose.
The patterns deconstructed
To appreciate the problem fully, here is a primer on the pattern and all its issues. If you are already well-versed in these, you might want to skip the section.
The most common patterns are debian/package.stem or debian/stem and usually the go to example is the install stem ( a concrete example being debian/debhelper.install). However, the full pattern consists of 4 parts where 3 of them are optional.
- The package name followed by a period. Optional, but must be the first if present.
- The name segment followed by a period. Optional, but must appear between the package name (if present) and the stem. If the package name is not present, then the name segment must be first.
- The stem. Mandatory.
- An architecture restriction prefixed by a period. Optional, must appear after the stem if present.
To visualize it with [foo] to mark optional parts, it looks like debian/[PACKAGE.][NAME.]STEM[.ARCH]
Detecting whether a given file is in fact a packaging file now boils down to reverse engineering its name against this pattern. Again, so far, it might still look manageable. One major complication is that every part (except ARCH) can contain periods. So a trivial "split by period" is not going to cut it. As an example:
debian/g++-3.0.user.service
This example is deliberately crafted to be ambiguous and show this problem in its full glory. This file name can be in multiple ways:
- Is the stem service or user.service? (both are known stems from dh_installsystemd and dh_installsystemduser respectively). In fact, it can be both at the same time with "clever" usage of --name=user passed to dh_installsystemd.
- The g++-3.0 can be a package prefix or part of the name segment. Even if there is a g++-3.0 package in debian/control, then debhelper (until compat 15) will still happily match this file for the main package if you pass --name=g++-3.0 to the helper. Side bar: Woe is you if there is a g++-3 and a g++-3.0 package in debian/control, then we have multiple options for the package prefix! Though, I do not think that happens in practice.
Therefore, there are a lot of possible ways to split this filename that all matches the pattern but with vastly different meaning and consequences.
Making detection practical
To make this detection practical, lets look at the first problems that we need to solve.
- We need the possible stems up front to have a chance at all. When multiple stems are an option, go for the longest match (that is, the one with most periods) since --name is rare and "code golfing" is even rarer.
- We can make the package prefix mandatory for files with the name segment. This way, the moment there is something before the stem, we know the package prefix will be part of it and can cut it. It does not solve the ambiguity if one package name is a prefix of another package name (from the same source), but it still a lot better. This made its way into debhelper compat 15 and now it is "just" a slow long way to a better future.
A simple solution to the first problem could be to have a static list of known stems. That will get you started but the debhelper eco-system strive on decentralization, so this feels like a mismatch.
There is also a second problem with the static list. Namely, a given stem is only "valid" if the command in question is actually in use. Which means you now need to dumpster dive into the mess that is Turning-complete debhelper configuration file known as debian/rules to fully solve that. Thanks to the Turning-completeness, we will never get a perfect solution for a static analysis.
Instead, it is time to back out and instead apply some simplifications. Here is a sample flow:
- Check whether the dh sequencer is used. If so, use some heuristics to figure out which addons are used.
- Delegate to dh_assistant to figure out which commands will be used and which debhelper config file stems it knows about. Here we need to know which sequences are in use from step one (if relevant). Combine this with any other sources for stems you have.
- Deconstruct all files in debian/ against the stems and known package names from debian/control. In theory, dumpster diving after --name options would be helpful here, but personally I skipped that part as I want to keep my debian/rules parsing to an absolute minimum.
With this logic, you can now:
- Provide typo detection of the stem (debian/foo.intsall -> debian/foo.install) provided to have adequate handling of the corner cases (such as debian/*.conf not needing correction into debian/*.config)
- Detect possible invalid package prefix (debian/foo.install without foo being a package). Note this has to be a weak warning unless the package is using debhelper compat 15 or you dumpster dived to validate that dh_install was not passed dh_install --name foo. Agreed, no one should do that, but they can and false positives are the worst kind of positives for a linting tool.
- With some limitations, detect files used without the relevant command being active. As an example, the some integration modes of debputy removes dh_install, so a debian/foo.install would not be used.
- Associate a given file with a given command to assist users with the documentation look up. Like debian/foo.user.service is related to dh_installsystemduser, so man dh_installsystemduser is a natural start for documentation.
I have added the logic for all these features in debputy though the documentation association is currently not in a user facing command. All the others are now diagnostics emitted by debputy in its editor support mode (debputy lsp server) or via debputy lint. In the editor mode, the diagnostics are currently associated with the package name in debian/control due to technical limitations of how the editor integration works.
Some of these features will the latest version of debhelper (moving target at times). Check with debputy lsp features for the Extra dh support feature, which will be enabled if you got all you need.
Note: The detection is currently (mostly) ignoring files with architecture restrictions. That might be lifted in the future. However, architecture restricted config files tend to be rare, so they were not a priority at this point. Additionally, debputy for technical reasons ignores stem typos with multiple matches. That sadly means that typos of debian/docs will often be unreported due to its proximity to debian/dirs and vice versa.
Diving a bit deeper on getting the stems
To get the stems, debputy has 3 primary sources:
- Its own plugins can provide packager provided files. These are only relevant if the package is using debputy.
- It is als possible to provide a debputy plugin that identifies packaging files (either static or named ones). Though in practice, we probably do not want people to roll their own debputy plugin for this purpose, since the detection only works if the plugin is installed. I have used this mechanism to have debhelper provide a debhelper-documentation plugin to enrich the auto-detected data and we can assume most people interested in this feature would have debhelper installed.
- It asks dh_assistant list-guessed-dh-config-files for config files, which is covered below.
The dh_assistant command uses the same logic as dh to identify the active add-ons and loads them. From there, it scans all commands mentioned in the sequence for the PROMISE: DH NOOP WITHOUT ...-hint and a new INTROSPECTABLE: CONFIG-FILES ...-hint. When these hints reference a packaging file (as an example, via pkgfile(foo)) then dh_assistant records that as a known packaging file for that helper.
Additionally, debhelper now also tracks commands that were removed from the sequence. Several of the dh_assistant subcommand now use this to enrich their (JSON) output with notes about these commands being known but not active.
The end result
With all of this work, you now get:
$ apt satisfy 'dh-debputy (>= 0.1.43~), debhelper (>= 13.16~), python3-lsprotocol, python3-levenshtein'
# For demo purposes, pull two known repos (feel free to use your own packages here)
$ git clone https://salsa.debian.org/debian/debhelper.git -b debian/13.16
$ git clone https://salsa.debian.org/debian/debputy.git -b debian/0.1.43
$ cd debhelper
$ mv debian/debhelper.install debian/debhelper.intsall
$ debputy lint
warning: File: debian/debhelper.intsall:1:0:1:0: The file "debian/debhelper.intsall" is likely a typo of "debian/debhelper.install"
File-level diagnostic
$ mv debian/debhelper.intsall debian/debhleper.install
$ debputy lint
warning: File: debian/debhleper.install:1:0:1:0: Possible typo in "debian/debhleper.install". Consider renaming the file to "debian/debhelper.debhleper.install" or "debian/debhelper.install" if it is intended for debhelper
File-level diagnostic
$ cd ../debputy
$ touch debian/install
$ debputy lint --no-warn-about-check-manifest
warning: File: debian/install:1:0:1:0: The file debian/install is related to a command that is not active in the dh sequence with the current addons
File-level diagnostic
As mentioned, you also get these diagnostics in the editor via the debputy lsp server feature. Here the diagnostics appear in debian/control over the package name for technical reasons.
The editor side still needs a bit more work. Notably, changes to the filename is not triggered automatically and will first be caught on the next change to debian/control. Likewise, changes to debian/rules to add --with to dh might also have some limitations depending on the editor. Saving both files and then triggering an edit of debian/control seems to work reliable but ideally it should not be that involved.
The debhelper side could also do with some work to remove the unnecessary support for the name segment with many file stems that do not need them and announce that to debputy.
Anyhow, it is still a vast improvement over the status quo that was "Why is my file silently ignored!?".