Improving Debian packaging in Kate

with tags debian debputy pdo

The other day, I noted that the emacs integration with debputy stopped working. After debugging for a while, I realized that emacs no longer sent the didOpen notification that is expected of it, which confused debputy. At this point, I was already several hours into the debugging and I noted there was some discussions on debian-devel about emacs and byte compilation not working. So I figured I would shelve the emacs problem for now.

But I needed an LSP capable editor and with my vi skills leaving much to be desired, I skipped out on vim-youcompleteme. Instead, I pulled out kate, which I had not been using for years. It had LSP support, so it would fine, right?

Well, no. Turns out that debputy LSP support had some assumptions that worked for emacs but not kate. Plus once you start down the rabbit hole, you stumble on things you missed previously.

Getting started

First order of business was to tell kate about debputy. Conveniently, kate has a configuration tab for adding language servers in a JSON format right next to the tab where you can see its configuration for built-in LSP (also in JSON format9. So a quick bit of copy-paste magic and that was done.

Yesterday, I opened an MR against upstream to have the configuration added (https://invent.kde.org/utilities/kate/-/merge_requests/1748) and they already merged it. Today, I then filed a wishlist against kate in Debian to have the Debian maintainers cherry-pick it, so it works out of the box for Trixie (https://bugs.debian.org/1099876).

So far so good.

Inlay hint woes

Since July (2024), debputy has support for Inlay hints. They are basically small bits of text that the LSP server can ask the editor to inject into the text to provide hints to the reader.

Typically, you see them used to provide typing hints, where the editor or the underlying LSP server has figured out the type of a variable or expression that you did not explicitly type. Another common use case is to inject the parameter name for positional arguments when calling a function, so the user do not have to count the position to figure out which value is passed as which parameter.

In debputy, I have been using the Inlay hints to show inherited fields in debian/control. As an example, if you have a definition like:

Source: foo-src
Section: devel
Priority: optional

Package: foo-bin
Architecture: any

Then foo-bin inherits the Section and Priority field since it does not supply its own. Previously, debputy would that by injecting the fields themselves and their value just below the Package field as if you had typed them out directly. The editor always renders Inlay hints distinctly from regular text, so there was no risk of confusion and it made the text look like a valid debian/control file end to end. The result looked something like:

Source: foo-src
Section: devel
Priority: optional

Package: foo-bin
Section: devel
Priority: optional
Architecture: any

With the second instances of Section and Priority being rendered differently than its surrendering (usually faded or colorlessly).

Unfortunately, kate did not like injecting Inlay hints with a newline in them, which was needed for this trick. Reading into the LSP specs, it says nothing about multi-line Inlay hints being a thing and I figured I would see this problem again with other editors if I left it be.

I ended up changing the Inlay hints to be placed at the end of the Package field and then included surrounding () for better visuals. So now, it looks like:

Source: foo-src
Section: devel
Priority: optional

Package: foo-bin  (Section: devel)  (Priority: optional)
Architecture: any

Unfortunately, it is no longer 1:1 with the underlying syntax which I liked about the previous one. But it works in more editors and is still explicit. I also removed the Inlay hint for the Homepage field. It takes too much space and I have yet to meet someone missing it in the binary stanza.

If you have any better ideas for how to render it, feel free to reach out to me.

Spurious completion and hover

As I was debugging the Inlay hints, I wanted to do a quick restart of debputy after each fix. Then I would trigger a small change to the document to ensure kate would request an update from debputy to render the Inlay hints with the new code.

The full outgoing payloads are sent via the logs to the client, so it was really about minimizing which LSP requests are sent to debputy. Notably, two cases would flood the log:

Completion requests. These are triggered by typing anything at all and since I wanted to a change, I could not avoid this. So here it was about making sure there would be nothing to complete, so the result was a small as possible.

Hover doc requests. These are triggered by mouse hovering over field, so this was mostly about ensuring my mouse movement did not linger over any field on the way between restarting the LSP server and scrolling the log in kate.

In my infinite wisdom, I chose to make a comment line where I would do the change. I figured it would neuter the completion requests completely and it should not matter if my cursor landed on the comment as there would be no hover docs for comments either.

Unfortunately for me, debputy would ignore the fact that it was on a comment line. Instead, it would find the next field after the comment line and try to complete based on that. Normally you do not see this, because the editor correctly identifies that none of the completion suggestions start with a \#, so they are all discarded.

But it was pretty annoying for the debugging, so now debputy has been told to explicitly stop these requests early on comment lines.

Hover docs for packages

I added a feature in debputy where you can hover over package names in your relationship fields (such as Depends) and debputy will render a small snippet about it based on data from your local APT cache.

This doc is then handed to the editor and tagged as markdown provided the editor supports markdown rendering. Both emacs and kate support markdown. However, not all markdown renderings are equal. Notably, emacs's rendering does not reformat the text into paragraphs. In a sense, emacs rendering works a bit like <pre>...</pre> except it does a bit of fancy rendering inside the <pre>...</pre>.

On the other hand, kate seems to convert the markdown to HTML and then throw the result into an HTML render engine. Here it is important to remember that not all newlines are equal in markdown. A Foo<newline>Bar is treated as one "paragraph" (<p>...</p>) and the HTML render happily renders this as single line Foo Bar provided there is sufficient width to do so.

A couple of extra newlines made wonders for the kate rendering, but I have a feeling this is not going to be the last time the hover docs will need some tweaking for prettification. Feel free to reach out if you spot a weirdly rendered hover doc somewhere.

Making quickfixes available in `kate`

Quickfixes are treated as generic code actions in the LSP specs. Each code action has a "type" (kind in the LSP lingo), which enables the editor to group the actions accordingly or filter by certain types of code actions.

The design in the specs leads to the following flow:

The LSP server provides the editor with diagnostics (there are multiple ways to trigger this, so we will keep this part simple).

The editor renders them to the user and the user chooses to interact with one of them.

The interaction makes the editor asks the LSP server, which code actions are available at that location (optionally with filter to only see quickfixes).

The LSP server looks at the provided range and is expected to return the relevant quickfixes here.

This flow is really annoying from a LSP server writer point of view. When you do the diagnostics (in step 1), you tend to already know what the possible quickfixes would be. The LSP spec authors realized this at some point, so there are two features the editor provides to simplify this.

In the editor request for code actions, the editor is expected to provide the diagnostics that they received from the server. Side note: I cannot quite tell if this is optional or required from the spec.

The editor can provide support for remembering a data member in each diagnostic. The server can then store arbitrary information in that member, which they will see again in the code actions request. Again, provided that the editor supports this optional feature.

All the quickfix logic in debputy so far has hinged on both of these two features.

As life would have it, kate provides neither of them.

Which meant I had to teach debputy to keep track of its diagnostics on its own. The plus side is that makes it easier to support "pull diagnostics" down the line, since it requires a similar feature. Additionally, it also means that quickfixes are now available in more editors. For consistency, debputy logic is now always used rather than relying on the editor support when present.

The downside is that I had to spend hours coming up with and debugging a way to find the diagnostics that overlap with the range provided by the editor. The most difficult part was keeping the logic straight and getting the runes correct for it.

Making the quickfixes actually work

With all of that, kate would show the quickfixes for diagnostics from debputy and you could use them too. However, they would always apply twice with suboptimal outcome as a result.

The LSP spec has multiple ways of defining what need to be changed in response to activating a code action. In debputy, all edits are currently done via the WorkspaceEdit type. It has two ways of defining the changes. Either via changes or documentChanges with documentChanges being the preferred one if both parties support this.

I originally read that as I was allowed to provide both and the editor would pick the one it preferred. However, after seeing kate blindly use both when they are present, I reviewed the spec and it does say "The edit should either provide changes or documentChanges", so I think that one is on me.

None of the changes in debputy currently require documentChanges, so I went with just using changes for now despite it not being preferred. I cannot figure out the logic of whether an editor supports documentChanges. As I read the notes for this part of the spec, my understanding is that kate does not announce its support for documentChanges but it clearly uses them when present. Therefore, I decided to keep it simple for now until I have time to dig deeper.

Remaining limitations with `kate`

There is one remaining limitation with kate that I have not yet solved. The kate program uses KSyntaxHighlighting for its language detection, which in turn is the basis for which LSP server is assigned to a given document.

This engine does not seem to support as complex detection logic as I hoped from it. Concretely, it either works on matching on an extension / a basename (same field for both cases) or mime type. This combined with our habit in Debian to use extension less files like debian/control vs. debian/tests/control or debian/rules or debian/upstream/metadata makes things awkward a best.

Concretely, the syntax engine cannot tell debian/control from debian/tests/control as they use the same basename. Fortunately, the syntax is close enough to work for both and debputy is set to use filename based lookups, so this case works well enough.

However, for debian/rules and debian/upstream/metadata, my understanding is that if I assign these in the syntax engine as Debian files, these rules will also trigger for any file named foo.rules or bar.metadata. That seems a bit too broad for me, so I have opted out of that for now. The down side is that these files will not work out of the box with kate for now.

The current LSP configuration in kate does not recognize makefiles or YAML either. Ideally, we would assign custom languages for the affected Debian files, so we do not steal the ID from other language servers. Notably, kate has a built-in language server for YAML and debputy does nothing for a generic YAML document. However, adding YAML as a supported language for debputy would cause conflict and regressions for users that are already happy with their generic YAML language server from kate.

So there are certainly still work to be done. If you are good with KSyntaxHighlighting and know how to solve some of this, I hope you will help me out.

Changes unrelated to `kate`

While I was working on debputy, I also added some other features that I want to mention.

The debputy lint command will now show related context to diagnostic in its terminal report when such information is available and is from the same file as the diagnostic itself (cross file cases are rendered without related information).

The related information is typically used to highlight a source of a conflict. As an example, if you use the same field twice in a stanza of debian/control, then debputy will add a diagnostic to the second occurrence. The related information for that diagnostic would provide the position of the first occurrence.

This should make it easier to find the source of the conflict in the cases where debputy provides it. Let me know if you are missing it for certain diagnostics.

The diagnostics analysis of debian/control will now identify and flag simple duplicated relations (complex ones like OR relations are ignored for now). Thanks to Matthias Geiger for suggesting the feature and Otto Kekäläinen for reporting a false positive that is now fixed.

Closing

I am glad I tested with kate to weed out most of these issues in time before the freeze. The Debian freeze will start within a week from now. Since debputy is a part of the toolchain packages it will be frozen from there except for important bug fixes.

Improving packaging file detection in Debian

with tags debian debputy pdo

Debian packaging consists of a directory (debian/) containing a number of "hard-coded" filenames such as debian/control, debian/changelog, debian/copyright. In addition to these, many packages will also use a number of optional files that are named via a pattern such as debian/{{PACKAGE}}.install.

At a high level the patterns looks deceptively simple. However, if you start working on trying to automatically classify files in debian/ (which could be helpful to tell the user they have a typo in the filename), you will quickly realize these patterns are not machine friendly at all for this purpose.

The patterns deconstructed

To appreciate the problem fully, here is a primer on the pattern and all its issues. If you are already well-versed in these, you might want to skip the section.

The most common patterns are debian/package.stem or debian/stem and usually the go to example is the install stem ( a concrete example being debian/debhelper.install). However, the full pattern consists of 4 parts where 3 of them are optional.

The package name followed by a period. Optional, but must be the first if present.

The name segment followed by a period. Optional, but must appear between the package name (if present) and the stem. If the package name is not present, then the name segment must be first.

The stem. Mandatory.

An architecture restriction prefixed by a period. Optional, must appear after the stem if present.

To visualize it with [foo] to mark optional parts, it looks like debian/[PACKAGE.][NAME.]STEM[.ARCH]

Detecting whether a given file is in fact a packaging file now boils down to reverse engineering its name against this pattern. Again, so far, it might still look manageable. One major complication is that every part (except ARCH) can contain periods. So a trivial "split by period" is not going to cut it. As an example:

debian/g++-3.0.user.service

This example is deliberately crafted to be ambiguous and show this problem in its full glory. This file name can be in multiple ways:

Is the stem service or user.service? (both are known stems from dh_installsystemd and dh_installsystemduser respectively). In fact, it can be both at the same time with "clever" usage of --name=user passed to dh_installsystemd.

The g++-3.0 can be a package prefix or part of the name segment. Even if there is a g++-3.0 package in debian/control, then debhelper (until compat 15) will still happily match this file for the main package if you pass --name=g++-3.0 to the helper. Side bar: Woe is you if there is a g++-3 and a g++-3.0 package in debian/control, then we have multiple options for the package prefix! Though, I do not think that happens in practice.

Therefore, there are a lot of possible ways to split this filename that all matches the pattern but with vastly different meaning and consequences.

Making detection practical

To make this detection practical, lets look at the first problems that we need to solve.

We need the possible stems up front to have a chance at all. When multiple stems are an option, go for the longest match (that is, the one with most periods) since --name is rare and "code golfing" is even rarer.

We can make the package prefix mandatory for files with the name segment. This way, the moment there is something before the stem, we know the package prefix will be part of it and can cut it. It does not solve the ambiguity if one package name is a prefix of another package name (from the same source), but it still a lot better. This made its way into debhelper compat 15 and now it is "just" a slow long way to a better future.

A simple solution to the first problem could be to have a static list of known stems. That will get you started but the debhelper eco-system strive on decentralization, so this feels like a mismatch.

There is also a second problem with the static list. Namely, a given stem is only "valid" if the command in question is actually in use. Which means you now need to dumpster dive into the mess that is Turning-complete debhelper configuration file known as debian/rules to fully solve that. Thanks to the Turning-completeness, we will never get a perfect solution for a static analysis.

Instead, it is time to back out and instead apply some simplifications. Here is a sample flow:

Check whether the dh sequencer is used. If so, use some heuristics to figure out which addons are used.

Delegate to dh_assistant to figure out which commands will be used and which debhelper config file stems it knows about. Here we need to know which sequences are in use from step one (if relevant). Combine this with any other sources for stems you have.

Deconstruct all files in debian/ against the stems and known package names from debian/control. In theory, dumpster diving after --name options would be helpful here, but personally I skipped that part as I want to keep my debian/rules parsing to an absolute minimum.

With this logic, you can now:

Provide typo detection of the stem (debian/foo.intsall -> debian/foo.install) provided to have adequate handling of the corner cases (such as debian/*.conf not needing correction into debian/*.config)

Detect possible invalid package prefix (debian/foo.install without foo being a package). Note this has to be a weak warning unless the package is using debhelper compat 15 or you dumpster dived to validate that dh_install was not passed dh_install --name foo. Agreed, no one should do that, but they can and false positives are the worst kind of positives for a linting tool.

With some limitations, detect files used without the relevant command being active. As an example, the some integration modes of debputy removes dh_install, so a debian/foo.install would not be used.

Associate a given file with a given command to assist users with the documentation look up. Like debian/foo.user.service is related to dh_installsystemduser, so man dh_installsystemduser is a natural start for documentation.

I have added the logic for all these features in debputy though the documentation association is currently not in a user facing command. All the others are now diagnostics emitted by debputy in its editor support mode (debputy lsp server) or via debputy lint. In the editor mode, the diagnostics are currently associated with the package name in debian/control due to technical limitations of how the editor integration works.

Some of these features will the latest version of debhelper (moving target at times). Check with debputy lsp features for the Extra dh support feature, which will be enabled if you got all you need.

Note: The detection is currently (mostly) ignoring files with architecture restrictions. That might be lifted in the future. However, architecture restricted config files tend to be rare, so they were not a priority at this point. Additionally, debputy for technical reasons ignores stem typos with multiple matches. That sadly means that typos of debian/docs will often be unreported due to its proximity to debian/dirs and vice versa.

Diving a bit deeper on getting the stems

To get the stems, debputy has 3 primary sources:

Its own plugins can provide packager provided files. These are only relevant if the package is using debputy.

It is als possible to provide a debputy plugin that identifies packaging files (either static or named ones). Though in practice, we probably do not want people to roll their own debputy plugin for this purpose, since the detection only works if the plugin is installed. I have used this mechanism to have debhelper provide a debhelper-documentation plugin to enrich the auto-detected data and we can assume most people interested in this feature would have debhelper installed.

It asks dh_assistant list-guessed-dh-config-files for config files, which is covered below.

The dh_assistant command uses the same logic as dh to identify the active add-ons and loads them. From there, it scans all commands mentioned in the sequence for the PROMISE: DH NOOP WITHOUT ...-hint and a new INTROSPECTABLE: CONFIG-FILES ...-hint. When these hints reference a packaging file (as an example, via pkgfile(foo)) then dh_assistant records that as a known packaging file for that helper.

Additionally, debhelper now also tracks commands that were removed from the sequence. Several of the dh_assistant subcommand now use this to enrich their (JSON) output with notes about these commands being known but not active.

The end result

With all of this work, you now get:

$ apt satisfy 'dh-debputy (>= 0.1.43~), debhelper (>= 13.16~), python3-lsprotocol, python3-levenshtein'
# For demo purposes, pull two known repos (feel free to use your own packages here)
$ git clone https://salsa.debian.org/debian/debhelper.git -b debian/13.16
$ git clone https://salsa.debian.org/debian/debputy.git -b debian/0.1.43
$ cd debhelper
$ mv debian/debhelper.install debian/debhelper.intsall
$ debputy lint
warning: File: debian/debhelper.intsall:1:0:1:0: The file "debian/debhelper.intsall" is likely a typo of "debian/debhelper.install"
    File-level diagnostic
$ mv debian/debhelper.intsall debian/debhleper.install
$ debputy lint
warning: File: debian/debhleper.install:1:0:1:0: Possible typo in "debian/debhleper.install". Consider renaming the file to "debian/debhelper.debhleper.install" or "debian/debhelper.install" if it is intended for debhelper
    File-level diagnostic
$ cd ../debputy
$ touch debian/install
$ debputy lint --no-warn-about-check-manifest
warning: File: debian/install:1:0:1:0: The file debian/install is related to a command that is not active in the dh sequence with the current addons
    File-level diagnostic

As mentioned, you also get these diagnostics in the editor via the debputy lsp server feature. Here the diagnostics appear in debian/control over the package name for technical reasons.

The editor side still needs a bit more work. Notably, changes to the filename is not triggered automatically and will first be caught on the next change to debian/control. Likewise, changes to debian/rules to add --with to dh might also have some limitations depending on the editor. Saving both files and then triggering an edit of debian/control seems to work reliable but ideally it should not be that involved.

The debhelper side could also do with some work to remove the unnecessary support for the name segment with many file stems that do not need them and announce that to debputy.

Anyhow, it is still a vast improvement over the status quo that was "Why is my file silently ignored!?".

Debian packaging with style black

with tags debian debputy pdo

When I started working on the language server for debputy, one of several reasons was about automatic applying a formatting style. Such that you would not have to remember to manually reformat the file.

One of the problems with supporting automatic formatting is that no one agrees on the "one true style". To make this concrete, Johannes Schauer Marin Rodrigues did the numbers of which wrap-and-sort option that are most common in https://bugs.debian.org/895570#46. Unsurprising, we end up with 14-15 different styles with various degrees of popularity. To make matters worse, wrap-and-sort does not provide a way to declare "this package uses options -sat".

So that begged the question, how would debputy know which style it should use when it was going to reformat file. After a couple of false-starts, Christian Hofstaedtler mentioned that we could just have a field in debian/control for supporting a "per-package" setting in responds to my concern about adding a new "per-package" config file.

At first, I was not happy with it, because how would you specify all of these options in a field (in a decent manner)? But then I realized that one I do not want all these styles and that I could start simpler. The Python code formatter black is quite successful despite not having a lot of personalized style options. In fact, black makes a statement out of not allowing a lot of different styles.

Combing that, the result was X-Style: black (to be added to the Source stanza of debian/control), which every possible reference to the black tool for how styling would work. Namely, you outsource the style management to the tool (debputy) and then start using your focus on something else than discussing styles.

As with black, this packaging formatting style is going to be opinionated and it will evolve over time. At the starting point, it is similar to wrap-and-sort -sat for the deb822 files (debputy does not reformat other files at the moment). But as mentioned, it will likely evolve and possible diverge from wrap-and-sort over time.

The choice of the starting point was based on the numbers posted by Johannes #895570. It was not my personal favorite but it seemed to have a majority and is also close to the one suggested by salsa pipeline maintainers. The delta being -kb which I had originally but removed in 0.1.34 at request of Otto Kekäläinen after reviewing the numbers from Johannes one more time.

To facilitate this new change, I uploaded debputy/0.1.30 (a while back) to Debian unstable with the following changes:

Support for the X-Style: black header.

When a style is defined, the debputy lsp server command will now automatically reformat deb822 files on save (if the editor supports it) or on explicit "reformat file" request from the editor (usually indirectly from the user).

New subcommand debputy reformat command that will reformat the files, when a style is defined.

A new pre-commit hook repo to run debputy lint and debputy reformat. These hooks are available from https://salsa.debian.org/debian/debputy-pre-commit-hooks version v0.1 and can be used with the pre-commit tool (from the package of same name).

The obvious omission is a salsa-pipeline feature for this. Otto has put that on to his personal todo list and I am looking forward to that.

Beyond black

Another thing I dislike about our existing style tooling is that if you run wrap-and-sort without any arguments, you have a higher probability of "trashing" the style of the current package than getting the desired result. Part of this is because wrap-and-sort's defaults are out of sync with the usage (which is basically what https://bugs.debian.org/895570 is about).

But I see another problem. The wrap-and-sort tool explicitly defined options to tweak the style but provided maintainers no way to record their preference in any machine readable way. The net result is that we have tons of diverging styles and that you (as a user of wrap-and-sort) have to manually tell wrap-and-sort which style you want every time you run the tool.

In my opinion that is not playing to the strengths of neither human nor machine. Rather, it is playing to the weaknesses of the human if anything at all.

But the salsa-CI pipeline people also ran into this issue and decided to work around this deficiency. To use wrap-and-sort in the salsa-CI pipeline, you have to set a variable to activate the job and another variable with the actual options you want.

The salsa-CI pipeline is quite machine readable and wrap-and-sort is widely used. I had debputy reformat also check for the salsa-CI variables as a fallback. This fallback also works for the editor mode (debputy lsp server), so you might not even have to run debputy reformat. :)

This was a deliberate trade-off. While I do not want all us to have all these options, I also want Debian packaging to be less painful and have fewer paper cuts. Having debputy go extra lengths to meet wrap-and-sort users where they are came out as the better solution for me.

A nice side-effect of this trade-off is that debputy reformat now a good tool for drive-by contributors. You can safely run debputy reformat on any package and either it will apply the styling or it will back out and inform you that no obvious style was detected. In the latter case, you would have to fallback to manually deducing the style and applying it.

Differences to `wrap-and-sort`

The debputy reformat has some limitations or known differences to wrap-and-sort. Notably, debputy reformat (nor debputy lsp server) will not invoke wrap-and-sort. Instead, debputy has its own reformatting engine that provides similar features.

One reason for not running wrap-and-sort is that I want debputy reformat to match the style that debputy lsp server will give you. That way, you get consistent style across all debputy commands.

Another reason is that it is important to me that reformatting is safe and does not change semantics. This leads to two regrettable known differences to the wrap-and-sort behavior due to safety in addition to one scope limitation in debputy:

debputy will ignore requests to sort the stanzas when the "keep first" option is disabled (-b --no-keep-first). This combination is unsafe reformatting. I feel it was a mistake for wrap-and-sort to ever allow this but at least it is no longer the default (-b is now -bk by default). This will be less of a problem in debhelper-compat 15, since the concept of "main package" will disappear and all multi-binary source packages will be required to use debian/package.install rather than debian/install.

debputy will not reorder the contents of debhelper packaging files such as debian/install. This is also an (theoretical) unsafe thing to do. While the average package will not experience issues with this, there are rare corner cases where the re-ordering can affect the end result. I happen to know this, because I ran into issues when trying to optimize dh_install in a way that assumed the order did not matter. Stuff broke and there is now special-case code in dh_install to back out of that optimization when that happens.

debputy has a limited list of wrap-and-sort options it understands. Some options may cause debputy to back out and disable reformatting entirely with a remark that it cannot apply that style. If you run into a case of this, feel free to file a feature request to support it. I will not promise to support everything, but if it is safe and trivially doable with the engine already, then I probably will.

As stated, where debputy cannot implement the wrap-and-sort styles fully, then it will currently implement a subset that is safe if that can be identified or back out entirely of the formatting when it cannot. In all cases, debputy will not break the formatting if it is correct. It may just fail at correcting one aspect of the wrap-and-sort style if you happen to get it wrong.

It is also important to remember that the prerequisite for debputy applying any wrap-and-sort style is that you have set the salsa-CI pipeline variables to trigger wrap-and-sort with the salsa-CI pipeline. So there is still a CI check before the merge that will run the wrap-and-sort in its full glory that provides the final safety net for you.

Just give me a style

In conclusion, if you, like me, are more interested in getting a consistent style rather than discussing what that style should be, now you can get that with X-Style: black. You can also have your custom wrap-and-sort style be picked up automatically for drive-by contributors.

$ apt satisfy 'dh-debputy (>= 0.1.30), python3-lsprotocol'

# Add ``X-Style: black`` to ``debian/control`` for "just give me a style"
#
# OR, if there is a specific ``wrap-and-sort`` style for you then set
# SALSA_CI_DISABLE_WRAP_AND_SORT=no plus set relevant options in
# SALSA_CI_WRAP_AND_SORT_ARGS in debian/salsa-ci.yml (or .gitlab-ci.yml)

$ debputy reformat

It is sadly not yet in the salsa-ci pipeline. Otto is looking into that and hopefully we will have it soon. :)

And if you find yourself often doing archive-wide contributions and is tired of having to reverse engineer package formatting styles, consider using debputy reformat or debputy lsp server. If you use debputy in this way, please consider providing feedback on what would help you.

debputy v0.1.21

with tags debian debputy pdo

Earlier today, I have just released debputy version 0.1.21 to Debian unstable. In the blog post, I will highlight some of the new features.

Package boilerplate reduction with automatic relationship substvar

Last month, I started a discussion on rethinking how we do relationship substvars such as the ${misc:Depends}. These generally ends up being boilerplate runes in the form of Depends: ${misc:Depends}, ${shlibs:Depends} where you as the packager has to remember exactly which runes apply to your package.

My proposed solution was to automatically apply these substvars and this feature has now been implemented in debputy. It is also combined with the feature where essential packages should use Pre-Depends by default for dpkg-shlibdeps related dependencies.

I am quite excited about this feature, because I noticed with libcleri that we are now down to 3-5 fields for defining a simple library package. Especially since most C library packages are trivial enough that debputy can auto-derive them to be Multi-Arch: same.

As an example, the libcleric1 package is down to 3 fields (Package, Architecture, Description) with Section and Priority being inherited from the Source stanza. I have submitted a MR to show case the boilerplate reduction at https://salsa.debian.org/siridb-team/libcleri/-/merge_requests/3.

The removal of libcleric1 (= ${binary:Version}) in that MR relies on another existing feature where debputy can auto-derive a dependency between an arch:any -dev package and the library package based on the .so symlink for the shared library. The arch:any restriction comes from the fact that arch:all and arch:any packages are not built together, so debputy cannot reliably see across the package boundaries during the build (and therefore refuses to do so at all).

Packages that have already migrated to debputy can use debputy migrate-from-dh to detect any unnecessary relationship substitution variables in case you want to clean up. The removal of Multi-Arch: same and intra-source dependencies must be done manually and so only be done so when you have validated that it is safe and sane to do. I was willing to do it for the show-case MR, but I am less confident that would bother with these for existing packages in general.

Note: I summarized the discussion of the automatic relationship substvar feature earlier this month in https://lists.debian.org/debian-devel/2024/03/msg00030.html for those who want more details.

PS: The automatic relationship substvars feature will also appear in debhelper as a part of compat 14.

Language Server (LSP) and Linting

I have long been frustrated by our poor editor support for Debian packaging files. To this end, I started working on a Language Server (LSP) feature in debputy that would cover some of our standard Debian packaging files. This release includes the first version of said language server, which covers the following files:

debian/control

debian/copyright (the machine readable variant)

debian/changelog (mostly just spelling)

debian/rules

debian/debputy.manifest (syntax checks only; use debputy check-manifest for the full validation for now)

Most of the effort has been spent on the Deb822 based files such as debian/control, which comes with diagnostics, quickfixes, spellchecking (but only for relevant fields!), and completion suggestions.

Since not everyone has a LSP capable editor and because sometimes you just want diagnostics without having to open each file in an editor, there is also a batch version for the diagnostics via debputy lint. Please see debputy(1) for how debputy lint compares with lintian if you are curious about which tool to use at what time.

To help you getting started, there is a now debputy lsp editor-config command that can provide you with the relevant editor config glue. At the moment, emacs (via eglot) and vim with vim-youcompleteme are supported.

For those that followed the previous blog posts on writing the language server, I would like to point out that the command line for running the language server has changed to debputy lsp server and you no longer have to tell which format it is. I have decided to make the language server a "polyglot" server for now, which I will hopefully not regret... Time will tell. :)

Anyhow, to get started, you will want:

$ apt satisfy 'dh-debputy (>= 0.1.21~), python3-pygls'
# Optionally, for spellchecking
$ apt install python3-hunspell hunspell-en-us
# For emacs integration
$ apt install elpa-dpkg-dev-el markdown-mode-el
# For vim integration via vim-youcompleteme
$ apt install vim-youcompleteme

Specifically for emacs, I also learned two things after the upload. First, you can auto-activate eglot via eglot-ensure. This badly feature interacts with imenu on debian/changelog for reasons I do not understand (causing a several second start up delay until something times out), but it works fine for the other formats. Oddly enough, opening a changelog file and then activating eglot does not trigger this issue at all. In the next version, editor config for emacs will auto-activate eglot on all files except debian/changelog.

The second thing is that if you install elpa-markdown-mode, emacs will accept and process markdown in the hover documentation provided by the language server. Accordingly, the editor config for emacs will also mention this package from the next version on.

Finally, on a related note, Jelmer and I have been looking at moving some of this logic into a new package called debpkg-metadata. The point being to support easier reuse of linting and LSP related metadata - like pulling a list of known fields for debian/control or sharing logic between lintian-brush and debputy.

Minimal integration mode for Rules-Requires-Root

One of the original motivators for starting debputy was to be able to get rid of fakeroot in our build process. While this is possible, debputy currently does not support most of the complex packaging features such as maintscripts and debconf. Unfortunately, the kind of packages that need fakeroot for static ownership tend to also require very complex packaging features.

To bridge this gap, the new version of debputy supports a very minimal integration with dh via the dh-sequence-zz-debputy-rrr. This integration mode keeps the vast majority of debhelper sequence in place meaning most dh add-ons will continue to work with dh-sequence-zz-debputy-rrr. The sequence only replaces the following commands:

dh_fixperms

dh_gencontrol

dh_md5sums

dh_builddeb

The installations feature of the manifest will be disabled in this integration mode to avoid feature interactions with debhelper tools that expect debian/<pkg> to contain the materialized package.

On a related note, the debputy migrate-from-dh command now supports a --migration-target option, so you can choose the desired level of integration without doing code changes. The command will attempt to auto-detect the desired integration from existing package features such as a build-dependency on a relevant dh sequence, so you do not have to remember this new option every time once the migration has started. :)

Language Server for Debian: Spellchecking

with tags debian debputy pdo

This is my third update on writing a language server for Debian packaging files, which aims at providing a better developer experience for Debian packagers.

Lets go over what have done since the last report.

Semantic token support

I have added support for what the Language Server Protocol (LSP) call semantic tokens. These are used to provide the editor insights into tokens of interest for users. Allegedly, this is what editors would use for syntax highlighting as well.

Unfortunately, eglot (emacs) does not support semantic tokens, so I was not able to test this. There is a 3-year old PR for supporting with the last update being ~3 month basically saying "Please sign the Copyright Assignment". I pinged the GitHub issue in the hopes it will get unstuck.

For good measure, I also checked if I could try it via neovim. Before installing, I read the neovim docs, which helpfully listed the features supported. Sadly, I did not spot semantic tokens among those and parked from there.

That was a bit of a bummer, but I left the feature in for now. If you have an LSP capable editor that supports semantic tokens, let me know how it works for you! :)

Spellchecking

Finally, I implemented something Otto was missing! :)

This stared with Paul Wise reminding me that there were Python binding for the hunspell spellchecker. This enabled me to get started with a quick prototype that spellchecked the Description fields in debian/control. I also added spellchecking of comments while I was add it.

The spellchecker runs with the standard en_US dictionary from hunspell-en-us, which does not have a lot of technical terms in it. Much less any of the Debian specific slang. I spend considerable time providing a "built-in" wordlist for technical and Debian specific slang to overcome this. I also made a "wordlist" for known Debian people that the spellchecker did not recognise. Said wordlist is fairly short as a proof of concept, and I fully expect it to be community maintained if the language server becomes a success.

My second problem was performance. As I had suspected that spellchecking was not the fastest thing in the world. Therefore, I added a very small language server for the debian/changelog, which only supports spellchecking the textual part. Even for a small changelog of a 1000 lines, the spellchecking takes about 5 seconds, which confirmed my suspicion. With every change you do, the existing diagnostics hangs around for 5 seconds before being updated. Notably, in emacs, it seems that diagnostics gets translated into an absolute character offset, so all diagnostics after the change gets misplaced for every character you type.

Now, there is little I could do to speed up hunspell. But I can, as always, cheat. The way diagnostics work in the LSP is that the server listens to a set of notifications like "document opened" or "document changed". In a response to that, the LSP can start its diagnostics scanning of the document and eventually publish all the diagnostics to the editor. The spec is quite clear that the server owns the diagnostics and the diagnostics are sent as a "notification" (that is, fire-and-forgot). Accordingly, there is nothing that prevents the server from publishing diagnostics multiple times for a single trigger. The only requirement is that the server publishes the accumulated diagnostics in every publish (that is, no delta updating).

Leveraging this, I had the language server for debian/changelog scan the document and publish once for approximately every 25 typos (diagnostics) spotted. This means you quickly get your first result and that clears the obsolete diagnostics. Thereafter, you get frequent updates to the remainder of the document if you do not perform any further changes. That is, up to a predefined max of typos, so we do not overload the client for longer changelogs. If you do any changes, it resets and starts over.

The only bit missing was dealing with concurrency. By default, a pygls language server is single threaded. It is not great if the language server hangs for 5 seconds everytime you type anything. Fortunately, pygls has builtin support for asyncio and threaded handlers. For now, I did an async handler that await after each line and setup some manual detection to stop an obsolete diagnostics run. This means the server will fairly quickly abandon an obsolete run.

Also, as a side-effect of working on the spellchecking, I fixed multiple typos in the changelog of debputy. :)

Follow up on the "What next?" from my previous update

In my previous update, I mentioned I had to finish up my python-debian changes to support getting the location of a token in a deb822 file. That was done, the MR is now filed, and is pending review. Hopefully, it will be merged and uploaded soon. :)

I also submitted my proposal for a different way of handling relationship substvars to debian-devel. So far, it seems to have received only positive feedback. I hope it stays that way and we will have this feature soon. Guillem proposed to move some of this into dpkg, which might delay my plans a bit. However, it might be for the better in the long run, so I will wait a bit to see what happens on that front. :)

As noted above, I managed to add debian/changelog as a support format for the language server. Even if it only does spellchecking and trimming of trailing newlines on save, it technically is a new format and therefore cross that item off my list. :D

Unfortunately, I did not manage to write a linter variant that does not involve using an LSP-capable editor. So that is still pending. Instead, I submitted an MR against elpa-dpkg-dev-el to have it recognize all the fields that the debian/control LSP knows about at this time to offset the lack of semantic token support in eglot.

From here...

My sprinting on this topic will soon come to an end, so I have to a bit more careful now with what tasks I open!

I think I will narrow my focus to providing a batch linting interface. Ideally, with an auto-fix for some of the more mechanical issues, where this is little doubt about the answer.

Additionally, I think the spellchecking will need a bit more maturing. My current code still trips on naming patterns that are "clearly" verbatim or code references like things written in CamelCase or SCREAMING_SNAKE_CASE. That gets annoying really quickly. It also trips on a lot of commands like dpkg-gencontrol, but that is harder to fix since it could have been a real word. I think those will have to be fixed people using quotes around the commands. Maybe the most popular ones will end up in the wordlist.

Beyond that, I will play it by ear if I have any time left. :)

Expanding on the Language Server (LSP) support for debian/control

with tags debian debputy pdo

I have spent some more time on improving my language server for debian/control. Today, I managed to provide the following features:

The X- style prefixes for field names are now understood and handled. This means the language server now considers XC-Package-Type the same as Package-Type.

More diagnostics:

Fields without values now trigger an error marker

Duplicated fields now trigger an error marker

Fields used in the wrong paragraph now trigger an error marker

Typos in field names or values now trigger a warning marker. For field names, X- style prefixes are stripped before typo detection is done.

The value of the Section field is now validated against a dataset of known sections and trigger a warning marker if not known.

The "on-save trim end of line whitespace" now works. I had a logic bug in the server side code that made it submit "no change" edits to the editor.

The language server now provides "hover" documentation for field names. There is a small screenshot of this below. Sadly, emacs does not support markdown or, if it does, it does not announce the support for markdown. For now, all the documentation is always in markdown format and the language server will tag it as either markdown or plaintext depending on the announced support.

The language server now provides quick fixes for some of the more trivial problems such as deprecated fields or typos of fields and values.

Added more known fields including the XS-Autobuild field for non-free packages along with a link to the relevant devref section in its hover doc.

This covers basically all my known omissions from last update except spellchecking of the Description field.

An image of emacs showing documentation for the Provides field from the language server.

Spellchecking

Personally, I feel spellchecking would be a very welcome addition to the current feature set. However, reviewing my options, it seems that most of the spellchecking python libraries out there are not packaged for Debian, or at least not other the name I assumed they would be.

The alternative is to pipe the spellchecking to another program like aspell list. I did not test this fully, but aspell list does seem to do some input buffering that I cannot easily default (at least not in the shell). Though, either way, the logic for this will not be trivial and aspell list does not seem to include the corrections either. So best case, you would get typo markers but no suggestions for what you should have typed. Not ideal.

Additionally, I am also concerned with the performance for this feature. For d/control, it will be a trivial matter in practice. However, I would be reusing this for d/changelog which is 99% free text with plenty of room for typos. For a regular linter, some slowness is acceptable as it is basically a batch tool. However, for a language server, this potentially translates into latency for your edits and that gets annoying.

While it is definitely on my long term todo list, I am a bit afraid that it can easily become a time sink. Admittedly, this does annoy me, because I wanted to cross off at least one of Otto's requested features soon.

On wrap-and-sort support

The other obvious request from Otto would be to automate wrap-and-sort formatting. Here, the problem is that "we" in Debian do not agree on the one true formatting of debian/control. In fact, I am fairly certain we do not even agree on whether we should all use wrap-and-sort. This implies we need a style configuration.

However, if we have a style configuration per person, then you get style "ping-pong" for packages where the co-maintainers do not all have the same style configuration. Additionally, it is very likely that you are a member of multiple packaging teams or groups that all have their own unique style. Ergo, only having a personal config file is doomed to fail.

The only "sane" option here that I can think of is to have or support "per package" style configuration. Something that would be committed to git, so the tooling would automatically pick up the configuration. Obviously, that is not fun for large packaging teams where you have to maintain one file per package if you want a consistent style across all packages. But it beats "style ping-pong" any day of the week.

Note that I am perfectly open to having a personal configuration file as a fallback for when the "per package" configuration file is absent.

The second problem is the question of which format to use and what to name this file. Since file formats and naming has never been controversial at all, this will obviously be the easy part of this problem. But the file should be parsable by both wrap-and-sort and the language server, so you get the same result regardless of which tool you use. If we do not ensure this, then we still have the style ping-pong problem as people use different tools.

This also seems like time sink with no end. So, what next then...?

What next?

On the language server front, I will have a look at its support for providing semantic hints to the editors that might be used for syntax highlighting. While I think most common Debian editors have built syntax highlighting already, I would like this language server to stand on its own. I would like us to be in a situation where we do not have implement yet another editor extension for Debian packaging files. At least not for editors that support the LSP spec.

On a different front, I have an idea for how we go about relationship related substvars. It is not directly related to this language server, except I got triggered by the language server "missing" a diagnostic for reminding people to add the magic Depends: ${misc:Depends}[, ${shlibs:Depends}] boilerplate. The magic boilerplate that you have to write even though we really should just fix this at a tooling level instead. Energy permitting, I will formulate a proposal for that and send it to debian-devel.

Beyond that, I think I might start adding support for another file. I also need to wrap up my python-debian branch, so I can get the position support into the Debian soon, which would remove one papercut for using this language server.

Finally, it might be interesting to see if I can extract a "batch-linter" version of the diagnostics and related quickfix features. If nothing else, the "linter" variant would enable many of you to get a "mini-Lintian" without having to do a package build first.

Language Server (LSP) support for debian/control

with tags debian debputy pdo

About a month ago, Otto Kekäläinen asked for editor extensions for debian related files on the debian-devel mailing list. In that thread, I concluded that what we were missing was a "Language Server" (LSP) for our packaging files.

Last week, I started a prototype for such a LSP for the debian/control file as a starting point based on the pygls library. The initial prototype worked and I could do very basic diagnostics plus completion suggestion for field names.

Current features

I got 4 basic features implemented, though I have only been able to test two of them in emacs.

Diagnostics or linting of basic issues.

Completion suggestions for all known field names that I could think of and values for some fields.

Folding ranges (untested). This feature enables the editor to "fold" multiple lines. It is often used with multi-line comments and that is the feature currently supported.

On save, trim trailing whitespace at the end of lines (untested). Might not be registered correctly on the server end.

Despite its very limited feature set, I feel editing debian/control in emacs is now a much more pleasant experience.

Coming back to the features that Otto requested, the above covers a grand total of zero. Sorry, Otto. It is not you, it is me.

Completion suggestions

For completion, all known fields are completed. Place the cursor at the start of the line or in a partially written out field name and trigger the completion in your editor. In my case, I can type R-R-R and trigger the completion and the editor will automatically replace it with Rules-Requires-Root as the only applicable match. Your milage may vary since I delegate most of the filtering to the editor, meaning the editor has the final say about whether your input matches anything.

The only filtering done on the server side is that the server prunes out fields already used in the paragraph, so you are not presented with the option to repeat an already used field, which would be an error. Admittedly, not an error the language server detects at the moment, but other tools will.

When completing field, if the field only has one non-default value such as Essential which can be either no (the default, but you should not use it) or yes, then the completion suggestion will complete the field along with its value.

This is mostly only applicable for "yes/no" fields such as Essential and Protected. But it does also trigger for Package-Type at the moment.

As for completing values, here the language server can complete the value for simple fields such as "yes/no" fields, Multi-Arch, Package-Type and Priority. I intend to add support for Section as well - maybe also Architecture.

Diagnostics

On the diagnostic front, I have added multiple diagnostics:

An error marker for syntax errors.

An error marker for missing a mandatory field like Package or Architecture. This also includes Standards-Version, which is admittedly mandatory by policy rather than tooling falling part.

An error marker for adding Multi-Arch: same to an Architecture: all package.

Error marker for providing an unknown value to a field with a set of known values. As an example, writing foo in Multi-Arch would trigger this one.

Warning marker for using deprecated fields such as DM-Upload-Allowed, or when setting a field to its default value for fields like Essential. The latter rule only applies to selected fields and notably Multi-Arch: no does not trigger a warning.

Info level marker if a field like Priority duplicates the value of the Source paragraph.

Notable omission at this time:

No errors are raised if a field does not have a value.

No errors are raised if a field is duplicated inside a paragraph.

No errors are used if a field is used in the wrong paragraph.

No spellchecking of the Description field.

No understanding that Foo and X[CBS]-Foo are related. As an example, XC-Package-Type is completely ignored despite being the old name for Package-Type.

Quick fixes to solve these problems... :)

Trying it out

If you want to try, it is sadly a bit more involved due to things not being uploaded or merged yet. Also, be advised that I will regularly rebase my git branches as I revise the code.

The setup:

Build and install the deb of the main branch of pygls from https://salsa.debian.org/debian/pygls The package is in NEW and hopefully this step will soon just be a regular apt install.

Build and install the deb of the rts-locatable branch of my python-debian fork from https://salsa.debian.org/nthykier/python-debian There is a draft MR of it as well on the main repo.

Build and install the deb of the lsp-support branch of debputy from https://salsa.debian.org/debian/debputy

Configure your editor to run debputy lsp debian/control as the language server for debian/control. This is depends on your editor. I figured out how to do it for emacs (see below). I also found a guide for neovim at https://neovim.io/doc/user/lsp. Note that debputy can be run from any directory here. The debian/control is a reference to the file format and not a concrete file in this case.

Obviously, the setup should get easier over time. The first three bullet points should eventually get resolved by merges and upload meaning you end up with an apt install command instead of them.

For the editor part, I would obviously love it if we can add snippets for editors to make the automatically pick up the language server when the relevant file is installed.

Using the debputy LSP in emacs

The guide I found so far relies on eglot. The guide below assumes you have the elpa-dpkg-dev-el package installed for the debian-control-mode. Though it should be a trivially matter to replace debian-control-mode with a different mode if you use a different mode for your debian/control file.

In your emacs init file (such as ~/.emacs or ~/.emacs.d/init.el), you add the follow blob.

(with-eval-after-load 'eglot
    (add-to-list 'eglot-server-programs
        '(debian-control-mode . ("debputy" "lsp" "debian/control"))))

Once you open the debian/control file in emacs, you can type M-x eglot to activate the language server. Not sure why that manual step is needed and if someone knows how to automate it such that eglot activates automatically on opening debian/control, please let me know.

For testing completions, I often have to manually activate them (with C-M-i or M-x complete-symbol). Though, it is a bit unclear to me whether this is an emacs setting that I have not toggled or something I need to do on the language server side.

From here

As next steps, I will probably look into fixing some of the "known missing" items under diagnostics. The quick fix would be a considerable improvement to assisting users.

In the not so distant future, I will probably start to look at supporting other files such as debian/changelog or look into supporting configuration, so I can cover formatting features like wrap-and-sort.

I am also very much open to how we can provide integrations for this feature into editors by default. I will probably create a separate binary package for specifically this feature that pulls all relevant dependencies that would be able to provide editor integrations as well.

Annotating the Debian packaging directory

with tags debian debputy pdo

In my previous blog post Providing online reference documentation for debputy, I made a point about how debhelper documentation was suboptimal on account of being static rather than online. The thing is that debhelper is not alone in this problem space, even if it is a major contributor to the number of packaging files you have to to know about.

If we look at the "competition" here such as Fedora and Arch Linux, they tend to only have one packaging file. While most Debian people will tell you a long list of cons about having one packaging file (such a Fedora's spec file being 3+ domain specific languages "mashed" into one file), one major advantage is that there is only "the one packaging file". You only need to remember where to find the documentation for one file, which is great when you are running on wetware with limited storage capacity.

Which means as a newbie, you can dedicate less mental resources to tracking multiple files and how they interact and more effort understanding the "one file" at hand. I started by asking myself how can we in Debian make the packaging stack more accessible to newcomers? Spoiler alert, I dug myself into rabbit hole and ended up somewhere else than where I thought I was going.

I started by wanting to scan the debian directory and annotate all files that I could with documentation links. The logic was that if debputy could do that for you, then you could spend more mental effort elsewhere. So I combined debputy's packager provided files detection with a static list of files and I quickly had a good starting point for debputy-based packages.

Adding (non-static) dpkg and debhelper files to the mix

Now, I could have closed the topic here and said "Look, I did debputy files plus couple of super common files". But I decided to take it a bit further. I added support for handling some dpkg files like packager provided files (such as debian/substvars and debian/symbols). But even then, we all know that debhelper is the big hurdle and a major part of the omission...

In another previous blog post (A new Debian package helper: debputy), I made a point about how debputy could list all auxiliary files while debhelper could not. This was exactly the kind of feature that I would need for this feature, if this feature was to cover debhelper. Now, I also remarked in that blog post that I was not willing to maintain such a list. Also, I may have ranted about static documentation being unhelpful for debhelper as it excludes third-party provided tooling.

Fortunately, a recent update to dh_assistant had provided some basic plumbing for loading dh sequences. This meant that getting a list of all relevant commands for a source package was a lot easier than it used to be. Once you have a list of commands, it would be possible to check all of them for dh's NOOP PROMISE hints. In these hints, a command can assert it does nothing if a given pkgfile is not present. This lead to the new dh_assistant list-guessed-dh-config-files command that will list all declared pkgfiles and which helpers listed them.

With this combined feature set in place, debputy could call dh_assistant to get a list of pkgfiles, pretend they were packager provided files and annotate those along with manpage for the relevant debhelper command. The exciting thing about letting debpputy resolve the pkgfiles is that debputy will resolve "named" files automatically (debhelper tools will only do so when --name is passed), so it is much more likely to detect named pkgfiles correctly too. Side note: I am going to ignore the elephant in the room for now, which is dh_installsystemd and its package@.service files and the wide-spread use of debian/foo.service where there is no package called foo. For the latter case, the "proper" name would be debian/pkg.foo.service.

With the new dh_assistant feature done and added to debputy, debputy could now detect the ubiquitous debian/install file. Excellent. But less great was that the very common debian/docs file was not. Turns out that dh_installdocs cannot be skipped by dh, so it cannot have NOOP PROMISE hints. Meh...

Well, dh_assistant could learn about a new INTROSPECTABLE marker in addition to the NOOP PROMISE and then I could sprinkle that into a few commands. Indeed that worked and meant that debian/postinst (etc.) are now also detectable.

At this point, debputy would be able to identify a wide range of debhelper related configuration files in debian/ and at least associate each of them with one or more commands.

Nice, surely, this would be a good place to stop, right...?

Adding more metadata to the files

The debhelper detected files only had a command name and manpage URI to that command. It would be nice if we could contextualize this a bit more.

Like is this file installed into the package as is like debian/pam or is it a file list to be processed like debian/install. To make this distinction, I could add the most common debhelper file types to my static list and then merge the result together.

Except, I do not want to maintain a full list in debputy. Fortunately, debputy has a quite extensible plugin infrastructure, so added a new plugin feature to provide this kind of detail and now I can outsource the problem! I split my definitions into two and placed the generic ones in the debputy-documentation plugin and moved the debhelper related ones to debhelper-documentation. Additionally, third-party dh addons could provide their own debputy plugin to add context to their configuration files.

So, this gave birth file categories and configuration features, which described each file on different fronts. As an example, debian/gbp.conf could be tagged as a maint-config to signal that it is not directly related to the package build but more of a tool or style preference file. On the other hand, debian/install and debian/debputy.manifest would both be tagged as a pkg-helper-config. Files like debian/pam were tagged as ppf-file for packager provided file and so on.

I mentioned configuration features above and those were added because, I have had a beef with debhelper's "standard" configuration file format as read by filearray and filedoublearray. They are often considered simple to understand, but it is hard to know how a tool will actually read the file. As an example, consider the following:

Will the debhelper use filearray, filedoublearray or none of them to read the file? This topic has about 2 bits of entropy.

Will the config file be executed if it is marked executable assuming you are using the right compat level? If it is executable, does dh-exec allow renaming for this file? This topic adds 1 or 2 bit of entropy depending on the context.

Will the config file be subject to glob expansions? This topic sounds like a boolean but is a complicated mess. The globs can be handled either by debhelper as it parses the file for you. In this case, the globs are applied to every token. However, this is not what dh_install does. Here the last token on each line is supposed to be a directory and therefore not subject to globs. Therefore, dh_install does the globbing itself afterwards but only on part of the tokens. So that is about 2 bits of entropy more. Actually, it gets worse...

If the file is executed, debhelper will refuse to expand globs in the output of the command, which was a deliberate design choice by the original debhelper maintainer took when he introduced the feature in debhelper/8.9.12. Except, dh_install feature interacts with the design choice and does enable glob expansion in the tool output, because it does so manually after its filedoublearray call.

So these "simple" files have way too many combinations of how they can be interpreted. I figured it would be helpful if debputy could highlight these difference, so I added support for those as well.

Accordingly, debian/install is tagged with multiple tags including dh-executable-config and dh-glob-after-execute. Then, I added a datatable of these tags, so it would be easy for people to look up what they meant.

Ok, this seems like a closed deal, right...?

Context, context, context

However, the dh-executable-config tag among other are only applicable in compat 9 or later. It does not seem newbie friendly if you are told that this feature exist, but then have to read in the extended description that that it actually does not apply to your package.

This problem seems fixable. Thanks to dh_assistant, it is easy to figure out which compat level the package is using. Then tweak some metadata to enable per compat level rules. With that tags like dh-executable-config only appears for packages using compat 9 or later.

Also, debputy should be able to tell you where packager provided files like debian/pam are installed. We already have the logic for packager provided files that debputy supports and I am already using debputy engine for detecting the files. If only the plugin provided metadata gave me the install pattern, debputy would be able tell you where this file goes in the package. Indeed, a bit of tweaking later and setting install-pattern to usr/lib/pam.d/{name}, debputy presented me with the correct install-path with the package name placing the {name} placeholder.

Now, I have been using debian/pam as an example, because debian/pam is installed into usr/lib/pam.d in compat 14. But in earlier compat levels, it was installed into etc/pam.d. Well, I already had an infrastructure for doing compat file tags. Off we go to add install-pattern to the complat level infrastructure and now changing the compat level would change the path. Great. (Bug warning: The value is off-by-one in the current version of debhelper. This is fixed in git)

Also, while we are in this install-pattern business, a number of debhelper config files causes files to be installed into a fixed directory. Like debian/docs which causes file to be installed into /usr/share/docs/{package}. Surely, we can expand that as well and provide that bit of context too... and done. (Bug warning: The code currently does not account for the main documentation package context)

It is rather common pattern for people to do debian/foo.in files, because they want to custom generation of debian/foo. Which means if you have debian/foo you get "Oh, let me tell you about debian/foo ". Then you rename it to debian/foo.in and the result is "debian/foo.in is a total mystery to me!". That is suboptimal, so lets detect those as well as if they were the original file but add a tag saying that they are a generate template and which file we suspect it generates.

Finally, if you use debputy, almost all of the standard debhelper commands are removed from the sequence, since debputy replaces them. It would be weird if these commands still contributed configuration files when they are not actually going to be invoked. This mostly happened naturally due to the way the underlying dh_assistant command works. However, any file mentioned by the debhelper-documentation plugin would still appear unfortunately. So off I went to filter the list of known configuration files against which dh_ commands that dh_assistant thought would be used for this package.

Wrapping it up

I was several layers into this and had to dig myself out. I have ended up with a lot of data and metadata. But it was quite difficult for me to arrange the output in a user friendly manner.

However, all this data did seem like it would be useful any tool that wants to understand more about the package. So to get out of the rabbit hole, I for now wrapped all of this into JSON and now we have a debputy tool-support annotate-debian-directory command that might be useful for other tools.

To try it out, you can try the following demo:

In another day, I will figure out how to structure this output so it is useful for non-machine consumers. Suggestions are welcome. :)

Limitations of the approach

As a closing remark, I should probably remind people that this feature relies heavily on declarative features. These include:

When determining which commands are relevant, using Build-Depends: dh-sequence-foo is much more reliable than configuring it via the Turing complete configuration we call debian/rules.

When debhelper commands use NOOP promise hints, dh_assistant can "see" the config files listed those hints, meaning the file will at least be detected. For new introspectable hint and the debputy plugin, it is probably better to wait until the dust settles a bit before adding any of those.

You can help yourself and others to better results by using the declarative way rather than using debian/rules, which is the bane of all introspection!

Making debputy: Writing declarative parsing logic

with tags debian debputy pdo

In this blog post, I will cover how debputy parses its manifest and the conceptual improvements I did to make parsing of the manifest easier.

All instructions to debputy are provided via the debian/debputy.manifest file and said manifest is written in the YAML format. After the YAML parser has read the basic file structure, debputy does another pass over the data to extract the information from the basic structure. As an example, the following YAML file:

manifest-version: "0.1"
installations:
  - install:
      source: foo
      dest-dir: usr/bin

would be transformed by the YAML parser into a structure resembling:

{
  "manifest-version": "0.1",
  "installations": [
     {
       "install": {
         "source": "foo",
         "dest-dir": "usr/bin",
       }
     }
  ]
}

This structure is then what debputy does a pass on to translate this into an even higher level format where the "install" part is translated into an InstallRule.

In the original prototype of debputy, I would hand-write functions to extract the data that should be transformed into the internal in-memory high level format. However, it was quite tedious. Especially because I wanted to catch every possible error condition and report "You are missing the required field X at Y" rather than the opaque KeyError: X message that would have been the default.

Beyond being tedious, it was also quite error prone. As an example, in debputy/0.1.4 I added support for the install rule and you should allegedly have been able to add a dest-dir: or an as: inside it. Except I crewed up the code and debputy was attempting to look up these keywords from a dict that could never have them.

Hand-writing these parsers were so annoying that it demotivated me from making manifest related changes to debputy simply because I did not want to code the parsing logic. When I got this realization, I figured I had to solve this problem better.

While reflecting on this, I also considered that I eventually wanted plugins to be able to add vocabulary to the manifest. If the API was "provide a callback to extract the details of whatever the user provided here", then the result would be bad.

Most plugins would probably throw KeyError: X or ValueError style errors for quite a while. Worst case, they would end on my table because the user would have a hard time telling where debputy ends and where the plugins starts. "Best" case, I would teach debputy to say "This poor error message was brought to you by plugin foo. Go complain to them". Either way, it would be a bad user experience.

This even assumes plugin providers would actually bother writing manifest parsing code. If it is that difficult, then just providing a custom file in debian might tempt plugin providers and that would undermine the idea of having the manifest be the sole input for debputy.

So beyond me being unsatisfied with the current situation, it was also clear to me that I needed to come up with a better solution if I wanted externally provided plugins for debputy. To put a bit more perspective on what I expected from the end result:

It had to cover as many parsing errors as possible. An error case this code would handle for you, would be an error where I could ensure it sufficient degree of detail and context for the user.

It should be type-safe / provide typing support such that IDEs/mypy could help you when you work on the parsed result.

It had to support "normalization" of the input, such as

           # User provides
           - install: "foo"
           # Which is normalized into:
           - install:
               source: "foo"


4) It must be simple to tell ``debputy`` what input you expected.

At this point, I remembered that I had seen a Python (PYPI) package where you could give it a TypedDict and an arbitrary input (Sadly, I do not remember the name). The package would then validate the said input against the TypedDict. If the match was successful, you would get the result back casted as the TypedDict. If the match was unsuccessful, the code would raise an error for you. Conceptually, this seemed to be a good starting point for where I wanted to be.

Then I looked a bit on the normalization requirement (point 3). What is really going on here is that you have two "schemas" for the input. One is what the programmer will see (the normalized form) and the other is what the user can input (the manifest form). The problem is providing an automatic normalization from the user input to the simplified programmer structure. To expand a bit on the following example:

# User provides
- install: "foo"
# Which is normalized into:
- install:
    source: "foo"

Given that install has the attributes source, sources, dest-dir, as, into, and when, how exactly would you automatically normalize "foo" (str) into source: "foo"? Even if the code filtered by "type" for these attributes, you would end up with at least source, dest-dir, and as as candidates. Turns out that TypedDict actually got this covered. But the Python package was not going in this direction, so I parked it here and started looking into doing my own.

At this point, I had a general idea of what I wanted. When defining an extension to the manifest, the plugin would provide debputy with one or two definitions of TypedDict. The first one would be the "parsed" or "target" format, which would be the normalized form that plugin provider wanted to work on. For this example, lets look at an earlier version of the install-examples rule:

# Example input matching this typed dict.
#   {
#       "source": ["foo"]
#       "into": ["pkg"]
#   }
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]

In this form, the install-examples has two attributes - both are list of strings. On the flip side, what the user can input would look something like this:

# Example input matching this typed dict.
#   {
#       "source": "foo"
#       "into": "pkg"
#   }
#
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[str]
    # We allow the user to write ``into: foo`` in addition to ``into: [foo]``
    into: Union[str, List[str]]

FullInstallExamplesManifestFormat = Union[
    InstallExamplesManifestFormat,
    List[str],
    str,
]

The idea was that the plugin provider would use these two definitions to tell debputy how to parse install-examples. Pseudo-registration code could look something like:

def _handler(
    normalized_form: InstallExamplesTargetFormat,
) -> InstallRule:
    ...  # Do something with the normalized form and return an InstallRule.

concept_debputy_api.add_install_rule(
  keyword="install-examples",
  target_form=InstallExamplesTargetFormat,
  manifest_form=FullInstallExamplesManifestFormat,
  handler=_handler,
)

This was my conceptual target and while the current actual API ended up being slightly different, the core concept remains the same.

From concept to basic implementation

Building this code is kind like swallowing an elephant. There was no way I would just sit down and write it from one end to the other. So the first prototype of this did not have all the features it has now.

Spoiler warning, these next couple of sections will contain some Python typing details. When reading this, it might be helpful to know things such as Union[str, List[str]] being the Python type for either a str (string) or a List[str] (list of strings). If typing makes your head spin, these sections might less interesting for you.

To build this required a lot of playing around with Python's introspection and typing APIs. My very first draft only had one "schema" (the normalized form) and had the following features:

Read TypedDict.__required_attributes__ and TypedDict.__optional_attributes__ to determine which attributes where present and which were required. This was used for reporting errors when the input did not match.

Read the types of the provided TypedDict, strip the Required / NotRequired markers and use basic isinstance checks based on the resulting type for str and List[str]. Again, used for reporting errors when the input did not match.

This prototype did not take a long (I remember it being within a day) and worked surprisingly well though with some poor error messages here and there. Now came the first challenge, adding the manifest format schema plus relevant normalization rules. The very first normalization I did was transforming into: Union[str, List[str]] into into: List[str]. At that time, source was not a separate attribute. Instead, sources was a Union[str, List[str]], so it was the only normalization I needed for all my use-cases at the time.

There are two problems when writing a normalization. First is determining what the "source" type is, what the target type is and how they relate. The second is providing a runtime rule for normalizing from the manifest format into the target format. Keeping it simple, the runtime normalizer for Union[str, List[str]] -> List[str] was written as:

def normalize_into_list(x: Union[str, List[str]]) -> List[str]:
    return x if isinstance(x, list) else [x]

This basic form basically works for all types (assuming none of the types will have List[List[...]]). The logic for determining when this rule is applicable is slightly more involved. My current code is about 100 lines of Python code that would probably lose most of the casual readers. For the interested, you are looking for _union_narrowing in declarative_parser.py

With this, when the manifest format had Union[str, List[str]] and the target format had List[str] the generated parser would silently map a string into a list of strings for the plugin provider.

But with that in place, I had covered the basics of what I needed to get started. I was quite excited about this milestone of having my first keyword parsed without handwriting the parser logic (at the expense of writing a more generic parse-generator framework).

Adding the first parse hint

With the basic implementation done, I looked at what to do next. As mentioned, at the time sources in the manifest format was Union[str, List[str]] and I considered to split into a source: str and a sources: List[str] on the manifest side while keeping the normalized form as sources: List[str]. I ended up committing to this change and that meant I had to solve the problem getting my parser generator to understand the situation:

# Map from
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[str]
    # We allow the user to write ``into: foo`` in addition to ``into: [foo]``
    into: Union[str, List[str]]

# ... into
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]

There are two related problems to solve here:

How will the parser generator understand that source should be normalized and then mapped into sources?

Once that is solved, the parser generator has to understand that while source and sources are declared as NotRequired, they are part of a exactly one of rule (since sources in the target form is Required). This mainly came down to extra book keeping and an extra layer of validation once the previous step is solved.

While working on all of this type introspection for Python, I had noted the Annotated[X, ...] type. It is basically a fake type that enables you to attach metadata into the type system. A very random example:

# For all intents and purposes, ``foo`` is a string despite all the ``Annotated`` stuff.
foo: Annotated[str, "hello world"] = "my string here"

The exciting thing is that you can put arbitrary details into the type field and read it out again in your introspection code. Which meant, I could add "parse hints" into the type. Some "quick" prototyping later (a day or so), I got the following to work:

# Map from
#     {
#        "source": "foo"  # (or "sources": ["foo"])
#        "into": "pkg"
#     }
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[
        Annotated[
            str,
            DebputyParseHint.target_attribute("sources")
        ]
    ]
    # We allow the user to write ``into: foo`` in addition to ``into: [foo]``
    into: Union[str, List[str]]

# ... into
#     {
#        "source": ["foo"]
#        "into": ["pkg"]
#     }
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]

Without me (as a plugin provider) writing a line of code, I can have debputy rename or "merge" attributes from the manifest form into the normalized form. Obviously, this required me (as the debputy maintainer) to write a lot code so other me and future plugin providers did not have to write it.

High level typing

At this point, basic normalization between one mapping to another mapping form worked. But one thing irked me with these install rules. The into was a list of strings when the parser handed them over to me. However, I needed to map them to the actual BinaryPackage (for technical reasons). While I felt I was careful with my manual mapping, I knew this was exactly the kind of case where a busy programmer would skip the "is this a known package name" check and some user would typo their package resulting in an opaque KeyError: foo.

Side note: "Some user" was me today and I was super glad to see debputy tell me that I had typoed a package name (I would have been more happy if I had remembered to use debputy check-manifest, so I did not have to wait through the upstream part of the build that happened before debhelper passed control to debputy...)

I thought adding this feature would be simple enough. It basically needs two things:

Conversion table where the parser generator can tell that BinaryPackage requires an input of str and a callback to map from str to BinaryPackage. (That is probably lie. I think the conversion table came later, but honestly I do remember and I am not digging into the git history for this one)

At runtime, said callback needed access to the list of known packages, so it could resolve the provided string.

It was not super difficult given the existing infrastructure, but it did take some hours of coding and debugging. Additionally, I added a parse hint to support making the into conditional based on whether it was a single binary package. With this done, you could now write something like:

# Map from
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[
        Annotated[
            str,
            DebputyParseHint.target_attribute("sources")
        ]
    ]
    # We allow the user to write ``into: foo`` in addition to ``into: [foo]``
    into: Union[BinaryPackage, List[BinaryPackage]]

# ... into
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[
        Annotated[
            List[BinaryPackage],
            DebputyParseHint.required_when_multi_binary()
        ]
    ]

Code-wise, I still had to check for into being absent and providing a default for that case (that is still true in the current codebase - I will hopefully fix that eventually). But I now had less room for mistakes and a standardized error message when you misspell the package name, which was a plus.

The added side-effect - Introspection

A lovely side-effect of all the parsing logic being provided to debputy in a declarative form was that the generated parser snippets had fields containing all expected attributes with their types, which attributes were required, etc. This meant that adding an introspection feature where you can ask debputy "What does an install rule look like?" was quite easy. The code base already knew all of this, so the "hard" part was resolving the input the to concrete rule and then rendering it to the user.

I added this feature recently along with the ability to provide online documentation for parser rules. I covered that in more details in my blog post Providing online reference documentation for debputy in case you are interested. :)

Wrapping it up

This was a short insight into how debputy parses your input. With this declarative technique:

The parser engine handles most of the error reporting meaning users get most of the errors in a standard format without the plugin provider having to spend any effort on it. There will be some effort in more complex cases. But the common cases are done for you.

It is easy to provide flexibility to users while avoiding having to write code to normalize the user input into a simplified programmer oriented format.

The parser handles mapping from basic types into higher forms for you. These days, we have high level types like FileSystemMode (either an octal or a symbolic mode), different kind of file system matches depending on whether globs should be performed, etc. These types includes their own validation and parsing rules that debputy handles for you.

Introspection and support for providing online reference documentation. Also, debputy checks that the provided attribute documentation covers all the attributes in the manifest form. If you add a new attribute, debputy will remind you if you forget to document it as well. :)

In this way everybody wins. Yes, writing this parser generator code was more enjoyable than writing the ad-hoc manual parsers it replaced. :)

Providing online reference documentation for debputy

with tags debian debputy pdo

I do not think seasoned Debian contributors quite appreciate how much knowledge we have picked up and internalized. As an example, when I need to look up documentation for debhelper, I generally know which manpage to look in. I suspect most long time contributors would be able to a similar thing (maybe down 2-3 manpages). But new contributors does not have the luxury of years of experience. This problem is by no means unique to debhelper.

One thing that debhelper does very well, is that it is hard for users to tell where a addon "starts" and debhelper "ends". It is clear you use addons, but the transition in and out of third party provided tools is generally smooth. This is a sign that things "just work(tm)".

Except when it comes to documentation. Here, debhelper's static documentation does not include documentation for third party tooling. If you think from a debhelper maintainer's perspective, this seems obvious. Embedding documentation for all the third-party code would be very hard work, a layer-violation, etc.. But from a user perspective, we should not have to care "who" provides "what". As as user, I want to understand how this works and the more hoops I have to jump through to get that understanding, the more frustrated I will be with the toolstack.

With this, I came to the conclusion that the best way to help users and solve the problem of finding the documentation was to provide "online documentation". It should be possible to ask debputy, "What attributes can I use in install-man?" or "What does path-metadata do?". Additionally, the lookup should work the same no matter if debputy provided the feature or some third-party plugin did. In the future, perhaps also other types of documentation such as tutorials or how-to guides.

Below, I have some tentative results of my work so far. There are some improvements to be done. Notably, the commands for these documentation features are still treated a "plugin" subcommand features and should probably have its own top level "ask-me-anything" subcommand in the future.

Automatic discard rules

Since the introduction of install rules, debputy has included an automatic filter mechanism that prunes out unwanted content. In 0.1.9, these filters have been named "Automatic discard rules" and you can now ask debputy to list them.

$ debputy plugin list automatic-discard-rules
+-----------------------+-------------+
| Name                  | Provided By |
+-----------------------+-------------+
| python-cache-files    | debputy     |
| la-files              | debputy     |
| backup-files          | debputy     |
| version-control-paths | debputy     |
| gnu-info-dir-file     | debputy     |
| debian-dir            | debputy     |
| doxygen-cruft-files   | debputy     |
+-----------------------+-------------+

For these rules, the provider can both provide a description but also an example of their usage.

$ debputy plugin show automatic-discard-rules la-files
Automatic Discard Rule: la-files
================================

Documentation: Discards any .la files beneath /usr/lib

Example
-------

    /usr/lib/libfoo.la        << Discarded (directly by the rule)
    /usr/lib/libfoo.so.1.0.0

The example is a live example. That is, the provider will provide debputy with a scenario and the expected outcome of that scenario. Here is the concrete code in debputy that registers this example:

api.automatic_discard_rule(
    "la-files",
    _debputy_prune_la_files,
    rule_reference_documentation="Discards any .la files beneath /usr/lib",
    examples=automatic_discard_rule_example(
        "usr/lib/libfoo.la",
        ("usr/lib/libfoo.so.1.0.0", False),
    ),
)

When showing the example, debputy will validate the example matches what the plugin provider intended. Lets say I was to introduce a bug in the code, so that the discard rule no longer worked. Then debputy would start to show the following:

# Output if the code or example is broken
$ debputy plugin show automatic-discard-rules la-files
[...]
Automatic Discard Rule: la-files
================================

Documentation: Discards any .la files beneath /usr/lib

Example
-------

    /usr/lib/libfoo.la        !! INCONSISTENT (code: keep, example: discard)
    /usr/lib/libfoo.so.1.0.0
debputy: warning: The example was inconsistent. Please file a bug against the plugin debputy

Obviously, it would be better if this validation could be added directly as a plugin test, so the CI pipeline would catch it. That is one my personal TODO list. :)

One final remark about automatic discard rules before moving on. In 0.1.9, debputy will also list any path automatically discarded by one of these rules in the build output to make sure that the automatic discard rule feature is more discoverable.

Plugable manifest rules like the `install` rule

In the manifest, there are several places where rules can be provided by plugins. To make life easier for users, debputy can now since 0.1.8 list all provided rules:

$ debputy plugin list plugable-manifest-rules
+-------------------------------+------------------------------+-------------+
| Rule Name                     | Rule Type                    | Provided By |
+-------------------------------+------------------------------+-------------+
| install                       | InstallRule                  | debputy     |
| install-docs                  | InstallRule                  | debputy     |
| install-examples              | InstallRule                  | debputy     |
| install-doc                   | InstallRule                  | debputy     |
| install-example               | InstallRule                  | debputy     |
| install-man                   | InstallRule                  | debputy     |
| discard                       | InstallRule                  | debputy     |
| move                          | TransformationRule           | debputy     |
| remove                        | TransformationRule           | debputy     |
| [...]                         | [...]                        | [...]       |
| remove                        | DpkgMaintscriptHelperCommand | debputy     |
| rename                        | DpkgMaintscriptHelperCommand | debputy     |
| cross-compiling               | ManifestCondition            | debputy     |
| can-execute-compiled-binaries | ManifestCondition            | debputy     |
| run-build-time-tests          | ManifestCondition            | debputy     |
| [...]                         | [...]                        | [...]       |
+-------------------------------+------------------------------+-------------+

(Output trimmed a bit for space reasons)

And you can then ask debputy to describe any of these rules:

$ debputy plugin show plugable-manifest-rules install
Generic install (`install`)
===========================

The generic `install` rule can be used to install arbitrary paths into packages
and is *similar* to how `dh_install` from debhelper works.  It is a two "primary" uses.

  1) The classic "install into directory" similar to the standard `dh_install`
  2) The "install as" similar to `dh-exec`'s `foo => bar` feature.

Attributes:
 - `source` (conditional): string
   `sources` (conditional): List of string

   A path match (`source`) or a list of path matches (`sources`) defining the
   source path(s) to be installed. [...]

 - `dest-dir` (optional): string

   A path defining the destination *directory*. [...]

 - `into` (optional): string or a list of string

   A path defining the destination *directory*. [...]

 - `as` (optional): string

   A path defining the path to install the source as. [...]

 - `when` (optional): manifest condition (string or mapping of string)

   A condition as defined in [Conditional rules](https://salsa.debian.org/debian/debputy/-/blob/main/MANIFEST-FORMAT.md#Conditional rules).


This rule enforces the following restrictions:
 - The rule must use exactly one of: `source`, `sources`
 - The attribute `as` cannot be used with any of: `dest-dir`, `sources`

[...]

(Output trimmed a bit for space reasons)

All the attributes and restrictions are auto-computed by debputy from information provided by the plugin. The associated documentation for each attribute is supplied by the plugin itself, The debputy API validates that all attributes are covered and the documentation does not describe non-existing fields. This ensures that you as a plugin provider never forget to document new attributes when you add them later.

The debputy API for manifest rules are not quite stable yet. So currently only debputy provides rules here. However, it is my intention to lift that restriction in the future.

I got the idea of supporting online validated examples when I was building this feature. However, sadly, I have not gotten around to supporting it yet.

Manifest variables like `{{PACKAGE}}`

I also added a similar documentation feature for manifest variables such as {{PACKAGE}}. When I implemented this, I realized listing all manifest variables by default would probably be counter productive to new users. As an example, if you list all variables by default it would include DEB_HOST_MULTIARCH (the most common case) side-by-side with the the much less used DEB_BUILD_MULTIARCH and the even lessor used DEB_TARGET_MULTIARCH variable. Having them side-by-side implies they are of equal importance, which they are not. As an example, the ballpark number of unique packages for which DEB_TARGET_MULTIARCH is useful can be counted on two hands (and maybe two feet if you consider gcc-X distinct from gcc-Y).

This is one of the cases, where experience makes us blind. Many of us probably have the "show me everything and I will find what I need" mentality. But that requires experience to be able to pull that off - especially if all alternatives are presented as equals. The cross-building terminology has proven to notoriously match poorly to people's expectation.

Therefore, I took a deliberate choice to reduce the list of shown variables by default and in the output explicitly list what filters were active. In the current version of debputy (0.1.9), the listing of manifest-variables look something like this:

$ debputy plugin list manifest-variables
+----------------------------------+----------------------------------------+------+-------------+
| Variable (use via: `{{ NAME }}`) | Value                                  | Flag | Provided by |
+----------------------------------+----------------------------------------+------+-------------+
| DEB_HOST_ARCH                    | amd64                                  |      | debputy     |
| [... other DEB_HOST_* vars ...]  | [...]                                  |      | debputy     |
| DEB_HOST_MULTIARCH               | x86_64-linux-gnu                       |      | debputy     |
| DEB_SOURCE                       | debputy                                |      | debputy     |
| DEB_VERSION                      | 0.1.8                                  |      | debputy     |
| DEB_VERSION_EPOCH_UPSTREAM       | 0.1.8                                  |      | debputy     |
| DEB_VERSION_UPSTREAM             | 0.1.8                                  |      | debputy     |
| DEB_VERSION_UPSTREAM_REVISION    | 0.1.8                                  |      | debputy     |
| PACKAGE                          | <package-name>                         |      | debputy     |
| path:BASH_COMPLETION_DIR         | /usr/share/bash-completion/completions |      | debputy     |
+----------------------------------+----------------------------------------+------+-------------+

+-----------------------+--------+-------------------------------------------------------+
| Variable type         | Value  | Option                                                |
+-----------------------+--------+-------------------------------------------------------+
| Token variables       | hidden | --show-token-variables OR --show-all-variables        |
| Special use variables | hidden | --show-special-case-variables OR --show-all-variables |
+-----------------------+--------+-------------------------------------------------------+

I will probably tweak the concrete listing in the future. Personally, I am considering to provide short-hands variables for some of the DEB_HOST_* variables and then hide the DEB_HOST_* group from the default view as well. Maybe something like ARCH and MULTIARCH, which would default to their DEB_HOST_* counter part. This variable could then have extended documentation that high lights DEB_HOST_<X> as its source and imply that there are special cases for cross-building where you might need DEB_BUILD_<X> or DEB_TARGET_<X>.

Speaking of variable documentation, you can also lookup the documentation for a given manifest variable:

$ debputy plugin show manifest-variables path:BASH_COMPLETION_DIR
Variable: path:BASH_COMPLETION_DIR
==================================

Documentation: Directory to install bash completions into
Resolved: /usr/share/bash-completion/completions
Plugin: debputy

This was my update on online reference documentation for debputy. I hope you found it useful. :)

Thanks

On a closing note, I would like to thanks Jochen Sprickerhof, Andres Salomon, Paul Gevers for their recent contributions to debputy. Jochen and Paul provided a number of real world cases where debputy would crash or not work, which have now been fixed. Andres and Paul also provided corrections to the documentation.

A new Debian package helper: debputy

with tags debian debputy

I have made a new helper for producing Debian packages called debputy. Today, I uploaded it to Debian unstable for the first time. This enables others to migrate their package build using dh +debputy rather than the "classic" dh. Eventually, I hope to remove dh entirely from this equation, so you only need debputy. But for now, debputy still leverages dh support for managing upstream build systems.

The debputy tool takes a radically different approach to packaging compared to our existing packaging methods by using a single highlevel manifest instead of all the debian/install (etc.) and no "hook targets" in debian/rules.

Here are some of the things that debputy can do or does:

debputy can perform installation similar to dh_install, dh_installdocs (etc.) plus a bit of the dh-exec support. Notably, debputy supports "install /usr/bin/foo into foo" and "install everything else in /usr/bin into foo-utils" without you having to resort to weird tricks. With debhelper, this would require dh-exec's => /usr/bin/foo operator.
debputy can assign mode to files without needing hooks and static file ownership can be assigned without resorting to fakeroot. If you request Rules-Requires-Root: no, debputy will assemble the deb without using fakeroot. The fragileness of fakeroot may some day just be a horror story we tell our unruly children that they do not really believe is true.
debputy defaults to all scripts with a "#! /bin/tool" or "#! /usr/bin/tool" to have mode 0755 (a+x) unless they are placed in directories known not to have executable files (such as the perl module dirs). As an example, scripts in the examples directory may now get an automatic executable bit if they have a proper #!-line for /usr/bin or /bin.
debputy supports the default flow of 48 debhelper tools. If you are using pure dh $@ with no sequence add-ons and no hook targets in debian/rules (or only hook targets for the upstream side), odds are that debputy got your needs covered.

There are also some features that debputy does not support at the moment:

Almost any debhelper sequence add-on. The debputy tool comes with a migration tool that will auto-detect any unsupported dh add-on from Build-Depends and will flag them as potential problematic. The migration tool works from an list of approved add-ons. Note that manually activated add-ons via dh $@ --with ... are not detected.
Anything that installs or recently installed into /lib or another /usr-merged location. My life is too short to be directly involved in the /usr-merge transition. This means no udev and no systemd unit support at the moment (note tmpfiles and sysusers is supported). For the systemd side, I am also contemplating a different model for how to deal with services. Even without the /usr-merge transition, these would not have been supported. The migration tool will detect problematic file in the debian directory immediately and debputy will detect upstream provided systemd unit files at build time.
There is also no support for packager provided maintscript files at this time. If you have your own maintscripts, then you will not be able to migrate. The migration tool detects the debhelper based path and warns you (such as debian/postinst).
Additionally, if you need special cases in tools (such as perl-base dependency with dh_perl) or rely on dh_strip-nondeterminsm for reproducibility, then you cannot or is advised not to migrate at this time.

There are all limitations of the current work in progress. I hope to resolve them all in due time.

Trying debputy

With the limitations aside, lets talk about how you would go about migrating a package:

# Assuming here you have already run: apt install dh-debputy
$ git clone https://salsa.debian.org/rra/kstart
[...]
$ cd kstart
# Add a Build-Dependency on dh-sequence-debputy
$ perl -n -i -e \
   'print; print " dh-sequence-debputy,\n" if m/debhelper-compat/;' \
    debian/control
$ debputy migrate-from-dh --apply-changes
debputy: info: Loading plugin debputy (version: archive/debian/4.3-1) ...
debputy: info: Verifying the generating manifest
debputy: info: Updated manifest debian/debputy.manifest
debputy: info: Removals:
debputy: info:   rm -f "./debian/docs"
debputy: info:   rm -f "./debian/examples"

debputy: info: Migrations performed successfully

debputy: info: Remember to validate the resulting binary packages after rebuilding with debputy
$ cat debian/debputy.manifest
manifest-version: '0.1'
installations:
- install-docs:
    sources:
    - NEWS
    - README
    - TODO
- install-examples:
    source: examples/krenew-agent
$ git add debian/debputy.manifest
$ git commit --signoff -am"Migrate to debputy"
# Run build tool of choice to verify the output.

This is of course a specific example that works out of the box. If you were to try this on debianutils (from git), the output would look something like this:

$ debputy migrate-from-dh
debputy: info: Loading plugin debputy (version: 5.13-13-g9836721) ...
debputy: error: Unable to migrate automatically due to missing features in debputy.

  * The "debian/triggers" debhelper config file (used by dh_installdeb is currently not supported by debputy.

Use --acceptable-migration-issues=[...] to convert this into a warning [...]

And indeed, debianutils requires at least 4 debhelper features beyond what debputy can support at the moment (all related to maintscripts and triggers).

Rapid feedback

Rapid feedback cycles are important for keeping developers engaged in their work. The debputy tool provides the following features to enable rapid feedback.

Immediate manifest validation

It would be absolutely horrible if you had to do a full-rebuild only to realize you got the manifest syntax wrong. Therefore, debputy has a check-manifest command that checks the manifest for syntactical and semantic issues.

The debputy check-manifest command is limited to the manifest itself and does not warn about foo not existing as it could be produced as apart of the upstream build system. Therefore, there are still issues that can only be detected at package build time. But where debputy can reliably give you immediate feedback, it will do so.

Idempotence: Clean re-runs of dh_debputy without clean/rebuild

If you read the "fine print" of many debhelper commands, you may see the following note their manpage:

This command is not idempotent. dh_prep(1) should be called between

invocations of this command ...

Manpage of an anonymous debhelper tool

What this usually means, is that if you run the command twice, you will get its maintscript change (etc.) twice in the final deb. This fits into our "single-use clean throw-away chroot builds" on the buildds and CI as well as dpkg-buildpackage's "no-clean" (-nc) option. Single-use throw-away chroots are not very helpful for debugging though, so I rarely use them when doing the majority of my packaging work as I do not want to wait for the chroot initialization (including installing of build-depends).

But even then, I have found that dpkg-buildpackage -nc has been useless for me in many cases as I am stuck between two options:

With -nc, you often still interact with the upstream build system. As an example, debhelper will do a dh_prep followed by dh_auto_install, so now we are waiting for upstream's install target to run again. What should have taken seconds now easily take 0.5-1 minute extra per attempt.
If you want to by-pass this, you have to manually call the helpers needed (in correct order) and every run accumulates cruft from previous runs to the point that cruft drowns out the actual change you want to see. Also, I am rarely in the mood to play human dh, when I am debugging an issue that I failed to fix in my first, second and third try.

As you can probably tell, neither option has worked that well for me. But with dh_debputy, I have made it a goal that it will not "self-taint" the final output. If dh_debputy fails, you should be able to tweak the manifest and re-run dh_debputy with the same arguments.

No waiting for dpkg-buildpackage -nc nor anything implied by that.
No "self-tainting" of the final deb. The result you get, is the result you would have gotten if the previous dh_debputy run never happened.
Because dh_debputy produces the final result, I do not have to run multiple tools in "the right" order.

Obviously, this is currently a lot easier, because debputy is not involved in the upstream build system at all. If this feature is useful to you, please do let me know and I will try to preserve it as debputy progresses in features.

Packager provided files

On a different topic, have you ever wondered what kind of files you can place into the debian directory that debhelper automatically picks up or reacts too? I do not have an answer to that beyond it is over 80 files and that as the maintainer of debhelper, I am not willing to manually maintain such a list manually.

However, I do know what the answer is in debputy, because I can just ask debputy:

$ debputy plugin list packager-provided-files
+-----------------------------+---------------------------------------------[...]
| Stem                        | Installed As                                [...]
+-----------------------------+---------------------------------------------[...]
| NEWS                        | /usr/share/doc/{name}/NEWS.Debian           [...]
| README.Debian               | /usr/share/doc/{name}/README.Debian         [...]
| TODO                        | /usr/share/doc/{name}/TODO.Debian           [...]
| bug-control                 | /usr/share/bug/{name}/control               [...]
| bug-presubj                 | /usr/share/bug/{name}/presubj               [...]
| bug-script                  | /usr/share/bug/{name}/script                [...]
| changelog                   | /usr/share/doc/{name}/changelog.Debian      [...]
| copyright                   | /usr/share/doc/{name}/copyright             [...]
[...]

This will list all file types (Stem column) that debputy knows about and it accounts for any plugin that debputy can find. Note to be deterministic, debputy will not auto-load plugins that have not been explicitly requested during package builds. So this list could list files that are available but not active for your current package.

Note the output is not intended to be machine readable. That may come in later version. Feel free to chime in if you have a concrete use-case.

Take it for a spin

As I started this blog post with, debputy is now available in unstable. I hope you will take it for a spin on some of your simpler packages and provide feedback on it. :)

For documentation, please have a look at:

GETTING-STARTED-WITH-dh-debputy.md (how-to guide)
MANIFEST-FORMAT.md (reference documentation)

Thanks for considering

PS: My deepest respect to the fakeroot maintainers. That game of whack-a-mole is not something I would have been willing to maintain. I think fakeroot is like the Python GIL in the sense that it has been important in getting Debian to where it is today. But at the same time, I feel it is time to let go of the "crutch" and find a proper solution.

Page 1 / 1

Improving Debian packaging in Kate

Getting started

Inlay hint woes

Spurious completion and hover

Hover docs for packages

Making quickfixes available in kate

Making the quickfixes actually work

Remaining limitations with kate

Changes unrelated to kate

Closing

Improving packaging file detection in Debian

The patterns deconstructed

Making detection practical

Diving a bit deeper on getting the stems

The end result

Debian packaging with style black

Beyond black

Differences to wrap-and-sort

Just give me a style

debputy v0.1.21

Package boilerplate reduction with automatic relationship substvar

Language Server (LSP) and Linting

Minimal integration mode for Rules-Requires-Root

Language Server for Debian: Spellchecking

Semantic token support

Spellchecking

Follow up on the "What next?" from my previous update

From here...

Expanding on the Language Server (LSP) support for debian/control

Spellchecking

On wrap-and-sort support

What next?

Language Server (LSP) support for debian/control

Current features

Completion suggestions

Diagnostics

Trying it out

Using the debputy LSP in emacs

From here

Annotating the Debian packaging directory

Adding (non-static) dpkg and debhelper files to the mix

Adding more metadata to the files

Context, context, context

Wrapping it up

Limitations of the approach

Making debputy: Writing declarative parsing logic

From concept to basic implementation

Adding the first parse hint

High level typing

The added side-effect - Introspection

Wrapping it up

Providing online reference documentation for debputy

Automatic discard rules

Plugable manifest rules like the install rule

Manifest variables like {{PACKAGE}}

Thanks

A new Debian package helper: debputy

Trying debputy

Rapid feedback

Immediate manifest validation

Idempotence: Clean re-runs of dh_debputy without clean/rebuild

Packager provided files

Take it for a spin

Tags

Making quickfixes available in `kate`

Remaining limitations with `kate`

Changes unrelated to `kate`

Differences to `wrap-and-sort`

Plugable manifest rules like the `install` rule

Manifest variables like `{{PACKAGE}}`