I am digressing a little in this post. One of the things I want to get
out of this exercise is to learn more about Ontologies and Ontology
editors. On the principle that you never really learn something
until you build something with it (aka bone knowledge), this post
gathers my thoughts to get started on creating an Ontology for
package building. Perhaps this has been done before, and better, but
I'll probably learn more trying to create my own.
Also, I am playing around with code, an odd melange of my package
building porcelain, and
gitpkg, and other ideas bruited on
and I don't want to blog about something that would be embarrassing in
the long run if some of the concepts I have milling around turn out to
not meet the challenge of first contact with reality.
I want to create an ontology related to packaging software. It should
be general enough to cater to the needs of any packaging effort in a
distribution-agnostic and version-control-agnostic manner. It should
enable us to talk about packaging schemes and mechanisms, compare
different methods, and perhaps to work towards a common interchange
mechanism good enough for people to share the efforts spent in
The ontology should be able to describe common practices in packaging,
concepts of upstream sources, versioning, commits, package versions,
and other meta-data related to packages.
I am doing this ontology primarily for myself, but I hope this might
be useful for other folks involved in packaging software.
So, here follow a set of concepts related to packaging software:
software is a general term used to describe a collection of
computer programs, procedures and documentation that perform some
tasks on a computer system.
software is what we are trying to package
software has names
software has source code associated with it
source code is any collection of statements or declarations
written in some human-readable computer programming language.
source code is usually held in one or more text files (blobs).
A large collection of source code files may be organized into a
directory tree, in which case it may also be known as a source
The source code may be converted into an executable format by a
compiler, or executed on the fly from the human readable form with
the aid of an interpreter.
executable format is the form software must take in order to be
run. Running means causing a computer "to perform indicated tasks
according to encoded instructions."
software source code has one or more lines of development. Some
Common specific lines of development for the software to be
upstream line of development
feature branch is a line of development related to a new
feature under development. Often the goal is to merge the
feature branches into the upstream line of development
usually, all feature branches are merged into the integration branch, and the package is created from the integration branch.
integration branch is the line of development of software that is
to be packaged
some software lines of development have releases
releases have release dates
some releases have release versions
source code may be stored in a version control repository, and
Trees are collections of blobs and other trees (directories
and sub-directories). A tree object describes the state of a
directory hierarchy at a given time.
Blobs are simply chunks of binary data - they are the contents of
a tree can be converted into an archive and back
In git, directories are represented by tree objects. They refer to
blobs that hold the contents of files (file name, access mode, etc. are
all stored in the tree), and to other trees for sub-directories.
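The tree and blob description above can be sketched as a small data model. This is only an illustration: the class names `Blob`, `TreeEntry` and `Tree`, and the `object_id` hashing scheme, are assumptions made for this example and are not git's actual on-disk object format. What it tries to show is that a blob carries only contents, while names and access modes live in the tree that refers to it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    """A chunk of binary data: the contents of one file, with no name attached."""
    data: bytes

    def object_id(self) -> str:
        # Simplified content hash (hypothetical scheme, not git's real format).
        return hashlib.sha1(b"blob:" + self.data).hexdigest()


@dataclass(frozen=True)
class TreeEntry:
    """The file name and access mode are stored here, in the tree, not in the blob."""
    name: str
    mode: str
    target: "Blob | Tree"


@dataclass(frozen=True)
class Tree:
    """A directory: a collection of blobs and other trees."""
    entries: tuple = ()

    def object_id(self) -> str:
        # The identifier depends on every name, mode and child identifier,
        # so it describes the whole directory hierarchy below this tree.
        h = hashlib.sha1(b"tree:")
        for entry in sorted(self.entries, key=lambda e: e.name):
            line = f"{entry.mode} {entry.name} {entry.target.object_id()}\n"
            h.update(line.encode())
        return h.hexdigest()


readme = Blob(b"hello\n")
src = Tree((TreeEntry("main.c", "100644", Blob(b"int main(void){return 0;}\n")),))
root = Tree((TreeEntry("README", "100644", readme),
             TreeEntry("src", "040000", src)))
print(root.object_id())
```

Because the identifier of a tree is computed from its children, two trees with the same identifier describe the same directory state, which is what lets a commit refer to a single tree.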
Commits (or "changesets") mark points in the history of a line of development, and hold references to parent commits.
A commit refers to a tree that represents the state of the files at
the time of the commit.
A working directory is a directory that corresponds, but might not
be identical, to a commit in the version control repository
Commits from the version control system can be checked out into the
uncommitted changes are changes in the working directory that make
it different from the corresponding commit. Some describe such a
working directory as being in a "dirty" state.
uncommitted changes may be checked in to the version control
system, creating a new commit
The working directory may contain an ignore file
ignore file contains the names of files in the working directory that should be "ignored" by the version control system.
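A rough sketch of how uncommitted changes and the ignore file interact, under some loud assumptions: both the commit and the working directory are modelled as plain dictionaries mapping a path to file contents, the function names `is_ignored` and `uncommitted_changes` are invented for this example, and ignore patterns are matched with Python's `fnmatch`, which is much simpler than the matching rules of any real version control system.

```python
from fnmatch import fnmatch


def is_ignored(path, patterns):
    """True if the path matches any pattern from the (hypothetical) ignore file."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def uncommitted_changes(committed, working, ignore_patterns=()):
    """Compare a working directory against its corresponding commit.

    Both arguments map path -> contents. Returns path -> kind of change.
    An empty result means the working directory is clean.
    """
    changes = {}
    for path in sorted(set(committed) | set(working)):
        if path not in committed:
            # Ignore patterns only hide files the commit does not know about.
            if not is_ignored(path, ignore_patterns):
                changes[path] = "added"
        elif path not in working:
            changes[path] = "deleted"
        elif committed[path] != working[path]:
            changes[path] = "modified"
    return changes


committed = {"a.c": b"1", "b.c": b"2"}
working = {"a.c": b"1x", "c.c": b"3", "a.o": b"object code"}
changes = uncommitted_changes(committed, working, ignore_patterns=["*.o"])
print(changes)
```

In this model a working directory is "dirty" exactly when the returned dictionary is non-empty.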
In git, a commit may also contain references to parent commits.
If there is more than one parent commit, then the commit is a
If there are no parent commits, it is an initial commit
references, or heads, or branches, are movable references to a
commit. On a fresh commit, the head or branch reference is
moved to the new commit.
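The relationship between commits, parent references and movable branch heads might be modelled as below. This is a minimal sketch, not how any version control system is implemented: `Commit`, `Repository` and `kind` are names invented here, a tree is reduced to an opaque identifier string, and the label "merge" for a commit with more than one parent follows common git usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    tree_id: str              # the tree describing the files at commit time
    parents: tuple = ()       # references to parent commits
    message: str = ""

    def kind(self) -> str:
        if not self.parents:
            return "initial"      # no parent commits
        if len(self.parents) > 1:
            return "merge"        # more than one parent commit
        return "ordinary"


class Repository:
    """Holds branch heads: movable references from a name to a commit."""

    def __init__(self):
        self.branches = {}

    def commit(self, branch, tree_id, message="", extra_parents=()):
        head = self.branches.get(branch)
        parents = (() if head is None else (head,)) + tuple(extra_parents)
        new = Commit(tree_id, parents, message)
        self.branches[branch] = new   # the head moves to the fresh commit
        return new


repo = Repository()
first = repo.commit("upstream", "tree-1", "import sources")
second = repo.commit("upstream", "tree-2", "new upstream work")
repo.branches["integration"] = first          # branch off the first commit
merged = repo.commit("integration", "tree-3", "merge upstream",
                     extra_parents=(second,))
print(first.kind(), second.kind(), merged.kind())
```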
lines of development are usually stored as a branch in the version
a patch is a file that contains difference listings between two
A patch file can be used to transform (patch) one tree into
A quilt series is a method of representing an integration branch
as a series of patches. These patches can be
applied in sequence to the upstream branch to produce the
A tag is a named reference to a specific commit, and is not normally
moved to point to a different commit.
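The patch and quilt series concepts described above can be illustrated with a toy model. Everything here is an assumption made for the sake of the example: a tree is a dictionary mapping a path to file contents, a patch maps a path to an `(old, new)` pair of contents (with `None` meaning "file absent"), and the helpers `diff`, `apply_patch` and `apply_series` are invented names. Real patches are textual difference listings; this sketch keeps only the idea that a patch transforms one tree into another and that a series is applied in order.

```python
from functools import reduce


def diff(old_tree, new_tree):
    """Build a patch: for each differing path, record (old, new) contents."""
    paths = sorted(set(old_tree) | set(new_tree))
    return {p: (old_tree.get(p), new_tree.get(p))
            for p in paths if old_tree.get(p) != new_tree.get(p)}


def apply_patch(tree, patch):
    """Transform one tree into another; fail if the patch does not fit."""
    result = dict(tree)
    for path, (old, new) in patch.items():
        if result.get(path) != old:
            raise ValueError(f"patch does not apply to {path!r}")
        if new is None:
            result.pop(path, None)    # the patch removes the file
        else:
            result[path] = new        # the patch adds or changes the file
    return result


def apply_series(upstream_tree, series):
    """Apply each patch of the series, in sequence, to the upstream tree."""
    return reduce(apply_patch, series, upstream_tree)


upstream = {"hello.c": "v1"}
series = [
    {"packaging/control": (None, "Package: hello")},   # add packaging data
    {"hello.c": ("v1", "v1 + local fix")},             # modify upstream code
]
integration = apply_series(upstream, series)
print(integration)
```

The order of the series matters: a later patch may only apply cleanly to the tree produced by the earlier ones.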
A package is an archive format of software created to be
installed by a package management system or a self-sufficient
installer, derived by transforming a tree associated with an
packages have package names
package names are related to upstream software names
packages have package versions
package versions may have
an upstream version component
a distribution or packaging specific component
package versions are related to upstream software versions
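As an illustration of a package version with an upstream component and a packaging-specific component, here is a small sketch. It assumes a Debian-like convention in which the packaging component is whatever follows the last hyphen; other distributions use other conventions, and details such as epochs are ignored. The names `PackageVersion` and `split_version` are made up for this example.

```python
from typing import NamedTuple, Optional


class PackageVersion(NamedTuple):
    upstream: str               # the upstream version component
    packaging: Optional[str]    # the distribution or packaging specific component


def split_version(version: str) -> PackageVersion:
    """Split at the last hyphen; with no hyphen there is no packaging component."""
    upstream, sep, packaging = version.rpartition("-")
    if not sep:
        return PackageVersion(version, None)
    return PackageVersion(upstream, packaging)


for text in ("2.10-1", "1.0-rc1-3", "1.2.3"):
    print(text, "->", split_version(text))
```

Splitting at the last hyphen rather than the first allows the upstream version itself to contain hyphens, as in the second example.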
helper packages provide libraries and other support facilities to
help compile an integration branch, ultimately yielding a package