Cache Format

Cache Format — Specification of the APT2 cache format

Synopsis

#include <apt/cache-internal.h>

typedef             cindex;
typedef             coffset;

#define             CINDEX_NONE
#define             APT_CACHE_MAJOR_VERSION
#define             APT_CACHE_MINOR_VERSION

                    AptCacheHeader;
                    AptCachePackage;
                    AptCacheDependency;
                    AptCacheProvides;
                    AptCacheIndex;
                    AptCacheDescription;
                    AptCacheGroupMember;
enum                AptCacheGroupMemberType;
                    AptCacheGroup;
                    AptCacheTableEntry;

Description

The cache format used by APT2 is very simple. It starts with a header containing meta-information about the cache (AptCacheHeader), and is followed by an array of strings; that is, many strings after each other without particular alignment. If the size of the array is not a multiple of 8, padding NUL bytes will be added.

Next come arrays of fixed-size structures. Those arrays and their members are all aligned at 8 bytes (the first array starts at a multiple of 8 bytes and all members are multiple of 8 bytes). The order of the arrays is the same as the order of the various begin_ members in AptCacheHeader. There is no padding between arrays, the end of one array is directly followed by the first item of the next array.

Design principle 1: Arrays, no linked lists. Using arrays instead of linked lists like APT 0.X does gives us the size of each list, and allows us to create better C++ bindings - that is, standard begin() and end() iterators - than APT 0.X has. The only exception to this rule is the hash table - an openly addressed hash table would not yield as good performance, and there is no real overhead. This introduces overhead at various places when the cache is being created and requires AptCacheGroupMembers.

Design principle 2: Common Alignment. All structure sizes are multiples of 8, allowing to just pack them after each other without having to worry about alignment, and the overhead is not worth talking about.

Design principle 3: Store things only once. Duplicate strings are inserted only once, and AptCacheDescription contains both information about files and description - APT 0.X uses two almost identical objects for this, APT2 only one.

Design principle 4: Don't store unnecessary information. Objects contain no ID fields, nor do array members contain information about their parent object. All this information can be created at run-time, an ID is simply the index of the object in its array, and parent information is known because we need to know the array in order to create the child. This also brings to point 5..

Design principle 5: Two worlds. The API for accessing the cache is completely abstract and could be implemented using a completely different format. In fact, the initial implementation used GVariant for storage.

Design principle 6: Not specific. The cache works with Debian packages, but it also works just as well with RPM and other package formats.

Limits: The cache can store up to 2^32 - 2 objects of each supported type. It supports than 2^32-2 - sizeof(AptCacheHeader) bytes for string storage, and can store up to 65535 dependencies and provides each per package.

Details

cindex

typedef guint32 cindex;

A type definition for indexes into arrays of fixed size structures.


coffset

typedef guint32 coffset;

A type definition for offsets into the cache (in the header) or into the cache's string table (everywhere else).


CINDEX_NONE

#define CINDEX_NONE ((cindex) -1)

Represents an index pointing to a non-existing field. This is the maximum value an unsigned 32-bit integer can hold, thus the maximum valid index is CINDEX_NONE - 1.


APT_CACHE_MAJOR_VERSION

#define APT_CACHE_MAJOR_VERSION ((guint16) 0)

The current cache format major version. Must be increased whenever an incompatible change is made to the cache structures.


APT_CACHE_MINOR_VERSION

#define APT_CACHE_MINOR_VERSION ((guint16) 0)

The current cache format minor version. Must be increased whenever the algorithms used to create the cache data change. For example, when the hash function used in the hash table changes, the minor version must be increased.


AptCacheHeader

typedef struct {
    guint8  magic_bytes[4];
    guint16 major_version;
    guint16 minor_version;
    coffset begin_strings;
    coffset begin_packages;
    guint32 n_packages;
    coffset begin_dependencies;
    guint32 n_dependencies;
    coffset begin_provides;
    guint32 n_provides;
    coffset begin_files;
    guint32 n_files;
    coffset begin_descriptions;
    guint32 n_descriptions;
    coffset begin_group_members;
    guint32 n_group_members;
    coffset begin_groups;
    guint32 n_groups;
    coffset begin_hashtable;
    guint32 n_buckets;
    coffset end_of_file;
} AptCacheHeader;

A small structure containing meta information about the cache.

ABI: Caches with a different major_version and/or different minor_version than the current one are incompatible and must be rebuilt.

guint8 magic_bytes[4];

4 bytes that identify the file to the system (APT\0)

guint16 major_version;

Increased when the structures change

guint16 minor_version;

Increased when the algorithms change

coffset begin_strings;

Offset of the first string in the cache

coffset begin_packages;

Offset of the first package in the cache

guint32 n_packages;

The number of packages in the cache

coffset begin_dependencies;

Offset of the first dependency in the cache

guint32 n_dependencies;

The number of dependencies in the cache

coffset begin_provides;

Offset of the first provide in the cache

guint32 n_provides;

The number of provides in the cache

coffset begin_files;

Offset of the first AptCacheIndex in the cache

guint32 n_files;

The number of files in the cache

coffset begin_descriptions;

Offset of the first AptCacheDescription in the cache

guint32 n_descriptions;

The number of descriptions in the cache

coffset begin_group_members;

Offset of the first member of per-name groups

guint32 n_group_members;

The number of group members in the cache

coffset begin_groups;

Offset of the first per-name group

guint32 n_groups;

The number of groups in the cache

coffset begin_hashtable;

Offset of the first hashtable entry

guint32 n_buckets;

The number of buckets in the hash table

coffset end_of_file;

Offset to the end of the file + 1 byte, that is, the file size

AptCachePackage

typedef struct {
    guint32 hash;
    coffset name;
    coffset version;
    coffset architecture;
    coffset checksum;
    guint32 size;
    guint32 installed_size;
    coffset section;
    cindex begin_dependencies;
    cindex end_dependencies;
    cindex begin_provides;
    cindex end_provides;
    cindex begin_descriptions;
    cindex end_descriptions;
    coffset source_name;
    coffset source_version;
    guint8 state_current;
    guint8 state_selection;
    guint8 flags;
    guint8 multi_arch;
    guint8 priority;
} AptCachePackage;

AptCachePackage represents a single package that may be part of one or more sources. Packages with the same name share a group, see AptCacheGroup for this.

guint32 hash;

A hash that identifies this AptCachePackage, used for grouping duplicate packages together. Completely system-specific.

coffset name;

The name of the package

coffset version;

The version of the package

coffset architecture;

The architecture of the package

coffset checksum;

A SHA256, SHA1, or MD5 checksum of the file, or 0 if unavailable

guint32 size;

The size of the package when packed in bytes

guint32 installed_size;

The size of the unpacked package in multiples of 1 KiB

coffset section;

The section of the package

cindex begin_dependencies;

Index of the first dependency of the package

cindex end_dependencies;

Index of one dependency after the package's last one

cindex begin_provides;

Index of the first provides of the package

cindex end_provides;

Index of one provide after the package's last one

cindex begin_descriptions;

Index of the first description of the package

cindex end_descriptions;

Index of one description after the package's last one

coffset source_name;

The name of the source package, or 0 if equal

coffset source_version;

The version of the source package, or 0 if equal

guint8 state_current;

A value of AptCurrentState

guint8 state_selection;

A value of AptSelectionState

guint8 flags;

A value of AptPackageFlags

guint8 multi_arch;

Values of AptMultiArch

guint8 priority;

The priority of the package

AptCacheDependency

typedef struct {
    coffset target_name;
    coffset target_version;
    coffset target_architecture;
    guint8 comparison;
    guint8 type;
    guint8 next_is_or;
} AptCacheDependency;

AptCacheDependency represents a dependency on a package. The dependency can be versioned and can be ORed; the latter only if type is positive, that is, not a Conflicts or Breaks requirement.

coffset target_name;

The name of the package satisfying this dependency

coffset target_version;

The name of the package satisfying this dependency, or 0

coffset target_architecture;

Undefined field for use with multi-arch dependencies

guint8 comparison;

A value of AptComparisonType, 0 if target_version is 0

guint8 type;

A value of AptDependencyType

guint8 next_is_or;

1 if the next dependency is ORed with this one, 0 otherwise

AptCacheProvides

typedef struct {
    coffset name;
    coffset version;
} AptCacheProvides;

A provides allows a package to declare that it provides a certain feature, also called virtual package. Provides can be versioned and unversioned, versioned dependencies can satisfy versioned dependencies, unversioned provides can only satisfy unversioned dependencies.

coffset name;

The name of the provided feature

coffset version;

The provided version

AptCacheIndex

typedef struct {
    coffset filename;
    coffset archive;
    coffset codename;
    coffset component;
    coffset version;
    coffset origin;
    coffset label;
    coffset site;
    coffset type;
    coffset base_uri;
    guint64 mtime;
    guint8 flags;
} AptCacheIndex;

AptCacheIndex represents a list of packages in a repository, such as the Packages files in Debian repositories.

coffset filename;

The local name of the file, such as /var/lib/dpkg/status

coffset archive;

The release suite, such as 'unstable'

coffset codename;

The codename of the release, such as 'sid'

coffset component;

The component, such as 'main'

coffset version;

The version of the release

coffset origin;

A name set in the repository that identifies its provider

coffset label;

A name set in the repository for this index

coffset site;

The hostname of the source, or 0 for local files

coffset type;

A string identifying the type of package

coffset base_uri;

The base URI which combined with AptCacheDescription.filename gives the URI of a package; or 0, if the file contains no retrievable packages

guint64 mtime;

The time when this file was last modified in seconds since the epoch

guint8 flags;

The flags for this file, as described in AptIndexFlags

AptCacheDescription

typedef struct {
    cindex file_id;
    coffset filename;
    coffset language;
    coffset desc_hash;
    guint32 desc_offset;
} AptCacheDescription;

AptCacheDescription relates packages to indexes. It can contain information about the location of the package archive file; that is, a .deb or similar file; and the location of the description for passing to the system-specific parser.

cindex file_id;

The ID of the AptCacheIndex this belongs to

coffset filename;

Offset of the index name, or 0 if there is none

coffset language;

The language of this description, or 0 for the POSIX locale

coffset desc_hash;

An unique identifier identifying the untranslated description

guint32 desc_offset;

The position of the description in the file

AptCacheGroupMember

typedef struct {
    cindex package_id;
    guint16 type;
    guint16 child;
} AptCacheGroupMember;

An entry in a group of packages. They are used in the hash table to describe locations of packages.

cindex package_id;

ID of the package

guint16 type;

The type, possible values are defined in AptCacheGroupMemberType

guint16 child;

For dependencies and provides, the n-th entry for this package ( e.g. cache->dependencies[package->begin_dependencies + child])

enum AptCacheGroupMemberType

typedef enum _AptCacheGroupMemberType {
    APT_CACHE_GROUP_MEMBER_BINARY,
    APT_CACHE_GROUP_MEMBER_PROVIDES,
    APT_CACHE_GROUP_MEMBER_DEPENDS,
    APT_CACHE_GROUP_MEMBER_SOURCE
} AptCacheGroupMemberType;

Possible values for AptCacheGroupMember.type.

APT_CACHE_GROUP_MEMBER_BINARY

The package has the name of the group

APT_CACHE_GROUP_MEMBER_PROVIDES

The package provides a feature with the group name

APT_CACHE_GROUP_MEMBER_DEPENDS

The package depends on a package with the groups name.

APT_CACHE_GROUP_MEMBER_SOURCE

The package comes from a source package which has the name of the group

AptCacheGroup

typedef struct {
    guint32 hash;
    coffset name;
    cindex begin_members;
    cindex end_members;
} AptCacheGroup;

AptCacheGroup groups packages with the same name, packages providing a feature of this name, and packages depending on a package with such name together. Members in a group are ordered according to the value of AptCacheGroupMember.type.

guint32 hash;

The hash value, calculated using h = 31 * h + c, with h = 0 as the start value and the value for an empty string.

coffset name;

The name of the group

cindex begin_members;

The first member of this group

cindex end_members;

The last member of this group + 1

AptCacheTableEntry

typedef struct {
    cindex group;
    cindex next;
} AptCacheTableEntry;

A single bucket in the hash table. The hash table uses coalesced hashing with the address space of the hash function restricted to the first 86% of the table, and the remaining 14%, called the cellar, are used exclusively for colliding items. Thus, the initial bucket for an item with the hash h in a table n is b = h % (86 * n / 100). This yields very good performance and space utilization. The hash function used is described in the description for AptCacheGroup.hash. The hash table is twice as large as the groups array.

cindex group;

Index of the group in the groups array

cindex next;

Location of next entry