Rafael Laboissière, DD

The DPL 2005 election has been one of the most interesting elections we have had in Debian in recent times. Several factors contributed to this. First, this election had a strong set of candidates, who presented interesting platforms. Second, the campaign featured lively discussions on the debian-vote mailing list and a well-organized IRC debate. Finally, the election took place in a rather agitated context: an overly delayed release schedule for sarge, the "semi-secret" organization of the Vancouver meeting, and the creation of Project Scud.

Beyond the obvious "who won" analysis, one may ask which factors dominated the voters' preferences. Answering this question is possible, in part, thanks to the Condorcet voting system used in Debian elections, in which voters rank the voting options numerically. In this paper, a multivariate statistical technique is applied to the tally sheet of votes cast. The data were pre-processed to replace non-ranked options with numeric values, and a Factor Analysis (FA) was then applied. FA is typically used to unveil the latent structure of a set of variables. It does so by grouping variables (in our case, the voting options) together so that a limited number of dimensions explains a large share of the variance in the data set.
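For reference, the standard factor model can be written as below. The notation is the usual textbook one, not taken from the original scripts: x is the vector of the seven numeric ranks in a ballot, Λ the 7 × k matrix of loadings, f the k latent factors, and ε the part of each variable left unexplained by the factors.

```latex
\mathbf{x} = \boldsymbol{\mu} + \Lambda\,\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Cov}(\mathbf{x}) = \Lambda\,\Phi\,\Lambda^{\top} + \Psi
```

Here Φ is the correlation matrix of the factors (the identity unless an oblique rotation such as promax is used) and Ψ is the diagonal matrix of the uniquenesses that R's factanal() reports; factanal() fits this model to the correlation matrix of the variables.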

Notice that FA is closely related to Principal Component Analysis (PCA), but FA results are often more interpretable than those of PCA. One drawback of FA is that the number of factors that can be extracted is limited by the number of variables: with the seven voting options analyzed here, at most three factors can be fitted. We show below that the three dominating factors in the DPL 2005 election were a "rejection" factor, an "Anthony Towns" factor, and a "Project Scud" factor (see the Discussion section).

Hereafter, the options appearing on the ballot will be referred to by the initials of the candidates: JW = Jonathan Walther, MG = Matthew Garrett, BR = Branden Robinson, AT = Anthony Towns, AL = Angus Lees, and AS = Andreas Schuldei. The "None Of The Above" option will be referred to as NA. In the R reports below, the variables c1 to c7 are ordered as the options appeared on the ballot, that is, JW, MG, BR, AT, AL, AS, and NA.

The tally sheet of votes cast was pre-processed with a Perl script to transform the non-ranked options (appearing as - in the ballots) into numeric values. Each non-ranked option was replaced by the integer immediately above the highest rank used in that ballot. For instance, a ballot like --76--1 is translated into 8876881. Although this particular ballot could also be translated into 4432441, which would have the same effect in the Condorcet system, I preferred not to renumber the ranked options, because this better reflects the voter's intention.
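The filling rule can be sketched in Python as follows. This is not the author's Perl script; the function names are hypothetical, and the sketch assumes one character per option with single-digit ranks, which holds for a seven-option ballot. The second function shows the dense renumbering mentioned as an alternative.

```python
def fill_unranked(ballot: str) -> str:
    """Replace each unranked option ('-') by (highest rank used) + 1.

    Ranked options keep their original numbers, as described in the text.
    Assumes one character per option and single-digit ranks.
    """
    ranks = [int(ch) for ch in ballot if ch != "-"]
    filler = str(max(ranks, default=0) + 1)
    return "".join(filler if ch == "-" else ch for ch in ballot)


def fill_unranked_dense(ballot: str) -> str:
    """Alternative: renumber the used ranks densely (1, 2, 3, ...) before
    filling; this yields the same pairwise (Condorcet) preferences."""
    used = sorted({int(ch) for ch in ballot if ch != "-"})
    new_rank = {r: i + 1 for i, r in enumerate(used)}
    filler = str(len(used) + 1)
    return "".join(filler if ch == "-" else str(new_rank[int(ch)]) for ch in ballot)


print(fill_unranked("--76--1"))        # 8876881
print(fill_unranked_dense("--76--1"))  # 4432441
```

Both encodings preserve every pairwise preference; the first keeps the gaps between the voter's chosen numbers, which is why it was preferred here.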

The numeric values were fed to an R script, which generated the text output and figures shown in this paper. Each voting option is treated as a separate variable in the analysis. The FA was performed with three factors because this is the maximum number of factors that can be computed from seven variables. Promax was chosen as the factor rotation, because the non-orthogonal rotation matrix it produces allows for a greater amount of variance explanation.
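The limit of three factors follows from the degrees of freedom of the maximum-likelihood fit, ((p - k)^2 - (p + k)) / 2 for p variables and k factors, which must not be negative. The small Python sketch below (function names hypothetical) checks this for p = 7; the value for k = 3 matches the "3 degrees of freedom" reported in the factanal() output.

```python
def fa_degrees_of_freedom(p: int, k: int) -> int:
    """Degrees of freedom of the ML goodness-of-fit test for k factors
    and p variables: ((p - k)**2 - (p + k)) / 2 (always an integer)."""
    return ((p - k) ** 2 - (p + k)) // 2


def max_factors(p: int) -> int:
    """Largest k whose test still has non-negative degrees of freedom."""
    k = 0
    while k + 1 < p and fa_degrees_of_freedom(p, k + 1) >= 0:
        k += 1
    return k


print(fa_degrees_of_freedom(7, 3))  # 3
print(fa_degrees_of_freedom(7, 4))  # -1
print(max_factors(7))               # 3
```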

A preliminary PCA was performed on the data with the following results:

Importance of components:
                          Comp.1    Comp.2    Comp.3     Comp.4     Comp.5
Standard deviation     2.4711358 1.9400016 1.6528040 1.33637108 1.29489891
Proportion of Variance 0.3356938 0.2068970 0.1501733 0.09817575 0.09217684
Cumulative Proportion  0.3356938 0.5425908 0.6927642 0.79093993 0.88311678
                           Comp.6     Comp.7
Standard deviation     1.05165928 1.01005089
Proportion of Variance 0.06079953 0.05608369
Cumulative Proportion  0.94391631 1.00000000

Loadings:
   Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
c1 0.673 -0.248 -0.225 0.638 -0.124
c2 0.162 -0.189 0.462 -0.758 -0.161 -0.142 -0.323
c3 0.115 0.697 -0.571 -0.414
c4 0.215 -0.216 -0.721 -0.132 0.105 -0.510 -0.315
c5 0.440 -0.115 -0.202 0.185 -0.226 0.816
c6 0.142 0.604 -0.144 0.740 -0.198
c7 0.498 -0.209 0.444 0.586 -0.302 -0.269

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000
Proportion Var  0.143  0.143  0.143  0.143  0.143  0.143  0.143
Cumulative Var  0.143  0.286  0.429  0.571  0.714  0.857  1.000

The contribution of each component to the total variance can be visualized in the following figure:

The FA with three factors and promax rotation yields the following results:

Call:
factanal(x = vote.df, factors = 3, rotation = "promax")

Uniquenesses:
   c1    c2    c3    c4    c5    c6    c7
0.379 0.838 0.398 0.005 0.462 0.760 0.566

Loadings:
   Factor1 Factor2 Factor3
c1  0.665   0.194   0.242
c2  0.364  -0.194  -0.177
c3                  0.775
c4          0.982
c5  0.718
c6                  0.468
c7  0.673  -0.135  -0.101

               Factor1 Factor2 Factor3
SS loadings      1.555   1.078   0.928
Proportion Var   0.222   0.154   0.133
Cumulative Var   0.222   0.376   0.509

Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 10.28 on 3 degrees of freedom.
The p-value is 0.0164

Graphical representations of the three factors are shown in the following figures, where the factor loadings are plotted as the heights of the bars:

Each ballot can be projected onto the space spanned by the three factors above. The factor loadings give the coordinates of the three factor vectors, which form a non-orthogonal coordinate system. The R function qr.solve() was used to back-solve the projections of each ballot onto this three-factor space. The results are shown in a separate file. The quartiles of these projections are depicted in the following boxplot:
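The back-solving step amounts to a least-squares problem: find the coordinates f that minimize the distance between L f and the ballot vector x, where L holds the promax loadings. The Python sketch below swaps R's QR-based qr.solve() for the normal equations (L^T L) f = L^T x, which give the same solution when L has full column rank, though with less numerical robustness. Several assumptions apply: the blank entries in R's printout (loadings below 0.1 in absolute value) are set to zero, their placement follows the factor interpretations in the Discussion, x is taken to be a ballot already centered and scaled per option, and all names are hypothetical.

```python
# Promax loadings from the factanal() output (rows c1..c7, columns Factor1..Factor3).
# Entries R left blank (|loading| < 0.1) are set to 0 here: an approximation.
LOADINGS = [
    [0.665, 0.194, 0.242],    # c1 (JW)
    [0.364, -0.194, -0.177],  # c2 (MG)
    [0.0, 0.0, 0.775],        # c3 (BR)
    [0.0, 0.982, 0.0],        # c4 (AT)
    [0.718, 0.0, 0.0],        # c5 (AL)
    [0.0, 0.0, 0.468],        # c6 (AS)
    [0.673, -0.135, -0.101],  # c7 (NA)
]


def solve_linear(a, b):
    """Solve the square system a y = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (m[i][n] - sum(m[i][j] * y[j] for j in range(i + 1, n))) / m[i][i]
    return y


def project_ballot(loadings, x):
    """Least-squares coordinates f minimizing ||loadings f - x||, via (L^T L) f = L^T x."""
    k = len(loadings[0])
    ltl = [[sum(row[i] * row[j] for row in loadings) for j in range(k)] for i in range(k)]
    ltx = [sum(row[i] * xi for row, xi in zip(loadings, x)) for i in range(k)]
    return solve_linear(ltl, ltx)


# Self-check: a vector built exactly from known coordinates is recovered.
f_true = [1.0, -2.0, 0.5]
x = [sum(l * f for l, f in zip(row, f_true)) for row in LOADINGS]
print([round(v, 6) for v in project_ballot(LOADINGS, x)])  # [1.0, -2.0, 0.5]
```

For a real ballot, x will not lie exactly in the three-factor space, and the result is the closest point in that space in the least-squares sense.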

From the PCA results, one can see that a rather large number of components is needed to explain the ballot data. Indeed, the 90% level of explained variance is only reached at the sixth component. The PCA loadings give a first indication of how the voting options were grouped together. However, each option tends to have substantial loadings on several components, and no clear pattern emerges.
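The proportions in the PCA summary are the squared standard deviations divided by their sum. The Python sketch below (function names hypothetical) recomputes them from the printed values and finds how many components are needed to reach the 90% level; it only re-derives numbers already shown in the PCA output.

```python
# Standard deviations of the seven principal components, copied from the PCA output.
SDEVS = [2.4711358, 1.9400016, 1.6528040, 1.33637108,
         1.29489891, 1.05165928, 1.01005089]


def variance_proportions(sdevs):
    """Share of total variance carried by each component: sd_i^2 / sum_j sd_j^2."""
    variances = [s * s for s in sdevs]
    total = sum(variances)
    return [v / total for v in variances]


def components_needed(sdevs, level):
    """Smallest number of leading components whose cumulative share reaches `level`."""
    cumulative = 0.0
    for i, share in enumerate(variance_proportions(sdevs), start=1):
        cumulative += share
        if cumulative >= level:
            return i
    return len(sdevs)


print(round(variance_proportions(SDEVS)[0], 4))  # 0.3357
print(components_needed(SDEVS, 0.90))            # 6
```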

The FA results show interesting combinations of the voting options in each factor. Before interpreting the factor loadings, we must note that the FA with three factors is still not statistically sufficient to account for the variation in the data. A p-value of 0.0164 for the chi-square statistic leads us to reject, at the usual 0.05 level, the null hypothesis that three factors are sufficient to describe the data (although the hypothesis would not be rejected at the stricter 0.01 level). Since this p-value is not too far from the 0.05 threshold, we assume that the factors found did play an important role in the voters' decisions.
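The reported p-value can be checked by hand, since the chi-square distribution with 3 degrees of freedom has a closed-form upper tail: P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2). The Python sketch below (function name hypothetical) uses only the standard library. Starting from the printed, rounded statistic 10.28 it gives about 0.0163, so the last digit differs slightly from the reported 0.0164, which was computed from the unrounded statistic.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p = chi2_sf_df3(10.28)
print(f"{p:.4f}")  # 0.0163
print(p < 0.05)    # True: three factors are rejected at the 5% level
print(p < 0.01)    # False: but not at the 1% level
```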

The three factors could be interpreted as follows:

- Factor #1 – the "rejection" factor: This factor is the only one that shows a high loading for option NA. Options JW and AL (and, to a lesser extent, option MG) correlate very well with NA in factor #1. The other options have only marginal participation in factor #1. The factor #1 loadings correspond roughly to the performance of each candidate against the NA option (see the beat matrix in the election results). A possible interpretation of this factor is that voters tended to rank candidates JW and AL (and, to a lesser extent, also MG) close to NA. This does not mean that most voters rejected these candidates (many ballots have a negative projection along factor #1). One could say that the rejection of some candidates was the first concern of the majority of the voters.
- Factor #2 – the "Anthony Towns" factor: This factor has a single high loading, for option AT, and relatively small loadings for all other options. It may be interpreted as a tendency to single out candidate AT, by ranking him either much higher or much lower than the other candidates. What made candidate AT so distinct from the others? We can only speculate here. Anthony Towns is, by far, the candidate who has been most involved in the technical infrastructure of Debian (release management of potato and woody, ftpmaster, britney, package pools, crypto in main, among others). It seems that the preference for AT polarized the voters' choices. The underlying question could be whether Debian developers want a highly technically skilled person as DPL. See the "Empowering leadership" section in Andreas Schuldei's comments about social groups for some discussion along this line.
  - In a personal communication, Steve Greenland suggested that AT's technical skills were irrelevant and that the reaction for or against AT was largely based on two points:
    - AT had proposed temporarily limiting access to the mailing lists for people who violated standards of conduct. Steve suspects that this produced a strong reaction of either "it's about time" or "completely out of the question".
    - As noted above, AT is active at the infrastructure level of Debian, which gives him a lot of de facto power over the project. Steve would guess that some people did not think that combining this with the office of DPL was a good idea.

- Factor #3 – the "Project Scud" factor: This factor clearly opposes option MG to options BR and AS. Several aspects of the campaign could explain this opposition, but the most obvious one is Project Scud, of which Branden Robinson and Andreas Schuldei are members. Matthew Garrett was the candidate who most clearly expressed disagreement with the need for a DPL team. If this interpretation of factor #3 is correct, we need to explain why option JW has a relatively high loading in factor #3. One could argue that Jonathan Walther did not publicly disagree with Project Scud, or that his platform clearly stated that he would work with teams if elected.
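The comparison with the beat matrix mentioned under factor #1 rests on standard Condorcet pairwise counts: for each pair of options, the number of ballots ranking one strictly above the other. The Python sketch below is illustrative only (names hypothetical). It assumes ballots already filled in as in the pre-processing step, where a smaller number means a more preferred option, and the option order is the one in which the initials were introduced above. The toy ballots are the examples quoted in this article after filling, not the real tally sheet.

```python
OPTIONS = ["JW", "MG", "BR", "AT", "AL", "AS", "NA"]


def beat_matrix(ballots):
    """beats[a][b] = number of ballots ranking option a strictly above option b.

    Each ballot is a string of numeric ranks, one digit per option;
    a smaller number means a more preferred option, and ties count for neither.
    """
    n = len(OPTIONS)
    beats = [[0] * n for _ in range(n)]
    for ballot in ballots:
        ranks = [int(ch) for ch in ballot]
        for a in range(n):
            for b in range(n):
                if ranks[a] < ranks[b]:
                    beats[a][b] += 1
    return beats


def versus_none_of_the_above(ballots):
    """For each candidate: (ballots preferring it to NA, ballots preferring NA to it)."""
    beats = beat_matrix(ballots)
    na = OPTIONS.index("NA")
    return {opt: (beats[i][na], beats[na][i])
            for i, opt in enumerate(OPTIONS) if i != na}


toy = ["8876881", "1423444", "7314526"]
print(versus_none_of_the_above(toy)["JW"])  # (1, 2)
```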

As a final analysis, each ballot was classified according to its score along each of the three factors (the results are in a separate file). To do this, the range of variation of each factor was subdivided according to the following quantiles:

Quantiles for factor #1 projections:
        0%        25%        45%        55%        75%       100%
-5.0557196 -1.1328821  0.1789334  0.6457197  1.5087747  4.6622318

Quantiles for factor #2 projections:
        0%        25%        45%        55%        75%       100%
-3.3395812 -1.3370558 -0.5280313 -0.1624911  0.9755918  4.9479631

Quantiles for factor #3 projections:
        0%        25%        45%        55%        75%       100%
-3.3321167 -1.5341804 -0.6769201 -0.1627885  1.0181405  6.6138725

Using the limits above and the projection data, each ballot was classified along each factor using one of the symbols --, -, o, +, and ++. For instance, the two ballots below:

ballot   REJ AT PS
7314526  ++  -- ++
1-23---  --  o  +

could be interpreted as:

- 7314526: Strongly rejects some candidates; does not put AT above the others; strongly supports Project Scud.
- 1-23---: Does not reject the "most rejected" candidates; has a neutral position as regards AT; moderately supports Project Scud.
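The classification step can be sketched in Python as follows. The quantile limits are copied from the tables above; the symbol for a value falling exactly on a limit (assigned here to the upper class) is an assumption, and the names are hypothetical.

```python
from bisect import bisect_right

SYMBOLS = ["--", "-", "o", "+", "++"]

# 25%, 45%, 55% and 75% quantiles of the projections on each factor.
LIMITS = {
    "REJ": [-1.1328821, 0.1789334, 0.6457197, 1.5087747],   # factor #1
    "AT":  [-1.3370558, -0.5280313, -0.1624911, 0.9755918],  # factor #2
    "PS":  [-1.5341804, -0.6769201, -0.1627885, 1.0181405],  # factor #3
}


def classify(projection, limits):
    """Map a projection to one of five symbols using four inner quantiles.
    bisect_right counts the limits <= projection, so a value exactly on a
    limit goes to the upper class (an assumption)."""
    return SYMBOLS[bisect_right(limits, projection)]


def classify_ballot(projections):
    """Symbols for a (factor #1, factor #2, factor #3) projection triple."""
    return tuple(classify(p, LIMITS[name])
                 for name, p in zip(("REJ", "AT", "PS"), projections))


print(classify_ballot((2.0, -2.0, 2.0)))  # ('++', '--', '++')
```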

One might also question whether it is legitimate to use the rank order in the ballots as numerical values for the FA. In a private communication, Chris Lawrence argued that it may be better to use a scaling technique such as Unfolding, which would convert each ballot into a set of distances between the voter's ideal political position and those of the candidates. From the distance matrix, it would then be possible to find the position of each voter and each candidate in a low-dimensional policy space. Open questions with this approach are how to treat non-ranked options and ties, and whether a metric or a non-metric Unfolding technique is preferable.

Rafael Laboissière ⟨rafael AT debian DOT org⟩

DISCLAIMER: Although its format may suggest otherwise, this article should not be considered a fully scientific work. I wrote it mostly for the fun of doing it. The interpretations are obviously subjective, and I apologize for any offense the candidates may take at this text. Comments and suggestions for improvements are welcome.

The results and the figures in this article were obtained using scripts written in Perl and R. The source code, including the HTML source for this web page, is available as a tar.gz file (25K).