org.openjump.core.attributeoperations.statistics
Class CorrelationCoefficients

java.lang.Object
  extended by org.openjump.core.attributeoperations.statistics.CorrelationCoefficients

public class CorrelationCoefficients
extends java.lang.Object

Calculates correlation coefficients (Pearson's r, Spearman's rho and Kendall's tau) between two attributes of a set of features.

Version:
$Rev: 2509 $ modified: [sstein]: 16 Feb 2009, changed logger entries to comments
Author:
Ole Rahn

FH Osnabrück - University of Applied Sciences Osnabrück,
Project: PIROL (2005),
Subproject: Daten- und Wissensmanagement (Data and Knowledge Management)

Nested Class Summary
 class CorrelationCoefficients.CorrelationInformation
           
 class CorrelationCoefficients.RankCorrelationInformation
           
protected  class CorrelationCoefficients.SpearmanRankNumberPair
           
 
Field Summary
protected  java.lang.String attrName1
           
protected  java.lang.String attrName2
           
protected  java.lang.Object[] dataArray
           
protected  double[] means
           
protected  Feature[] rawFeatures
           
 
Constructor Summary
CorrelationCoefficients(Feature[] features, java.lang.String attr1, java.lang.String attr2)
           
 
Method Summary
protected  double aritmeticMiddle(Feature[] features, int attr)
           
static double getDeviation(Feature[] features, java.lang.String attr, double mean)
          Returns the deviation of the values of the given attribute.
 CorrelationCoefficients.RankCorrelationInformation getKendalsTauRankCoefficient()
          Gets Kendall's tau rank correlation coefficient. "Spearman Rank Order Correlations (or "rho") and Kendall's Tau-b (or "tau") Correlations are used when the variables are measured as ranks (from highest-to-lowest or lowest-to-highest)"
http://www.themeasurementgroup.com/datamining/definitions/correlation.htm
 double getMean(int nr)
          Gets the arithmetic mean of the nr-th given attribute.
 CorrelationCoefficients.CorrelationInformation getPearsonCoefficient()
          Gets Pearson's correlation coefficient, a dimensionless measure that is meaningful when the attributes are linearly related.
see: http://www.netzwelt.de/lexikon/Korrelationskoeffizient.html
protected  java.util.HashMap<java.lang.Integer,java.lang.Double> getRank2SpearmanRankMap(java.lang.Object[] sortedValues, java.util.HashMap<java.lang.Object,java.lang.Integer> value2NumAppearances)
           
 CorrelationCoefficients.RankCorrelationInformation getSpearmansRhoCoefficient()
          Gets Spearman's rank correlation coefficient (rho), which measures how well the relation between the attributes can be described by a monotonic function.
see: http://www.netzwelt.de/lexikon/Korrelationskoeffizient.html
protected  double getVariance(java.lang.String attr)
           
protected  CorrelationDataPair[] initializeDataStorage(Feature[] features)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dataArray

protected java.lang.Object[] dataArray

attrName1

protected java.lang.String attrName1

attrName2

protected java.lang.String attrName2

means

protected double[] means

rawFeatures

protected Feature[] rawFeatures
Constructor Detail

CorrelationCoefficients

public CorrelationCoefficients(Feature[] features,
                               java.lang.String attr1,
                               java.lang.String attr2)
Method Detail

initializeDataStorage

protected CorrelationDataPair[] initializeDataStorage(Feature[] features)

getDeviation

public static double getDeviation(Feature[] features,
                                  java.lang.String attr,
                                  double mean)
Returns the deviation of the values of the given attribute. A precomputed mean is passed in so that it does not have to be calculated more than once. To obtain the mean, see the FeatureCollectionTools class, which is also used by aritmeticMiddle().

Parameters:
features - array containing the features we want the deviation for
attr - name of the attribute to calculate the deviation for
mean - the mean for the given features
Returns:
the deviation
Throws:
java.lang.IllegalArgumentException - if the attribute is not of a numerical type
See Also:
FeatureCollectionTools
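The description does not say which kind of deviation is returned. The following is a minimal sketch, assuming "deviation" means the population standard deviation (dividing by n); the actual OpenJUMP code may use the sample form (n - 1) instead. To stay self-contained it works on a plain double[] rather than Feature[] plus an attribute name, and the class name DeviationSketch is hypothetical.

```java
/** Hypothetical helper mirroring the idea of getDeviation(features, attr, mean). */
final class DeviationSketch {

    private DeviationSketch() {}

    /**
     * Population standard deviation of the values around a precomputed mean.
     * ASSUMPTION: divides by n; the real class may divide by n - 1.
     */
    static double deviation(double[] values, double mean) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values given");
        }
        double sumOfSquares = 0.0;
        for (double v : values) {
            double diff = v - mean; // distance from the supplied mean
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares / values.length);
    }
}
```

For the values {2, 4, 4, 4, 5, 5, 7, 9} with a mean of 5.0, this sketch returns 2.0. Passing the mean in, as the original method does, avoids a second pass over the data when the caller already knows it.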

getVariance

protected double getVariance(java.lang.String attr)

aritmeticMiddle

protected double aritmeticMiddle(Feature[] features,
                                 int attr)

getMean

public double getMean(int nr)
Gets the arithmetic mean of the nr-th given attribute.

Parameters:
nr - index of the attribute to calculate the mean for
Returns:
the mean of the attribute, or Double.NaN if an error occurred
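A minimal sketch of the arithmetic mean with the same NaN convention is shown below. It assumes that "errors" include having no values to average; which error cases the real getMean() covers is not documented here. The class name MeanSketch is hypothetical and the input is a plain double[].

```java
/** Hypothetical helper illustrating getMean()'s NaN-on-error convention. */
final class MeanSketch {

    private MeanSketch() {}

    /** Arithmetic mean, or Double.NaN when there is nothing to average (assumed error case). */
    static double mean(double[] values) {
        if (values == null || values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}
```

With this sketch, mean(new double[] {1, 2, 3, 4}) returns 2.5, while an empty or null array yields Double.NaN, so callers should check the result with Double.isNaN().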

getPearsonCoefficient

public CorrelationCoefficients.CorrelationInformation getPearsonCoefficient()
Gets Pearson's correlation coefficient, a dimensionless measure that is meaningful when the attributes are linearly related.
see: http://www.netzwelt.de/lexikon/Korrelationskoeffizient.html

Returns:
Pearson's correlation coefficient
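For reference, Pearson's r is the covariance of the two attributes divided by the product of their standard deviations. The sketch below computes it for two plain double[] arrays; it is an illustration of the formula, not the OpenJUMP implementation, which returns a CorrelationInformation object. The class name PearsonSketch is hypothetical.

```java
/** Hypothetical illustration of Pearson's correlation coefficient. */
final class PearsonSketch {

    private PearsonSketch() {}

    /** Returns r in [-1, 1], or NaN if either input is constant. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equally long arrays with at least 2 values");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;  // sum of (x - meanX)(y - meanY)
        double varX = 0.0; // sum of (x - meanX)^2
        double varY = 0.0; // sum of (y - meanY)^2
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // The 1/n normalisation factors cancel, so the raw sums suffice.
        return cov / Math.sqrt(varX * varY);
    }
}
```

For x = {1, 2, 3, 4} and y = {2, 4, 6, 8} the sketch returns 1.0; reversing y gives -1.0. Because the result is dimensionless, rescaling either attribute (for example metres to kilometres) does not change it.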

getRank2SpearmanRankMap

protected java.util.HashMap<java.lang.Integer,java.lang.Double> getRank2SpearmanRankMap(java.lang.Object[] sortedValues,
                                                                                        java.util.HashMap<java.lang.Object,java.lang.Integer> value2NumAppearances)
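getRank2SpearmanRankMap() has no description. Judging only from its signature (sorted values, a count of how often each value appears, and an Integer-to-Double result), it appears to map rank positions to tie-averaged Spearman ranks, where equal values share the mean of the positions they occupy. The sketch below illustrates that standard tie handling under this assumption; it is not the method's actual code. It assumes 1-based positions, takes an ascending-sorted double[], and the class name TieRankSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of tie-averaged ranks for Spearman's rho. */
final class TieRankSketch {

    private TieRankSketch() {}

    /**
     * Maps each 1-based position in an ascending-sorted array to its
     * tie-averaged rank. ASSUMPTION: positions start at 1.
     */
    static Map<Integer, Double> positionToAverageRank(double[] sorted) {
        Map<Integer, Double> ranks = new HashMap<>();
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // Advance j over the block of values equal to sorted[i] (the ties).
            while (j + 1 < sorted.length && sorted[j + 1] == sorted[i]) {
                j++;
            }
            // Positions i+1 .. j+1 all receive the mean of those positions.
            double average = ((i + 1) + (j + 1)) / 2.0;
            for (int k = i; k <= j; k++) {
                ranks.put(k + 1, average);
            }
            i = j + 1;
        }
        return ranks;
    }
}
```

For the sorted values {10, 20, 20, 30}, positions 1 to 4 receive the ranks 1.0, 2.5, 2.5 and 4.0: the two tied values at positions 2 and 3 share the rank (2 + 3) / 2.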

getSpearmansRhoCoefficient

public CorrelationCoefficients.RankCorrelationInformation getSpearmansRhoCoefficient()
Gets Spearman's rank correlation coefficient (rho), which measures how well the relation between the attributes can be described by a monotonic function.
see: http://www.netzwelt.de/lexikon/Korrelationskoeffizient.html

Returns:
Spearman's rank correlation coefficient (rho)
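Spearman's rho can be computed as Pearson's r applied to the ranks of the values, with tied values receiving the average of the ranks they span. The sketch below shows that approach on plain double[] arrays; it is an illustration under these assumptions, not the OpenJUMP implementation, which returns a RankCorrelationInformation object. The class name SpearmanSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration: Spearman's rho as Pearson's r of tie-averaged ranks. */
final class SpearmanSketch {

    private SpearmanSketch() {}

    /** Tie-averaged, 1-based ranks of the values, returned in the original order. */
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) {
            order[k] = k;
        }
        // Indices sorted by ascending value.
        Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));

        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over a run of equal values (ties).
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) {
                j++;
            }
            // Mean of the 1-based positions i+1 .. j+1.
            double average = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                ranks[order[k]] = average;
            }
            i = j + 1;
        }
        return ranks;
    }

    /** Pearson's r; returns NaN if either input is constant. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equally long arrays with at least 2 values");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int k = 0; k < n; k++) {
            meanX += x[k];
            meanY += y[k];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int k = 0; k < n; k++) {
            double dx = x[k] - meanX;
            double dy = y[k] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    /** Spearman's rho of two equally long samples. */
    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }
}
```

Because only ranks enter the calculation, any strictly increasing relation gives rho = 1.0: for x = {1, 2, 3, 4} and y = {10, 100, 1000, 10000} the sketch returns 1.0 even though the relation is far from linear.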

getKendalsTauRankCoefficient

public CorrelationCoefficients.RankCorrelationInformation getKendalsTauRankCoefficient()
Gets Kendall's tau rank correlation coefficient. "Spearman Rank Order Correlations (or "rho") and Kendall's Tau-b (or "tau") Correlations are used when the variables are measured as ranks (from highest-to-lowest or lowest-to-highest)"
http://www.themeasurementgroup.com/datamining/definitions/correlation.htm

Returns:
RankCorrelationInformation
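The quoted source names the tau-b variant, which corrects for ties: tau_b = (n_c - n_d) / sqrt((n_0 - n_1)(n_0 - n_2)), where n_c and n_d count concordant and discordant pairs, n_0 = n(n - 1)/2, and n_1, n_2 count pairs tied in the first and second attribute. The O(n^2) sketch below implements that formula on plain double[] arrays. It assumes getKendalsTauRankCoefficient() computes tau-b, which this page does not confirm, and the class name KendallTauBSketch is hypothetical.

```java
/** Hypothetical O(n^2) illustration of Kendall's tau-b. */
final class KendallTauBSketch {

    private KendallTauBSketch() {}

    /** Returns tau-b in [-1, 1], or NaN if one attribute is constant. */
    static double tauB(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equally long arrays with at least 2 values");
        }
        long concordant = 0;
        long discordant = 0;
        long notTiedX = 0; // n0 - n1: pairs that differ in x
        long notTiedY = 0; // n0 - n2: pairs that differ in y
        int n = x.length;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = Math.signum(x[i] - x[j]);
                double dy = Math.signum(y[i] - y[j]);
                if (dx != 0) {
                    notTiedX++;
                }
                if (dy != 0) {
                    notTiedY++;
                }
                double s = dx * dy;
                if (s > 0) {
                    concordant++;
                } else if (s < 0) {
                    discordant++;
                }
                // s == 0: tied in x and/or y, counted as neither.
            }
        }
        return (concordant - discordant) / Math.sqrt((double) notTiedX * notTiedY);
    }
}
```

For x = {1, 2, 2, 3} and y = {1, 2, 3, 4}, five pairs are concordant, none discordant, and one pair is tied in x, so the sketch returns 5 / sqrt(5 * 6), approximately 0.913.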