An Extension of the Gap Statistic for Ordinal/Categorical Data



Documentation for package ‘DiscreteGapStatistic’ version 0.1.0

Help Pages

BhattacharyyaDist Bhattacharyya distance core function
ChisqDist Chi-square distance core function
clusGapDiscr Discrete application of clusGap, based on the implementation of the function found in the 'cluster' R package
concussion Concussion Data
cramersVmod Cramer's V modified pairwise vector function, based on the function found in the 'lsr' package. This is a simple wrapper of the usual chisq.test function. It is an adjusted version of phi = sqrt(chi^2/N) that guarantees values between 0 (no association) and 1 (complete association).
CramerV Cramer's V core function
dissbhattacharyya Bhattacharyya distance wrapper function
disschisquare Chi-square distance wrapper function
disscramerv Cramer's V distance wrapper function
disshamming Hamming distance wrapper function, based on the implementation in the 'cultevo' package
disshellinger Hellinger distance wrapper function
distanceHeat Sample-to-sample heatmap clustering samples according to a given categorical distance. Exploratory tool that helps visualize and cluster blocks of observations across columns, ordered according to a given categorical distance. The final output is a clustered distance matrix. The plot is intended to give the 'DiscreteClusGap' user an idea of which categorical distance best suits the input data. 'sample2sampleHeat' is based on the 'pheatmap' function from the 'pheatmap' R package, so any parameter accepted by pheatmap can be passed to 'sample2sampleHeat'.
distancematrix Function invoking discrete distance functions
findK Criteria to determine number of clusters k
HellingerDist Hellinger distance core function
likert.heat.plot2 Summary heatmap for categorical/Likert data. Heatmap representation summarizing categorical/Likert data. Modified version of 'likert.heat.plot' from the 'likert' package. Does not allow different categorical ranges across questions. The function outputs a ggplot object to which additional layers can be added for customization. The output plot preserves the question order given by the columns of 'x'.
mass Mass Data
ResHeatmap Heatmap assuming a given distance function and a known number of clusters. Function to display a categorical data matrix given a user-defined number of clusters 'nCl', a categorical distance 'distName' and a predefined clustering method 'FUNcluster'. The output is a heatmap that separates and color-labels the resulting clusters vertically in the rows and allows unsupervised clustering of the questions in the columns. Each cell is colored according to the categorical values provided or found in the data. The clustergram is based on the 'pheatmap' function from the 'pheatmap' R package, so any parameter accepted by pheatmap can be passed to 'clusGapDiscrHeat'. This function can be used to examine the number of clusters before running 'clusGapDiscrHeat', and also after the number of clusters has been determined.
SimData Simulate Data
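The Bhattacharyya and Hellinger entries are core distance functions for categorical data. As an illustration only, here is a minimal Python sketch of the standard textbook definitions for two discrete probability distributions p and q over the same categories: the Bhattacharyya coefficient BC = sum_i sqrt(p_i q_i), the Bhattacharyya distance -ln(BC) and the Hellinger distance sqrt(1 - BC). It is not the package's R code. The function names are hypothetical, the inputs are assumed to be already normalized category frequencies, and how DiscreteGapStatistic turns observations into such distributions is not covered here. Some references scale the Hellinger distance differently, so the package's values may differ by a constant factor.

```python
import math
from typing import Sequence


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """BC(p, q) = sum_i sqrt(p_i * q_i) for two discrete distributions.

    Assumption: p and q are non-negative and each sums to 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of categories")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """D_B(p, q) = -ln BC(p, q); infinite when p and q share no support."""
    bc = bhattacharyya_coefficient(p, q)
    if bc == 0.0:
        return math.inf
    # Clamp tiny negative values caused by rounding when p and q are equal.
    return max(0.0, -math.log(bc))


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """H(p, q) = sqrt(1 - BC(p, q)), which lies in [0, 1]."""
    bc = bhattacharyya_coefficient(p, q)
    # Guard against BC slightly exceeding 1 through floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))
```

For example, hellinger_distance([1, 0], [0, 1]) is 1.0 (no shared categories), while identical distributions give 0.0 for both distances.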
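The Cramer's V entries describe an adjustment of phi = sqrt(chi^2/N) that keeps values in [0, 1]. A minimal Python sketch of the standard formula, V = sqrt(chi^2 / (N * (min(r, c) - 1))), for two categorical vectors of equal length is shown below, where r and c are the numbers of distinct categories in each vector. The helper name cramers_v is hypothetical and this is not the package's R code. It uses the uncorrected Pearson chi-square statistic, whereas R's chisq.test applies a continuity correction to 2 x 2 tables by default, so results can differ in that case. How disscramerv converts V into a distance is not assumed here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Cramer's V between two equal-length categorical sequences.

    Builds the r x c contingency table of observed category pairs and
    computes the uncorrected Pearson chi-square statistic from it.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    pair_counts = Counter(zip(x, y))  # observed cell counts
    row_totals = Counter(x)           # marginal counts of x categories
    col_totals = Counter(y)           # marginal counts of y categories
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # > 0 for observed categories
            observed = pair_counts[(r, c)]
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        # V is undefined when either variable has a single category.
        return float("nan")
    return math.sqrt(chi2 / (n * (k - 1)))
```

A perfect one-to-one relation between categories, such as cramers_v([1, 1, 2, 2], ["a", "a", "b", "b"]), gives 1.0, and a balanced, independent pattern gives 0.0.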
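The clusGapDiscr and findK entries concern the gap statistic of Tibshirani, Walther and Hastie (2001), Gap(k) = E*[log W_k] - log W_k, where W_k is the pooled within-cluster dispersion for k clusters and E* denotes the expectation under a reference distribution. The 'cluster' package's maxSE() offers several rules for choosing k from Gap(k) and its standard error SE(k), including "firstSEmax", "Tibs2001SEmax" and "globalSEmax". Assuming findK provides criteria of this kind, the minimal Python sketch below implements the "Tibs2001SEmax" rule: choose the smallest k with Gap(k) >= Gap(k+1) - SE(k+1). The function name is hypothetical, and the fallback to the largest k evaluated when no k qualifies reflects our reading of maxSE, not documented behavior of findK.

```python
from typing import Sequence


def tibs2001_se_max(gap: Sequence[float], se: Sequence[float],
                    se_factor: float = 1.0) -> int:
    """Return the selected number of clusters k (1-based).

    gap[i] and se[i] hold Gap(k) and its standard error for k = i + 1.
    Rule: smallest k with Gap(k) >= Gap(k + 1) - se_factor * SE(k + 1).
    """
    if len(gap) != len(se) or len(gap) == 0:
        raise ValueError("gap and se must be non-empty and of equal length")
    max_k = len(gap)
    for i in range(max_k - 1):
        if gap[i] >= gap[i + 1] - se_factor * se[i + 1]:
            return i + 1  # convert the 0-based index to k
    # Hypothetical fallback: no k satisfied the rule, so use the largest k.
    return max_k
```

With gap = [0.1, 0.5, 0.55, 0.52] and se = [0.02, 0.03, 0.04, 0.03], the rule first holds at k = 3, because 0.55 >= 0.52 - 0.03.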