Calculates the cosine similarity and angular similarity between two vectors or between the columns of a matrix

csi(x, y = NULL)

Arguments

x

A vector or matrix object

y

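If x is a vector, a second vector object to compare with x (the last example also passes a matrix, in which case x is compared with each column)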

Value

If x is a matrix, a list object with similarity and angular.similarity matrices; if x and y are vectors, a named vector with similarity and angular.similarity

Note

The cosine similarity index is a measure of similarity between two vectors of an inner product space. The index is best suited to high-dimensional, positive variable spaces. One useful application is measuring the separability of clusters derived from algorithmic approaches (e.g., k-means). It is good practice to center the data before calculating the index. When the vectors are centered, the cosine similarity index is mathematically equivalent to Pearson's correlation coefficient, and numerically equal up to floating-point error.
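The relationship to Pearson's correlation can be checked with a few lines of base R. This is a minimal sketch that does not call csi(); the variable names and the seed are assumptions chosen only for illustration.

```r
# Illustrative check in base R; csi() itself is not called here.
set.seed(42)                      # arbitrary seed, only for reproducibility
x <- runif(100)
y <- runif(100)^2

# Center each vector by subtracting its mean
xc <- x - mean(x)
yc <- y - mean(y)

# Cosine similarity of the centered vectors
cos_centered <- sum(xc * yc) / (sqrt(sum(xc^2)) * sqrt(sum(yc^2)))

# Pearson correlation of the original vectors
pearson <- cor(x, y)

# The two agree up to floating-point error
print(isTRUE(all.equal(cos_centered, pearson)))

# Without centering, the cosine similarity generally differs from the correlation
cos_raw <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
print(c(centered = cos_centered, raw = cos_raw, pearson = pearson))
```

Because both vectors here are strictly positive, the uncentered cosine similarity is pulled well above the correlation, which is one reason centering matters.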

The cosine similarity index is derived as s(x, y) = (x · y) / (||x|| ||y||), where the expected range is 1.0 (perfect similarity) to -1.0 (perfect dissimilarity). A normalized angle between the vectors can be used as a bounded similarity function within [0, 1]: angular similarity = 1 - arccos(s) / pi
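The two formulas can be sketched in base R as follows. The helper names cosine_similarity and angular_similarity are hypothetical and are not part of the package. The sketch assumes the inverse cosine (arccosine) is intended, since that is what yields a normalized angle in [0, 1]. The angular.similarity values printed in the Examples appear consistent with the reciprocal 1/cos(s) instead, so this sketch is not expected to reproduce them.

```r
# Illustrative helpers (hypothetical names, not the package implementation)
cosine_similarity <- function(x, y) {
  # Dot product divided by the product of the Euclidean norms
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

angular_similarity <- function(s) {
  # Clamp to [-1, 1] so rounding error cannot push acos() out of its domain
  s <- pmin(pmax(s, -1), 1)
  # Assumption: arccosine, giving the normalized angle between the vectors
  1 - acos(s) / pi
}

a <- c(1, 0)
s_same     <- cosine_similarity(a, c(1, 0))    # same direction: 1
s_orth     <- cosine_similarity(a, c(0, 1))    # orthogonal: 0
s_opposite <- cosine_similarity(a, c(-1, 0))   # opposite direction: -1

# Angular similarity for the three cases: 1.0, 0.5, 0.0
ang <- angular_similarity(c(s_same, s_orth, s_opposite))
print(ang)
```

Clamping before acos() is a defensive choice: a cosine similarity computed in floating point can land marginally outside [-1, 1], for which acos() returns NaN.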

Author

Jeffrey S. Evans <jeffrey_evans@tnc.org>

Examples

# Compare two vectors (centered using scale)
x <- runif(100)
y <- runif(100)^2
csi(as.vector(scale(x)), as.vector(scale(y)))
#>         similarity angular.similarity
#>        -0.01382014         0.68165971

# Compare columns (vectors) in a matrix (centered using scale)
x <- matrix(round(runif(100), 0), nrow = 20, ncol = 5)
( s <- csi(scale(x)) )
#> $similarity
#>             [,1]        [,2]        [,3]       [,4]       [,5]
#> [1,]  1.00000000 -0.50442963 -0.06579517  0.2182179 0.06579517
#> [2,] -0.50442963  1.00000000 -0.01010101 -0.3015113 0.01010101
#> [3,] -0.06579517 -0.01010101  1.00000000  0.1005038 0.21212121
#> [4,]  0.21821789 -0.30151134  0.10050378  1.0000000 0.10050378
#> [5,]  0.06579517  0.01010101  0.21212121  0.1005038 1.00000000
#>
#> $angular.similarity
#>           [,1]      [,2]      [,3]      [,4]      [,5]
#> [1,] 1.0000000 0.6364044 0.6809999 0.6739580 0.6809999
#> [2,] 0.6364044 1.0000000 0.6816739 0.6666524 0.6816739
#> [3,] 0.6809999 0.6816739 1.0000000 0.6800757 0.6743921
#> [4,] 0.6739580 0.6666524 0.6800757 1.0000000 0.6800757
#> [5,] 0.6809999 0.6816739 0.6743921 0.6800757 1.0000000
#>

# Compare vector (x) to each column in a matrix (y)
y <- matrix(round(runif(500), 3), nrow = 100, ncol = 5)
x <- runif(100)
csi(as.vector(scale(x)), scale(y))
#>                          [,1]         [,2]      [,3]        [,4]       [,5]
#> similarity         0.05044759 -0.000619605 0.1838755 -0.02890784 0.02171927
#> angular.similarity 0.68128464  0.681690053 0.6762322  0.68155707 0.68161502
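
For readers who want to see how a column-wise comparison like the matrix example could be computed, here is a minimal base R sketch. The helper cosine_matrix is a hypothetical name and is not claimed to match the internals of csi(); the seed and matrix dimensions are arbitrary.

```r
# Hypothetical helper: pairwise cosine similarity between the columns of a matrix
cosine_matrix <- function(m) {
  m <- as.matrix(m)
  norms <- sqrt(colSums(m^2))          # Euclidean norm of each column
  crossprod(m) / outer(norms, norms)   # pairwise dot products over norm products
}

set.seed(1)                            # arbitrary seed, only for reproducibility
m <- scale(matrix(runif(100), nrow = 20, ncol = 5))
s <- cosine_matrix(m)

# Diagonal entries are 1 because each column is identical to itself
print(round(diag(s), 10))

# With centered columns the result matches the Pearson correlation matrix
print(isTRUE(all.equal(s, cor(m), check.attributes = FALSE)))
```

Using crossprod() computes all pairwise dot products in one matrix operation, which avoids an explicit double loop over columns.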