Basic idea:
\large \cos\theta=\frac{\vec{a} \cdot \vec{b}}{|\vec{a}||\vec{b}|}=\vec{x}^T\vec{y}
\vec{x},\vec{y}: unit vectors of \vec{a} and \vec{b} respectively
\theta: angle between \vec{x} and \vec{y}.
column of the database matrix (one per document): \large \vec{x}=\frac{\vec{a}}{|\vec{a}|}
search vector: \large \vec{y}=\frac{\vec{b}}{|\vec{b}|}
If \cos\theta = 0, then \theta=90^\circ: the document contains none of the search words, and the corresponding column vector of the database matrix is orthogonal to the search vector.
If \cos\theta is close to 1, then \theta \approx 0: the document corresponding to that column vector is a close match for the search. The column with the largest \cos\theta is the best match.
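As a rough illustration of the idea above, here is a minimal sketch in plain Python. The function names (`unit`, `cosine_scores`) and the toy keyword counts are made up for this example and are not part of any library; the sketch also assumes no document column and no search vector is all zeros, since a zero vector cannot be normalized.

```python
import math

def unit(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def cosine_scores(columns, query):
    """Return cos(theta) between the search vector and each document column."""
    y = unit(query)                      # normalized search vector
    scores = []
    for col in columns:
        x = unit(col)                    # normalized database column
        scores.append(sum(xi * yi for xi, yi in zip(x, y)))  # x^T y
    return scores

# Hypothetical toy data: each inner list is one column of the database
# matrix, i.e. the keyword counts of one document.
documents = [
    [1, 1, 0],   # contains both search words
    [0, 0, 1],   # contains neither search word
    [1, 0, 0],   # contains one search word
]
query = [1, 1, 0]

scores = cosine_scores(documents, query)
# Index of the column with the largest cos(theta), i.e. the best match.
best = max(range(len(scores)), key=scores.__getitem__)
```

With this toy data the first document scores close to 1, the second scores 0 (orthogonal to the search vector), and the third falls in between, so `best` picks the first document.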
More to explore:
Latent Semantic Indexing (LSI)
Singular value decomposition
Covariance
Least squares problem