Project: Geometric Methods in Learning and Statistics

People: Guy Lebanon

Description: Statistical modeling of data assumes, either implicitly or explicitly, a geometric structure on the data space. This assumed structure is often adopted with no supporting evidence from the data. In particular, the commonly used Euclidean geometry is not necessarily a good choice for complex data such as text documents and images. In this project we examine the geometric assumptions made by statistical models and reformulate them for the general case. A recurring application area is text documents, where a natural geometric candidate is the Fisher geometry. Supported by Čencov's theorem and numerous experimental results, it often leads to a significant improvement in the modeling of documents.
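
To make the contrast between the two geometries concrete, here is a minimal sketch that treats a document as a point on the probability simplex (its normalized term frequencies) and compares the Euclidean distance with the geodesic distance under the Fisher information metric, 2 arccos(sum_i sqrt(p_i q_i)). The helper names and the toy word counts are assumptions made for illustration and are not taken from the project's code.

```python
import math

def term_frequencies(counts):
    """Map raw word counts to a point on the probability simplex."""
    total = sum(counts)
    return [c / total for c in counts]

def euclidean_distance(p, q):
    """Ordinary straight-line distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def fisher_distance(p, q):
    """Geodesic distance on the simplex under the Fisher information metric."""
    # Bhattacharyya coefficient, clamped to [0, 1] to guard against rounding.
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))

# Hypothetical three-word vocabulary and toy counts.
doc_a = term_frequencies([8, 1, 1])
doc_b = term_frequencies([1, 8, 1])
print(euclidean_distance(doc_a, doc_b), fisher_distance(doc_a, doc_b))
```

Under the Fisher geometry the simplex is isometric to a portion of a sphere of radius 2, so the distance between two documents with no words in common is always pi, regardless of how their mass is distributed.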

Publications:

A geometric interpretation of logistic regression leads to a powerful generalization to alternative geometries.



Decision boundaries for support vector machines using the Euclidean heat kernel (left) and the Fisher-geometry heat kernel (right)
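
The two kernels compared in the figure can be sketched as follows. The Euclidean heat kernel is the familiar Gaussian kernel. The heat kernel of the Fisher geometry on the simplex has no simple closed form, so this sketch assumes the leading-order approximation in which the Euclidean distance is replaced by the Fisher geodesic distance. The function names and the diffusion time `t` are illustrative assumptions, not the project's implementation.

```python
import math

def fisher_distance(p, q):
    """Fisher geodesic distance on the simplex: 2 * arccos(sum_i sqrt(p_i * q_i))."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))

def euclidean_heat_kernel(x, y, t):
    """Heat kernel of flat space: (4*pi*t)^(-n/2) * exp(-|x - y|^2 / (4t))."""
    n = len(x)  # ambient dimension
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return (4.0 * math.pi * t) ** (-n / 2.0) * math.exp(-sq / (4.0 * t))

def fisher_heat_kernel(p, q, t):
    """Leading-order approximation (an assumption of this sketch) to the
    heat kernel of the Fisher geometry on the simplex."""
    n = len(p) - 1  # intrinsic dimension of the simplex
    d = fisher_distance(p, q)
    return (4.0 * math.pi * t) ** (-n / 2.0) * math.exp(-d * d / (4.0 * t))

# Hypothetical documents over a three-word vocabulary.
p = [0.8, 0.1, 0.1]
q = [0.1, 0.8, 0.1]
print(euclidean_heat_kernel(p, q, 0.1), fisher_heat_kernel(p, q, 0.1))
```

Either function can be used to fill a Gram matrix for a support vector machine that accepts precomputed kernels. The approximate Fisher kernel is not guaranteed to be positive definite, which is worth checking before relying on it.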


Learning a geometry for documents expressed through the action of Lie groups on the simplex helps to improve document classification and explain popular web search techniques.
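
As a rough illustration of what a group action on the simplex can look like, the sketch below assumes one simple choice: the group of positive weight vectors acting by componentwise rescaling followed by renormalization. A candidate geometry is then obtained by measuring the Fisher distance after the transformation. This particular action, the function names, and the weights are assumptions made for the example; the caption above does not specify the action or the learning procedure used in the project.

```python
import math

def rescale(x, lam):
    """Act on a simplex point x with positive weights lam:
    multiply componentwise, then renormalize to sum to one."""
    weighted = [xi * li for xi, li in zip(x, lam)]
    total = sum(weighted)
    return [w / total for w in weighted]

def fisher_distance(p, q):
    """Fisher geodesic distance on the simplex: 2 * arccos(sum_i sqrt(p_i * q_i))."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))

def pulled_back_distance(x, y, lam):
    """Distance in the geometry obtained by pulling the Fisher metric
    back through the rescaling transformation."""
    return fisher_distance(rescale(x, lam), rescale(y, lam))

# Hypothetical weights that downweight the first (very common) word.
x = [0.7, 0.2, 0.1]
y = [0.6, 0.1, 0.3]
lam = [0.1, 1.0, 1.0]
print(fisher_distance(x, y), pulled_back_distance(x, y, lam))
```

Applying two rescalings in sequence is the same as applying the rescaling with the componentwise product of their weights, which is what makes this a group action. Learning the geometry then amounts to choosing the weights, for example by maximizing some criterion on training documents.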