We consider algorithms for learning functions $f: X \rightarrow Y$, where $X$ and $Y$ are finite and the data are assumed to be noise-free. Each learning algorithm is connected with the set of prior probability distributions for which it is optimal. A method for constructing this set from the algorithm is given, and the relationship between the sets for different algorithms is discussed. Improper algorithms are identified as those for which this set has zero volume. They are investigated using linear algebra, and two examples are given. This framework is then applied to the question of choosing between competing algorithms, and "leave-one-out" cross-validation is hence characterised as a crude method of ML-II prior selection. We conclude by examining how the mathematical results bear on practical problems, discussing related work, and suggesting future work.
|Title of host publication||Machine Learning: Proceedings of the Twelfth International Conference (ML95)|
|Editors||Armand Prieditis, Stuart Russell|
|Place of Publication||San Francisco, CA|
|Publisher||Morgan Kaufmann Publishers|
|Number of pages||8|
|Publication status||Published - 1995|