This paper addresses the problem of object detection by means of the Generalised Hough transform paradigm. The Implicit Shape Model (ISM) is a well-known approach based on this idea. It made this paradigm popular and has been adopted many times. Although the algorithm exhibits robust detection performance, its description, i.e. its probabilistic model, involves arguments which are unsatisfactory from a probabilistic standpoint. We propose a framework which overcomes these problems and gives a sound justification to the voting procedure. Furthermore, our framework allows for a formal understanding of the heuristic of soft-matching commonly used in visual vocabulary systems. We show that it is sufficient to use soft-matching during learning only and to perform fast nearest neighbour matching at recognition time (where speed is of prime importance). Our implementation is based on Gaussian Mixture Models (instead of kernel density estimators as with ISM) which lead to a fast gradient-based object detector.