 # Math Translation?

The following PostScript document seems to detail a hashing function that takes a point in multidimensional space and, if I’m reading it right, returns a single scalar value, such that points close to each other hash to nearby values and points far from each other hash to distant values.

http://www-math.mit.edu/~vempala/papers/hashing.ps (See here for a PostScript viewer, if you need it.)

Assuming my description to be accurate, I’d really like to be able to use this in a project, but I’m getting lost in the mathematical notation and descriptions. Could anyone who can follow the notation possibly translate this into two functions like:

float hash2D(float x, float y) {

}

float hash3D(float x, float y, float z) {

}

That is, of course, assuming the document describes the methodology well enough to be able to do so.

Assuming it does, do value ranges need to be defined, or are all points assumed to lie between -1 and 1 or something?

Slightly hopeful bump.

Even if translating the paper into usable code is too much work to do as a freebie, I’d still be interested in knowing whether it contains sufficient information to actually write such a formula, so that I know whether to pursue it further.

I’ve only skimmed the paper, but I think it does claim the existence of functions like you’re looking for (though you probably want the return value to be an integer type of some sort, not a float). It relies on the existence of one-dimensional locality-preserving hash functions: functions g for which, if |x-y| is small, then so is |g(x)-g(y)|. These are apparently discussed in Linial & Sasson but not exhibited in this paper (they’re only described briefly following Definition 1), so if you want to implement such a function you’d have to see whether they give useful examples there. (These functions likely assume a finite domain, so you’ll really want to quantize your floats to ints with some precision before continuing.)
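To illustrate that quantization step, here’s a minimal sketch in C. It assumes inputs lie in [-1, 1] (an assumption, since the paper doesn’t fix a range) and simply maps them onto an integer grid; the identity map on such a grid is trivially locality-preserving, so this is only a stand-in for the nonexpansive 1D hashes the papers actually discuss, not an implementation of them.

```c
/* Quantize a float assumed to lie in [-1, 1] onto an integer grid with
   `bits` bits of precision. Nearby floats map to nearby integers, which
   is the property the 1D hashes g need before anything fancier happens.
   The function name and the [-1, 1] range are assumptions for this
   sketch, not taken from the paper. */
int quantize1D(float x, int bits) {
    int levels = (1 << bits) - 1;
    float t = (x + 1.0f) * 0.5f;      /* map [-1, 1] onto [0, 1] */
    if (t < 0.0f) t = 0.0f;           /* clamp out-of-range inputs */
    if (t > 1.0f) t = 1.0f;
    return (int)(t * levels + 0.5f);  /* round to the nearest level */
}
```

With 8 bits, -1 maps to 0, 1 maps to 255, and values in between land proportionally on the grid.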

I’m not sure where you’re lost, but the basic idea, as given in section 3 (if I understand it correctly on a quick reading), seems pretty straightforward. If you have locality-preserving one-dimensional hashes g[sub]x[/sub] and g[sub]y[/sub], and you take a random linear combination h = a·g[sub]x[/sub] + b·g[sub]y[/sub] (with a+b=1), then the result is clearly a locality-preserving two-dimensional hash. Of course you expect more collisions in higher dimensions, since you’re projecting out a lot of potential information, so some choices of hashes and linear combinations will perform poorly on some data sets.
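That linear-combination idea can be sketched in exactly the shape the OP asked for. Everything here is assumed for illustration: the placeholder g() is just an identity-style quantizer over [-1, 1] (a real implementation would substitute a genuine locality-preserving 1D hash from the Linial & Sasson paper), and the weights are fixed rather than random, chosen only so they sum to 1.

```c
/* Placeholder 1D hash: quantizes [-1, 1] onto 0..255. Trivially
   locality-preserving, but NOT the nonexpansive hash from the papers. */
static int g(float x) {
    if (x < -1.0f) x = -1.0f;
    if (x >  1.0f) x =  1.0f;
    return (int)((x + 1.0f) * 127.5f);
}

/* Section-3 idea: a linear combination of 1D hashes with weights
   summing to 1. The paper picks the weights at random; they are
   hard-coded here only to keep the sketch deterministic. */
float hash2D(float x, float y) {
    const float a = 0.61f, b = 0.39f;              /* a + b = 1 */
    return a * g(x) + b * g(y);
}

float hash3D(float x, float y, float z) {
    const float a = 0.41f, b = 0.34f, c = 0.25f;   /* a + b + c = 1 */
    return a * g(x) + b * g(y) + c * g(z);
}
```

Note that nearby inputs produce nearby outputs, but distant points can still collide (e.g. a large x paired with a small y can hash near a small x paired with a large y), which is the higher-dimensional collision cost mentioned above.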

Here is the Linial and Sasson paper:

http://www.cs.huji.ac.il/~nati/PAPERS/nonexp_hash.pdf

I’m looking through it now, but it seems similar: it proposes theorems and then proves them, but never just simply says, “This is one such formula. Take it and prosper!” And I pretty much know no math notation beyond addition, subtraction, multiplication, division, square roots, and exponents, so…yeah.

Even if you know the notation, implementing an algorithm from either paper you’ve linked to will be highly non-trivial. You might try emailing the authors and seeing if they have code, or if they can point you in the right direction.