mismatches between mathematical essences and data structures

Oct 13, 2009 20:15

Lately, I've been faced with problems of the following type:
* there are many ways to represent an object
* a canonical representation is either hard to define, or unnatural to work with

(1) optimizing over the space of k-dim linear subspaces (a.k.a. k-flats). They can be represented as bases (more canonically, orthogonal bases), but that's not enough to identify them uniquely. We can define k-flats to be points in a Grassmanian, which is the appropriate quotient space (in which bases of the same space form an equivalence class). But optimizing here involves doing calculus on surfaces, which requires knowledge of differential geometry... or easier, one could use Lagrange multipliers, but I don't have enough constraints (I don't know how to constrain the rotation of the orthogonal basis).

(2) suppose you have an unknown probability distribution over clusterings of nodes. You can easily assign a label to each node such that two nodes are in the same cluster if they have the same label. Of course, these labels are just names, and permuting them results in an equivalent structure, which necessarily has the same probability. Now, suppose you get several samples from this distribution (say, by MCMC), and you want to estimate the distribution. Now, for each sample, you could simply add the other n!-1 relabelings ("mirror images"), but can you do better?
This is known as the "label-switching problem", and in some cases it can addressed by "artificial identifiability constraints" (e.g. in the form of: WOLOG, node 1 is always labeled with "a"), but this too can become unmanageable and unnatural.

Tangentially, is there a nice way to represent graphs modulo isomorphism? I'm aware of spectral techniques (e.g. graph Laplacian), but these are usually lossy, i.e. querying whether two graphs are the "same" (i.e. isomorphic) will result in some false positives.

computer_science

Previous post Next post
Up