The number of the geek is 0x029A
Dec. 17th, 2007 05:24 pm
Hi folks,
I was chatting to foxypinkninja the other day, expounding my usual rant about the inadequacy of English when defining imperfectly understood concepts related particularly to gender and sexuality. It occurred to me that there might be a more effective, admittedly geekier, solution that could put an end to all world strife. Well, probably not, but it would be fun, anyway. Chatting a bit more to doseybat today made me decide to post about the idea before brain fade did for it and consigned it to the maybe-perhaps-when pile.
The basic thesis is pretty simple: we all are usually defined by large characteristics, such as male, female, tall, short, sporty, couchpotatoish, gay, straight, bi, lesbian, mono, poly, polyfi, swinger, BDSM, vanilla, trans, cisgendered, etc. These large characteristics are all inherently broad-brush definitions that are easily broken by specific counterexamples, and in any case they have definitions that are difficult to agree upon. My argument is that, in reality, we are defined by a potentially very large number of small characteristics that collectively (and often imperfectly) entail the large characteristics. By finding a large (though hopefully reasonably minimal) set of small characteristics that individually are easily agreed upon, it then becomes possible to define something like a sociological genome for people, potentially allowing the aforementioned large characteristics to be discarded.
By definition, a characteristic is small if and only if it doesn't subdivide into component characteristics (e.g. male is not a small characteristic because it can be subdivided into (at the very least) physical primary and secondary sexual characteristics, which may or may not match the person's gender orientation).
If it's possible to come up with a set of such characteristics that are necessary (in the sense that removing them would reduce specificity) and sufficient (in the sense that adding more characteristics wouldn't do much to improve specificity), we can encode that as a bit string that can then be represented conveniently. I had a couple of ideas for this. One, obvious, way is to encode it as a hex number, because this is reasonably compact, ASCII compatible and relatively easy to cut and paste or to type manually. Given your 'geek ID', it should then be relatively simple to build gadgets like PDA/cellphone/web applications that can encode and decode these things, and (interestingly) compare them with those of people near you, possibly automatically.
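As a minimal sketch of the hex idea, assuming each small characteristic has been boiled down to a single yes/no answer in an agreed order: the function names and the sample answers below are made up purely for illustration.

```python
def encode_geek_id(answers):
    """Pack a list of yes/no answers into an uppercase hex string.

    The first answer becomes the most significant bit. If the number of
    answers is not a multiple of 4, the last hex digit is padded with zeros.
    """
    width = (len(answers) + 3) // 4  # hex digits needed, 4 bits each
    value = 0
    for answer in answers:
        value = (value << 1) | (1 if answer else 0)
    value <<= width * 4 - len(answers)  # right-pad to a whole hex digit
    return format(value, "0{}X".format(width))


def decode_geek_id(code, count):
    """Recover the first `count` yes/no answers from a hex string."""
    value = int(code, 16)
    total_bits = len(code) * 4
    return [bool((value >> (total_bits - 1 - i)) & 1) for i in range(count)]


def differing_answers(code_a, code_b):
    """Count the answers on which two equal-length geek IDs disagree."""
    return bin(int(code_a, 16) ^ int(code_b, 16)).count("1")


# Hypothetical answers to 16 made-up questions.
answers = [False] * 6 + [True, False, True, False, False,
                         True, True, False, True, False]
geek_id = encode_geek_id(answers)
print(geek_id)
print(differing_answers(geek_id, "029B"))
```

Counting differing bits is only one possible way of comparing two IDs; it treats every characteristic as equally important, which may well not be what you want.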
OK, here's the second (and possibly geekier) part of the idea. This came out of figuring out how it might be possible to derive a reasonably compact set of characteristics -- we don't really want to have the okcupid.com umpty gazillion not-very-relevant questions here. It occurred to me that it is probably possible to have a bunch of people take an online test deliberately with way more questions than are strictly necessary, then feed all their results into some of the kinds of algorithm used for pulling trees from DNA sequences. I originally thought of tools like MrBayes, but doseybat recommends that I probably find out about Principal Components Analysis. Once you get a tree, it's probably fairly straightforward to do an analysis on the results to figure out which characteristics are predictive and which can be discarded as noise, so from that it should be possible to find a reduced set that, when analysed, generates the same (or at least a very similar) tree.
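To make "discarded as noise" a little more concrete, here is a deliberately crude sketch. It is not PCA and it builds no tree. It only flags questions that can never change the result: ones everybody answers the same way, and ones that exactly duplicate (or exactly invert) a question already kept. The function name and the sample responses are invented for illustration.

```python
def redundant_questions(responses):
    """Return indices of questions that carry no extra information.

    responses: list of equal-length lists of 0/1 answers, one per person.
    A question is flagged if everyone gave the same answer, or if its
    column of answers is identical to, or the exact inverse of, the
    column of an earlier question that was kept.
    """
    columns = list(zip(*responses))
    kept = []
    dropped = []
    for index, column in enumerate(columns):
        inverse = tuple(1 - a for a in column)
        if len(set(column)) == 1:
            dropped.append(index)  # nobody disagrees, so it separates no one
        elif any(columns[k] in (column, inverse) for k in kept):
            dropped.append(index)  # same information as an earlier question
        else:
            kept.append(index)
    return dropped


# Four hypothetical people answering four hypothetical questions.
responses = [
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
print(redundant_questions(responses))
```

A real analysis would also want to catch questions that are only mostly redundant, which is where something like PCA comes in.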
Anyway, there are more things that you can do with a tree like that. For a start, you can use it to classify new data, so (assuming the original tree is detailed enough) you don't need to do PCA every time you have a new person come along and want to know where they end up in the tree. The second (and IMO very cool) thing you can do is, once you've found somewhere in the tree to place someone, you can encode their position in the tree by starting at the root and making a note of which branches lead to the relevant leaf. Such encodings are likely to be very compact and efficient, certainly O(log N) in size with respect to the number of leaves in the tree, but (cooler still) if you encode the position in the tree most-significant-bit first as you go from the root to the leaf, you end up with a number with some interesting properties: given any two data points on the tree in close proximity to each other, their encodings are also guaranteed to be close to each other. A simple binary encoding running MSB to LSB isn't ideal, though, because you can still end up with adjacent codes that actually reflect the upper and lower bounds of not-really-very-connected main branches. A better encoding would space the allocation of codes such that, if you take the tree and encode it, then do cluster analysis on the codes you get the tree back again. This would be incredibly cool -- we could then have a single number, say in the range 0-9999 decimal, which both accurately places us in the tree, and also makes it possible to compare ourselves with other people just by taking the difference in the code. If I'm a 1378 and you're a 1392, we're probably going to be very similar, but if you're a 9326, that's unlikely. The actual ordering isn't necessarily meaningful, of course, but differences actually are. I am tempted to cheekily flip the tree around a bit so that the numbers form some kind of conservative -- liberal continuum, but it might be better to avoid that for political reasons.
Or not. I'd certainly like it if it worked like that. I suspect that the fattest branches will correspond most closely to the familiar large characteristics, though it is quite possible that they will not.
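The root-to-leaf encoding described above, and its weakness, can be sketched in a few lines. This assumes a strictly binary tree of fixed depth, with each branch choice written as 0 or 1; both function names are hypothetical.

```python
def path_to_code(path, depth):
    """Encode a root-to-leaf path of 0/1 branch choices, MSB first.

    Paths shorter than `depth` are padded with zeros on the right.
    """
    code = 0
    for choice in path:
        code = (code << 1) | choice
    return code << (depth - len(path))


def shared_branches(code_a, code_b, depth):
    """Count how many branch choices two codes share, starting at the root."""
    return depth - (code_a ^ code_b).bit_length()


DEPTH = 4
a = path_to_code([0, 1, 1, 0], DEPTH)
b = path_to_code([0, 1, 1, 1], DEPTH)
c = path_to_code([1, 0, 0, 0], DEPTH)

# a and b are siblings: codes differ by 1 and they share 3 branches.
print(abs(a - b), shared_branches(a, b, DEPTH))
# b and c also differ by 1, yet they split at the root and share nothing.
print(abs(b - c), shared_branches(b, c, DEPTH))
```

The second pair is the problem case: the numeric difference is tiny even though the two leaves sit on opposite main branches, which is why a plain binary code would need the allocation of codes spacing out before differences alone become trustworthy.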
It's also possible to use this a bit like a dating site by answering the questions as if you were someone you'd want to date. Even cooler, you could also answer it as if you were someone you'd never date in a million years. We could then have a pin badge with three numbers on it: a green one, specifying who we'd like to meet, a black one specifying ourself, and a red one specifying, if you're close to this number, just don't even ask!
So, cool, or a step too far?
(no subject)
Date: 2007-12-18 03:55 pm (UTC)
Secondly, I think people actually like the ambiguity. A few years ago I developed a "gender plane" concept which, I hoped, would rid us of all the damn lists. You have a plane whose axes, from zero to one, are labelled "male" and "female". You then pick a point on the plane. Primary and secondary physical and psychological characteristics have a plane each (four in total). Sexual preference can fit on a plane or set of planes too. The great thing is this allows for possibilities like asexuality, people who don't feel strongly gendered, and people who feel strong pulls to both genders, as well as people who possess complex physical characteristics, roughly 0.1% of the population.
I suggested this idea gleefully to the many complex people I know, where it was met with suspicion, bafflement and slight (friendly) hostility.
I think this is itself a complex phenomenon. Easy classification is resented because it can lead to stereotypes. I think there may also be a slightly tribal tendency to want to exclude "normals" from understanding too easily (rather like regional slangs designed to separate insiders from outsiders).
I have also had people argue that, for example, a planar system doesn't allow for enough complexity. These people then try and lump all sorts of unrelated things into gender and sexuality. It isn't, of course, a complete personality template, but it does bring me to what I think is the last reason for resenting a classification system.
Girls Like Lists. Most women's magazines have articles like "242 style must-haves" or "376 shoes you must eat this summer" or such. The number is always a comfortably weird one, clearly not combinatorial or a power of two. I know people who prefer a system with three classifications of sexual preference ("male", "female" and "both") to my planar system which includes "null" as well. I know of at least one real-life near-asexual person. It's sexual stereotyping, I know, but at some level I can't help putting this down to Girls Like Lists. Girls Do Not Like Tree Diagrams (with your evident and wonderful exception). Why Girls Like Lists affects the many complex people who aren't girls, I couldn't say, but it really, really seems to, besides which, a lot of the people who write about this a lot are girls.
So please, keep hammering away at your concept! I expect it'll turn out better than mine.
(no subject)
Date: 2007-12-19 12:42 am (UTC)
There are cool coding techniques that basically involve taking a pair of scalars, mapping them to a coordinate on a plane that is tiled with numbered polygons whose sizes are tuned to match the frequency distribution of the underlying data. I forget off-hand what these codes are called, but it could work pretty well because they can be rather more compact than separately encoding the scalars. It should generalise to more than 2 dimensions too. I've seen this in CELP (codebook excited linear prediction) voice codecs, amongst other places.
(I did audio DSP work for a while. It was fun)
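The scheme described in this comment sounds like what is usually called vector quantisation: you keep a numbered codebook of representative points, and each input is replaced by the number of its nearest entry, so the "polygons" are simply the regions closest to each entry. A minimal sketch, assuming a tiny hand-written codebook on a zero-to-one plane (in practice the codebook would be fitted to real data, with more entries where the data is dense):

```python
def quantise(point, codebook):
    """Return the index of the codebook entry nearest to `point`."""
    def squared_distance(entry):
        return sum((p - e) ** 2 for p, e in zip(point, entry))

    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i]))


# Hypothetical codebook: four corners plus the centre of the unit square.
CODEBOOK = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9), (0.5, 0.5)]

print(quantise((0.8, 0.2), CODEBOOK))
print(quantise((0.45, 0.6), CODEBOOK))
```

Because `quantise` compares coordinates pairwise, the same function works unchanged for points with more than two dimensions, as long as the codebook entries have the same length.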
(no subject)
Date: 2007-12-19 02:49 am (UTC)
Of course, I bet you if you actually produce the encoding, people will try to find exceptions it doesn't handle until you get bored because they dislike the idea of reductionism (in other words, however adequate your system becomes, greater adequacy will be required of it). I am genuinely interested to know whether you are tenacious enough to keep going until they get bored. Being, yourself, not at a white-bread corner like me, I think you stand some chance of success (though the Luddites will never adopt your system, and probably become pseudopagans out of community habit after a while).
(no subject)
Date: 2007-12-19 04:10 pm (UTC)
Come to think of it, that may perhaps want to be subdivided somewhat. Frex, the vast majority of my partners have been female, but if one were trying to make qualitative data out of that it's probably a very relevant contextual issue that the vast majority of my friends and people I talk to in general are female, so a sizable amount of the visible bias in partners is due to available sample of people to know well enough to come to fancy.
For a counterexample on that axis, I think it's clear from listening to any significant amount of Leonard Cohen's work or interviews that he a) is primarily sexually interested in women and b) that his primary interest for close friendship and communication is with men; the number of Cohen songs that are addressed to another man about a woman is quite astounding.