compilerbitch: That's me, that is! (Default)
[personal profile] compilerbitch
Hi folks,

I was chatting to [livejournal.com profile] foxypinkninja the other day, expounding my usual rant about the inadequacy of English when defining imperfectly understood concepts related particularly to gender and sexuality. It occurred to me that there might be a more effective, admittedly geekier, solution that could put an end to all world strife. Well, probably not, but it would be fun, anyway. Chatting a bit more to [livejournal.com profile] doseybat today made me decide to post about the idea before brain fade did for it and consigned it to the maybe-perhaps-when pile.



The basic thesis is pretty simple: we all are usually defined by large characteristics, such as male, female, tall, short, sporty, couchpotatoish, gay, straight, bi, lesbian, mono, poly, polyfi, swinger, BDSM, vanilla, trans, cisgendered, etc. These large characteristics are all inherently broad-brush definitions that are easily broken by specific counterexamples, and in any case they have definitions that are difficult to agree upon. My argument is that, in reality, we are defined by a potentially very large number of small characteristics that collectively (and often imperfectly) entail the large characteristics. By finding a large (though hopefully reasonably minimal) set of small characteristics that individually are easily agreed upon, it then becomes possible to define something like a sociological genome for people, potentially allowing the aforementioned large characteristics to be discarded.

By definition, a characteristic is small if and only if it doesn't subdivide into component characteristics (e.g. male is not a small characteristic because it can be subdivided into (at the very least) physical primary and secondary sexual characteristics, which may or may not match the person's gender orientation).

If it's possible to come up with a set of such characteristics that are necessary (in the sense that removing them would reduce specificity) and sufficient (in the sense that adding more characteristics wouldn't do much to improve specificity), we can encode that as a bit string that can then be represented conveniently. I had a couple of ideas for this. One, obvious, way is to encode it as a hex number, because this is reasonably compact, ASCII compatible and relatively easy to cut and paste or to type manually. Given your 'geek ID', it should then be relatively simple to build gadgets like PDA/cellphone/web applications that can encode and decode these things, and (interestingly) compare them with those of people near you, possibly automatically.
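As a rough sketch of the hex idea, assuming each small characteristic is a plain yes/no bit: the trait names, their order and the helper functions below are all made up for illustration, not a proposed standard.

```python
# Hypothetical trait list: names and ordering are illustrative assumptions only.
TRAITS = ["likes_trees", "likes_lists", "night_owl", "plays_sport",
          "reads_sf", "owns_pda", "speaks_3_langs", "drinks_tea"]

def encode(answers):
    """Pack yes/no answers into a hex string; the first trait is the most significant bit."""
    bits = 0
    for name in TRAITS:
        bits = (bits << 1) | (1 if answers.get(name, False) else 0)
    width = (len(TRAITS) + 3) // 4  # number of hex digits needed
    return format(bits, "0{}X".format(width))

def decode(code):
    """Unpack a hex string back into a dict of trait name -> bool."""
    value = int(code, 16)
    n = len(TRAITS)
    return {name: bool((value >> (n - 1 - i)) & 1) for i, name in enumerate(TRAITS)}

def differing_traits(code_a, code_b):
    """Count the traits on which two codes disagree (Hamming distance)."""
    return bin(int(code_a, 16) ^ int(code_b, 16)).count("1")

me = encode({"likes_trees": True, "night_owl": True, "drinks_tea": True})
```

Comparing two people is then just an XOR and a bit count, which is cheap enough for a phone or a web page to do automatically.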

OK, here's the second (and possibly geekier) part of the idea. This came out of figuring out how it might be possible to derive a reasonably compact set of characteristics -- we don't really want to have the okcupid.com umpty gazillion not-very-relevant questions here. It occurred to me that it is probably possible to have a bunch of people take an online test deliberately with way more questions than are strictly necessary, then feed all their results into some of the kinds of algorithm used for pulling trees from DNA sequences. I originally thought of tools like MrBayes, but doseybat recommends that I probably find out about Principal Components Analysis. Once you get a tree, it's probably fairly straightforward to do an analysis on the results to figure out which characteristics are predictive and which can be discarded as noise, so from that it should be possible to find a reduced set that, when analysed, generates the same (or at least a very similar) tree.

Anyway, there are more things that you can do with a tree like that. For a start, you can use it to classify new data, so (assuming the original tree is detailed enough) you don't need to do PCA every time you have a new person come along and want to know where they end up in the tree. The second (and IMO very cool) thing you can do is, once you've found somewhere in the tree to place someone, you can encode their position in the tree by starting at the root and making a note of which branches lead to the relevant leaf. Such encodings are likely to be very compact and efficient, certainly O(log N) in size with respect to the number of leaves in the tree, but (cooler still) if you encode the position in the tree most-significant-bit first as you go from the root to the leaf, you end up with a number with some interesting properties: given any two data points on the tree in close proximity to each other, their encodings are also guaranteed to be close to each other. A simple binary encoding running MSB to LSB isn't ideal, though, because you can still end up with adjacent codes that actually reflect the upper and lower bounds of not-really-very-connected main branches. A better encoding would space the allocation of codes such that, if you take the tree and encode it, then do cluster analysis on the codes you get the tree back again. This would be incredibly cool -- we could then have a single number, say in the range 0-9999 decimal, which both accurately places us in the tree, and also makes it possible to compare ourselves with other people just by taking the difference in the code. If I'm a 1378 and you're a 1392, we're probably going to be very similar, but if you're a 9326, that's unlikely. The actual ordering isn't necessarily meaningful, of course, but differences actually are. I am tempted to cheekily flip the tree around a bit so that the numbers form some kind of conservative -- liberal continuum, but it might be better to avoid that for political reasons. 
Or not. I'd certainly like it if it worked like that, though I suspect that the fattest branches will probably correspond most closely to the familiar large characteristics, though it is significantly possible that they may actually not do so.
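Here is a minimal sketch of the simple MSB-first encoding, assuming a perfectly balanced toy binary tree. The leaf names and the helper functions are invented for illustration. It shows both the nice property and the snag: neighbours within a twig get adjacent numbers, but so do the two leaves either side of the root split.

```python
# Toy binary tree as nested pairs; the leaf names are made up for illustration.
TREE = ((("ann", "bob"), ("cat", "dan")), (("eve", "fay"), ("gus", "hal")))

def path_codes(tree, prefix=""):
    """Map each leaf to its root-to-leaf branch choices ('0' = left, '1' = right)."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    codes = path_codes(left, prefix + "0")
    codes.update(path_codes(right, prefix + "1"))
    return codes

def as_number(bits, depth):
    """Read a path MSB-first, padding shorter paths with zeros on the right."""
    return int(bits.ljust(depth, "0"), 2)

def tree_distance(a, b):
    """Edges between two leaves: whatever remains of each path after the shared prefix."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return (len(a) - shared) + (len(b) - shared)

codes = path_codes(TREE)
depth = max(len(c) for c in codes.values())
numbers = {name: as_number(c, depth) for name, c in codes.items()}
```

In this toy tree "ann" and "bob" differ by 1 and really are siblings, but "dan" (3) and "eve" (4) also differ by 1 while sitting on opposite sides of the root. That gap is what a better-spaced allocation of codes would have to fix.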

It's also possible to use this a bit like a dating site by answering the questions as if you were someone you'd want to date. Even cooler, you could also answer it as if you were someone you'd never date in a million years. We could then have a pin badge with three numbers on it: a green one, specifying who we'd like to meet, a black one specifying ourself, and a red one specifying, if you're close to this number, just don't even ask!

So, cool, or a step too far?

(no subject)

Date: 2007-12-18 01:35 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
The other idea I had is to print the codes on a credit card-sized thingy, with a barcoded version of the underlying data across one edge. This would make it possible to visually compare your code with someone else's. If there are sufficiently few bits, it might even be possible to print a key on the back so you can read off what the bits actually mean...

(no subject)

Date: 2007-12-18 02:11 am (UTC)
From: [identity profile] splodgenoodles.livejournal.com
Trees are nice.

(no subject)

Date: 2007-12-18 02:57 am (UTC)
From: [identity profile] ewtikins.livejournal.com
Personally I like the "It's none of your business" answer to people asking me about sexuality and so on. Carrying information like that on a card? Really don't want to do it, even by choice.

(no subject)

Date: 2007-12-18 03:36 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
I was always pretty sure that the idea would be guaranteed to be *really* offensive to some people. I tend to habitually be very open about that kind of thing generally, which may freak out the occasional person but I think, for me anyway, it helps break down barriers when trying to get to know people. For myself, I'd happily wear such a badge because I think it would be fun, and might even lead to breaking the ice with someone interesting, but I do understand your concern.

(no subject)

Date: 2007-12-18 03:57 am (UTC)
From: [identity profile] technolope.livejournal.com
I read recently that Gandhi was a really open person. I'd rather everyone know a lot about me than just the government and credit rating agencies.

(no subject)

Date: 2007-12-18 03:16 pm (UTC)
From: [identity profile] pplfichi.livejournal.com
I agree here, though having a code may be useful where it's faster and easier than using more fuzzy terms where one does want to explain.

I wonder if some kind of mask could be used to add (or null out?) information...

(no subject)

Date: 2007-12-18 02:57 am (UTC)
From: [identity profile] jmtkalcich.livejournal.com
Combine this with a massive speed dating event in a huge convention center so as to apply the data in a meaningful scenario as a test.

Could be fun.

(no subject)

Date: 2007-12-18 03:37 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
Yes -- arranging the rooms as 'bins' for particular number ranges would be very interesting, particularly in clusters with dodgier common interests!

(no subject)

Date: 2007-12-19 03:59 pm (UTC)
From: [identity profile] rysmiel.livejournal.com
Have you read Triton ? There's a lovely riff near the end of that on bars as meeting spaces for every possible specific sexual orientation.

(no subject)

Date: 2007-12-18 03:05 am (UTC)
ext_3375: Banded Tussock (Default)
From: [identity profile] hairyears.livejournal.com


Are we looking at an XML Schema Document here?

You've already hinted at a more sophisticated document than a table, even one with a partially-hierarchical structure and nested subtables*, as there are stochastic elements to the structure as well as Bayesian weightings on individual data elements.

This is more than just an academic exercise, as there is a need for the systematisation of 'Bayesian' data structures in the description of genetic conditions known to be influenced by large populations of genes. In essence, we know that A influences B, and that C is probably a subset of B but possibly of the related systems X and Y. There are estimated probabilities for these relationships but no universally-accepted data description like, say, relational tables in third normal form.


Me, I just bash numbers into the little grid of cells on a spreadsheet.


*FPML is the most widely-used example.

(no subject)

Date: 2007-12-18 03:45 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
To represent trees, yes, tables suck at that kind of thing. XML is possible. But the whole idea of the encoding is to reflect the position within the tree as a single (albeit cunningly encoded) scalar, making it easy to manipulate directly by human.

I think I'm OK with an analysis that makes the assumption of an underlying tree, because whilst a (possibly probabilistic) directed graph would be more general, it wouldn't lend itself well to the kind of encoding tricks I was hoping for.

Good old-fashioned cluster analysis is probably good enough, and O(N^2) isn't too much of a problem these days. (Basically, start by finding the closest pair of data points classically by minimising the sum of the squares of differences, then pull those points out and replace them with an averaged point, which represents a twig from which the two data points hang. Keep doing this until you run out of data, and you've built a tree).
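The procedure in brackets can be sketched in a few lines. This is a naive illustration rather than a tuned implementation: it assumes each person is a tuple of numbers, uses a plain unweighted average for the twig, and rescans every pair on each pass. The name `build_tree` is just a placeholder.

```python
from itertools import combinations

def sq_dist(p, q):
    """Sum of squared differences between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def build_tree(points):
    """points: dict of name -> tuple of numbers. Returns the tree as nested pairs of names."""
    # Each working entry is (subtree, representative point).
    clusters = [(name, tuple(vec)) for name, vec in points.items()]
    while len(clusters) > 1:
        # Find the closest pair among the current entries.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: sq_dist(clusters[ij[0]][1], clusters[ij[1]][1]))
        (tree_a, pa), (tree_b, pb) = clusters[i], clusters[j]
        # Pull the pair out and hang them from a twig at their averaged position.
        twig = ((tree_a, tree_b), tuple((a + b) / 2 for a, b in zip(pa, pb)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [twig]
    return clusters[0][0]

tree = build_tree({"a": (0, 0), "b": (0, 1), "c": (10, 10), "d": (10, 11)})
```

The nested pairs that come out have the same shape as the tree the post wants to encode, so the result could be fed straight into a root-to-leaf path encoder.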

(no subject)

Date: 2007-12-18 03:50 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
(though principal components analysis is probably better -- I must get around to reading about it)

(no subject)

Date: 2007-12-18 08:29 am (UTC)
From: [identity profile] fluffyrichard.livejournal.com
Coincidentally, I've been doing lots of work recently on clustering text documents by similarity, and have just implemented a simple clustering algorithm like the one you describe. Am just about to move on to some more advanced algorithms which basically involve doing PCA first...

(no subject)

Date: 2007-12-19 01:01 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
Cool!

As an aside, I'm intending to visit the UK for a week or two around the end of January/beginning of February. I'll probably be staying in Cambridge for at least half of that time, so if you're up for meeting up at some point for a good chat, that would be great. :-)

(no subject)

Date: 2007-12-19 04:02 pm (UTC)
From: [identity profile] rysmiel.livejournal.com
Neighbour-joining would certainly be more tractable, but... I'm not actually used to thinking "this is a conversation I should not let $boss2 see" for the reason that it would set off his rant about maximum-likelihood and phylogenetic errors and long-branch attraction artifacts and so on.

(no subject)

Date: 2007-12-18 04:03 am (UTC)
From: [identity profile] technolope.livejournal.com
Might I suggest arranging the data and doing PCA in MeV, a spiffy and flexible piece of data analysis software. [livejournal.com profile] capital_l is the lead developer. She demonstrated the application at a conference using historical housing prices from all 50 states. When grouped, the southern states were together, the midwest was grouped, etc.

And might I also suggest the benefits of a time component in this code? The long tail concept also suggests additional complexity in the code. It might as well allow its leaf nodes to be extended by ever-more esoteric classifications.

(no subject)

Date: 2007-12-18 07:22 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
I had thought about that... so long as the original tree doesn't shift, you can always add precision, e.g. version 2 being 5 digit, so a version 1 code of 2345 becomes 23450 in the version 2 code. Or maybe it's better just to redo it, I don't know. Future proofing is definitely an issue, since cultures do shift over time.
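The "add a digit" scheme is just a multiplication by ten, assuming the original tree stays put. A tiny sketch, with made-up function names:

```python
def upgrade(code, old_digits=4, new_digits=5):
    """Turn an old code into the new format by appending zeros (2345 -> 23450)."""
    return code * 10 ** (new_digits - old_digits)

def same_branch(v1_code, v2_code, old_digits=4, new_digits=5):
    """True if a new-format code sits under the branch named by an old-format code."""
    return v2_code // 10 ** (new_digits - old_digits) == v1_code
```

Old and new codes stay comparable this way, but only for as long as nobody reshuffles the upper branches.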

(no subject)

Date: 2007-12-18 07:23 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
Hmm... yes, not having to implement PCA from scratch would be happier fun...

(no subject)

Date: 2007-12-18 09:47 am (UTC)
From: [identity profile] green-knight.livejournal.com
The problem I've seen with things like geekcode and its derivatives (does anyone use them anymore? It's ages since I've seen one in a signature) is that they started measuring a small set of values with a fixed scale; but people a) didn't recognise themselves on that scale and added further mutations, and b) added new measurements, sometimes shared only by a small subgroup. And in the end it fell apart because who wants to read three lines of code?

(no subject)

Date: 2007-12-19 12:28 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
It's partly geek code that inspired this. I've had a geek code block for years, and whilst it's a good bit of fun, I'm inspired to create something actually useful, employing some heavy duty coding (in the en-coding sense rather than the programming sense of the word) to make the code much more compact.

You're quite right, no one wants to read three lines of code, but a 4 digit decimal number and a bit of mental subtraction is a lot more accessible, I think. They would make cool LJ icons, if nothing else!

(no subject)

Date: 2007-12-18 09:52 am (UTC)
emperor: (Default)
From: [personal profile] emperor
I suspect one of the trickier problems here would be finding a good set of questions...

(no subject)

Date: 2007-12-19 12:30 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
Very much so. That's why I was keen on the idea of basically starting with a large set of (still carefully thought out) questions, then using statistical analyses to figure out which characteristics are good at classifying people and which ones aren't.

(no subject)

Date: 2007-12-18 10:00 am (UTC)
vampwillow: (Default)
From: [personal profile] vampwillow
Neat idea, and other than referring to the major and minor elements as large and small (sizeist!) sounds a reasonable idea. As the built structure would be a cross-connected tree I'm not sure that the pure difference between generated numbers/codes would be always meaningful, and you'd probably need to separate out a 'self'-'sought' as two trees otherwise you are duplicating too much.

So ... a spot of R&D?

(no subject)

Date: 2007-12-19 12:31 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
Yep.

I could have used major and minor, but that would have been chordist...

(no subject)

Date: 2007-12-19 03:57 pm (UTC)
From: [identity profile] rysmiel.livejournal.com
I think you could get away with being chordist, pretty much every potential user of the system will have a notochord.

(no subject)

Date: 2007-12-19 11:09 pm (UTC)

(no subject)

Date: 2007-12-18 11:24 am (UTC)
From: [identity profile] parthenogenocid.livejournal.com
You'd get exactly the same problems as when people build these trees for DNA etc. and then try to get some meaning out of them. Different reasonable-sounding algorithms can give you completely different trees. Small changes in the data set can give you completely different trees.

Building trees works best when there is divergent descent with gradual mutation but little or no coalescence or crossbreeding of long-separated lines. Crossing datapoints from different parts of the tree gives you what appears to be a branch that diverges early from the root on a short branch, then is isolated from everything else by a large distance.

(no subject)

Date: 2007-12-19 12:34 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
I really have no idea what trees we'll get by analysing this data. It could be that people don't classify well in this way, which would be easy to spot by dividing up the data set and seeing if different bits of it generate similar trees.

It could also be that some cheating may be necessary with the tree to massage it into a sensible shape. I'd rather not do that, but since I'm not actually trying to make any claims based on it, just generate a usable, compact code, I think it's fair game if needs be.

(no subject)

Date: 2007-12-18 03:31 pm (UTC)
From: [identity profile] rysmiel.livejournal.com
Don't post things like that such that I'll see them first thing in the morning, I might just fall helplessly in love.

(no subject)

Date: 2007-12-19 12:07 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
*giggles*

*blushes*

(no subject)

Date: 2007-12-18 03:55 pm (UTC)
From: [identity profile] andrewwyld.livejournal.com
Two things: Firstly, I'd imagine a lot of people's primary and secondary sexual characteristics and perceived gender do line up, so you could set the testing stage to reduce complexity for those who aren't complex.  You can then have a sort of "advanced settings" thing specifically for anyone who doesn't fit the basic categories.

Secondly, I think people actually like the ambiguity.  A few years ago I developed a "gender plane" concept which, I hoped, would rid us of all the damn lists.  You have a plane whose axes, from zero to one, are labelled "male" and "female".  You then pick a point on the plane.  Primary and secondary physical and psychological characteristics have a plane each (four in total).  Sexual preference can fit on a plane or set of planes too.  The great thing is this allows for possibilities like asexuality, people who don't feel strongly gendered, and people who feel strong pulls to both genders, as well as people who possess complex physical characteristics, roughly 0.1% of the population.

I suggested this idea gleefully to the many complex people I know, where it was met with suspicion, bafflement and slight (friendly) hostility.

I think this is itself a complex phenomenon.  Easy classification is resented because it can lead to stereotypes.  I think there may also be a slightly tribal tendency to want to exclude "normals" from understanding too easily (rather like regional slangs designed to separate insiders from outsiders).

I have also had people argue that, for example, a planar system doesn't allow for enough complexity.  These people then try and lump all sorts of unrelated things into gender and sexuality.  It isn't, of course, a complete personality template, but it does bring me to what I think is the last reason for resenting a classification system.

Girls Like Lists.  Most women's magazines have articles like "242 style must-haves" or "376 shoes you must eat this summer" or such.  The number is always a comfortably weird one, clearly not combinatorial or a power of two.  I know people who prefer a system with three classifications of sexual preference ("male", "female" and "both") to my planar system which includes "null" as well.  I know of at least one real-life near-asexual person.  It's sexual stereotyping, I know, but at some level I can't help putting this down to Girls Like Lists.  Girls Do Not Like Tree Diagrams (with your evident and wonderful exception).  Why Girls Like Lists affects the many complex people who aren't girls, I couldn't say, but it really, really seems to, besides which, a lot of the people who write about this a lot are girls.

So please, keep hammering away at your concept!  I expect it'll turn out better than mine.

(no subject)

Date: 2007-12-19 12:42 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
Basically the planes you're talking about, like [livejournal.com profile] rysmiel's dimensions, are great candidates for what I called small characteristics. I'd want to quantise them more than you'd prefer, possibly, because I want a compact representation, but that's unlikely to be an issue since I doubt many people would differentiate much between 4% and 5%, or for that matter between 40% and 50% on those planes.

There are cool coding techniques that basically involve taking a pair of scalars, mapping them to a coordinate on a plane that is tiled with numbered polygons whose sizes are tuned to match the frequency distribution of the underlying data. I forget off-hand what these codes are called, but it could work pretty well because they can be rather more compact than separately encoding the scalars. It should generalise to more than 2 dimensions too. I've seen this in CELP (codebook excited linear prediction) voice codecs, amongst other places.
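What is described here sounds a lot like vector quantisation, which is what CELP-style codebooks do. A minimal sketch, assuming a hand-picked codebook: in practice the entries would be fitted to the data so that busy regions get more, smaller cells. The eight points below are invented for illustration.

```python
# Made-up codebook of 8 points on the unit plane (3 bits for a pair of scalars).
# Entries are deliberately denser near the (1, 0) corner, as if most people lived there.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
            (0.5, 0.5), (0.9, 0.1), (0.1, 0.9), (0.8, 0.2)]

def quantise(point):
    """Return the index of the codebook entry nearest to the point (squared distance)."""
    return min(range(len(CODEBOOK)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, CODEBOOK[i])))

def dequantise(index):
    """Recover the representative point for a code."""
    return CODEBOOK[index]
```

Each cell is the set of points closer to its entry than to any other, which gives the tiling by numbered polygons. Nothing in `quantise` depends on there being only two dimensions.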

(I did audio DSP work for a while. It was fun)

(no subject)

Date: 2007-12-19 02:49 am (UTC)
From: [identity profile] andrewwyld.livejournal.com
Yeah, I chose continuous planes because I wasn't looking at an actual encoding.  Of course I actually consider myself to inhabit a corner of most (I'm straight/straight and male/male/male/male, using the most complex representation space I could think of) but other people fancy one gender maybe 75% as much as another, etc.  I would honestly expect most people to normalize their own preference so their main preference is at 1 and their lesser preference is at some fraction, so it's not that good a representation of actual lustfulness/genderedness anyway, and you might as well just use the corners and half-points on the edges.

Of course, I bet you if you actually produce the encoding, people will try to find exceptions it doesn't handle until you get bored because they dislike the idea of reductionism (in other words, however adequate your system becomes, greater adequacy will be required of it).  I am genuinely interested to know whether you are tenacious enough to keep going until they get bored.  Being, yourself, not at a white-bread corner like me, I think you stand some chance of success (though the Luddites will never adopt your system, and probably become pseudopagans out of community habit after a while).

(no subject)

Date: 2007-12-19 04:10 pm (UTC)
From: [identity profile] rysmiel.livejournal.com
but other people fancy one gender maybe 75% as much as another, etc.

Come to think of it, that may perhaps want to be subdivided somewhat. Frex, the vast majority of my partners have been female, but if one were trying to make qualitative data out of that it's probably a very relevant contextual issue that the vast majority of my friends and people I talk to in general are female, so a sizable amount of the visible bias in partners is due to available sample of people to know well enough to come to fancy.

For a counterexample on that axis, I think it's clear from listening to any significant amount of Leonard Cohen's work or interviews that he a) is primarily sexually interested in women and b) that his primary interest for close friendship and communication is with men; the number of Cohen songs that are addressed to another man about a woman is quite astounding.

(no subject)

Date: 2007-12-18 04:15 pm (UTC)
From: [identity profile] rysmiel.livejournal.com
More seriously, and significant caffeine later, I've done some noodling in this direction myself, which might possibly be of interest to you. Though I am more inclined to think in terms of identifying the relatively few axes that it seems to me would encompass the vast majority of the variations I see people caring intensely about a priori, I'd not been thinking of looking at fine-grained distinctions and then doing the PCA or similar.

Have you read Stafford Beer's Designing Freedom ? Not directly relevant to sexuality, but informs a lot of how I think about complexity, and I would recommend it highly.

(no subject)

Date: 2007-12-19 12:10 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
I've not read that -- I'm tempted to pick it up.

Good noodling, by the way. I think you're fishing in similar waters. My reflex toward fiddling with codes to make them as compact as possible is what inspired the idea of encoding the tree as a scalar code where numeric difference reflects distance within the tree. I think things like that are cute.

(I should get out more. Or stay in more. Or something)

(no subject)

Date: 2007-12-19 04:11 pm (UTC)
From: [identity profile] rysmiel.livejournal.com
Pfft, I like you as you are.

(no subject)

Date: 2007-12-18 10:32 pm (UTC)
From: [identity profile] mstevens.livejournal.com
What's a polyfi?

It sounds like a good algorithm to use on a dating website and make your fortune :)

It's hitting vague memories of neural nets for me, although I failed that course...

(no subject)

Date: 2007-12-19 12:04 am (UTC)
From: [identity profile] rysmiel.livejournal.com
What's a polyfi?

Polyfidelitous == people with multiple partners in a closed long-term relationship, I'm guessing.

I don't like the usage, as I abhor the take on "keeping faith with" entailed in having the primary meaning of "fidelity" being "not sleeping with anyone else", it's viciously destructive of the possible reality of keeping agreements with people where who you sleep with's not an issue.

(no subject)

Date: 2007-12-19 12:19 am (UTC)
From: [identity profile] compilerbitch.livejournal.com
I'm inclined to agree. Oddly enough, I was talking about exactly this with a friend earlier today -- we both came to the conclusion that it's basically necessary to be prepared to risk losing a partner and to have enough self-confidence to cope with that, and to know that, whilst you'd be sad, you'd actually be OK if they did leave. Without that, fear of losing people can be crippling, inevitably leading to jealousy. Whilst I respect people who choose monogamy (I was one of them until fairly recently), or polyfi, ultimately it would seem difficult to really 'get' poly without being able to feel relaxed and happy about partners doing things with other people as they see fit. I'd personally not be very keen on doing the polyfi thing.

(no subject)

Date: 2007-12-19 04:28 pm (UTC)
From: [identity profile] rysmiel.livejournal.com
Oddly enough, I was talking about exactly this with a friend earlier today -- we both came to the conclusion that it's basically necessary to be prepared to risk losing a partner and to have enough self-confidence to cope with that, and to know that, whilst you'd be sad, you'd actually be OK if they did leave. Without that, fear of losing people can be crippling, inevitably leading to jealousy.

While I agree with you, that's not precisely what I was thinking. "Risk losing a partner and know that you could survive it", after all, is an issue in a relationship of any shape, mono or poly, because there could be any number of reasons for that happening. I'm not thinking so much of making an agreement with the attached expectation that people might conceivably break up over finding someone new and shiny, as of making an agreement that acknowledges that new shiny people just aren't an issue that the person in question is going to react to that way.

I think one of the more important dimensions in what I was talking about in that post is the axis between people for whom having sex with someone is OMG Life-Defining and people for whom having sex with someone is a nice thing to do of an afternoon when it's raining and there's nothing decent at the cinema. For myself, I am capable of interacting across a range of that spectrum, assuming people who a) are reasonably self-aware about where they are on it themselves and b) communicate that honestly, which are IMO neither of them unreasonable expectations to have of adults, but at this point, with my existing life, it tends to work a lot better for me to avoid involvements with people too close to the life-defining end. Because I already have a set of commitments to close friends, partner and non-partner alike, at levels which are important and not negotiable to third parties, and this is something I wear upon my sleeve; any new person who thought that sleeping with me might induce me to weaken those commitments would get cut off at the knees, and I don't think it's unreasonable to build agreements resting on that certainty.

(no subject)

Date: 2007-12-19 08:13 am (UTC)
From: [identity profile] mstevens.livejournal.com
Also it sounds disturbingly like audio equipment.

If polyfi is valid presumably we also need monofi, and it all gets very silly.
