The Roar of the Masses Could Be Farts*

This writer may be unclear on the concept of signal-to-noise ratio. Just because you have more "data" doesn't mean you have more "information".

He writes:

I admit that, in addition to the possibilities for finding something interesting, there may also be the prospect of discovering suggestive but ultimately incorrect or misleading patterns [my emphasis]. But I feel this problem would surely be greatly ameliorated by more and better metadata.

And there, in fact, lies the problem.

That is pure supposition and assumes facts not at all in evidence. More data might create more signal, or it might create more noise. Earlier he had written:

Once again, I remind you that I know nothing of Mr Revere, or his conversations, or his habits or beliefs, his writings (if he has any) or his personal life. All I know is this bit of metadata, based on membership in some organizations.

The writer doesn't understand that he in fact DOES know something about Mr. Revere. He knows enough to eliminate all other organizations Mr. Revere may be a member of! For example, Mr. Revere may also be a member of several silversmith trade organizations, a member of horse racing clubs, lantern collectors, etc. The "disturbing" associations shown here may be repeated several times over based on Mr. Revere's tendency to be a social gadfly, or attempts to more widely market his wares by membership in as many social clubs as he possibly can. The writer also limits the groups he uses to small groups with few people. Well, it makes the math easier, right? Where's the harm in that?

The problem with matrix math is that every added row or column sharply increases the computation, and the big metadata "sifting" is attempting to do this for all individuals, and all connections, everywhere. Might as well try to connect how a butterfly flapping its wings in Mongolia makes a pope resign in Rome. After all, they both belong to the club of air-breathers, and the organization of carbon-based living organisms, as well as the group of finely-adorned creatures, displaying colorful raiments. So there are some natural, yet disturbing, connections there.
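To make the point about omitted groups concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the people other than Revere, the group names, and the helpers `co_membership` and `pair_count` are made up for illustration. It is not the original writer's data or method, just a plain count of shared memberships for each pair of people.

```python
# Hypothetical membership data, invented purely for illustration.
memberships = {
    "Revere": {"LodgeA", "CaucusB", "SilversmithGuild", "RidingClub"},
    "Adams": {"LodgeA", "CaucusB"},
    "Smith": {"SilversmithGuild", "RidingClub"},
    "Jones": {"RidingClub"},
}


def co_membership(memberships, groups):
    """Count shared groups for every pair of people, looking only at `groups`."""
    people = sorted(memberships)
    counts = {}
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            shared = memberships[a] & memberships[b] & groups
            counts[(a, b)] = len(shared)
    return counts


def pair_count(n):
    """Number of distinct pairs among n people."""
    return n * (n - 1) // 2


# The analyst's choice of which groups to include shapes the result.
political = {"LodgeA", "CaucusB"}
everything = political | {"SilversmithGuild", "RidingClub"}

narrow = co_membership(memberships, political)
wide = co_membership(memberships, everything)

for pair in sorted(wide):
    print(pair, "political only:", narrow[pair], "all groups:", wide[pair])
print("pairs among 4 people:", pair_count(4))
print("pairs among 1000 people:", pair_count(1000))
```

With only the two political groups, Revere's sole tie is to Adams. Add the trade guild and the riding club, and he is tied just as strongly to Smith, who belongs to no political group at all. The number of pairs to compare is n(n-1)/2, so the work also piles up quickly as people are added.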

h/t turcopolier

* dboon is rolling in his grave these days.

goldberry on

He might have a point. The Fellgett advantage says that for multiplexed data, the signal-to-noise ratio improves by the square root of the number of samples taken. It's something that crystallography and other analytical chemistry techniques rely on, and that Fourier transform spectroscopy relies on too.
So, what we need to understand is how this advantage is applied to this set of metadata. They're looking for something. What is it?
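A rough way to see the square-root behavior is to average repeated noisy readings and watch the spread shrink. The sketch below is a simplified stand-in (plain averaging of Gaussian noise, not a real multiplexed instrument), and the function name `averaged_reading_stats` and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def averaged_reading_stats(n_samples, signal=0.0, noise_sd=1.0, trials=1000, seed=0):
    """Average n_samples noisy readings, repeat for `trials` runs,
    and return (mean, standard deviation) of those averages."""
    rng = random.Random(seed)
    averages = [
        statistics.fmean(signal + rng.gauss(0.0, noise_sd) for _ in range(n_samples))
        for _ in range(trials)
    ]
    return statistics.fmean(averages), statistics.stdev(averages)


# Noise spread with 1 sample versus 100 samples per average.
_, spread_1 = averaged_reading_stats(1)
_, spread_100 = averaged_reading_stats(100)
ratio = spread_1 / spread_100  # expected to be near sqrt(100) = 10

# Same averaging with and without an underlying signal.
mean_no_signal, _ = averaged_reading_stats(100, signal=0.0)
mean_with_signal, _ = averaged_reading_stats(100, signal=0.5)

print("spread ratio:", round(ratio, 2))
print("average with no signal:", round(mean_no_signal, 3))
print("average with signal 0.5:", round(mean_with_signal, 3))
```

Taking 100 times as many samples cuts the noise by roughly a factor of 10. But averaging only sharpens the estimate of whatever is actually there: with no underlying signal, more samples just give a more precise reading of zero.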

lambert on

That's an interesting idea. If whatever they are trying to do is NP-complete, it can't be done at that scale, as far as anyone knows....

But we would also have to take in the possibilities of massively parallel computation, a la Google's MapReduce.

We might also take into account, somehow, that the financial industry sucks some of the brightest technical minds into its black hole, because of the money.

okanogen on

As a person who actually does Fourier transforms, and multiplexes data, and does signal processing and probabilistic analysis of data as part of my day job, the problem is you need a signal to multiplex! Trust me, as I posted previously, none of the experts doing this are under any illusion that what they are saying they are doing is actually what they are doing or will ever work. They will NEVER find a terrorist, or terrorist cell, using this technique. Certainly whatever else they are doing, that ain't it.

So, in classic "when tin-foil isn't tin-foily", rather than speculate wildly about what we are afraid "they" would like to do, we should view this using the benefit of experience (what they HAVE done) combined with what is most possible.

So here are the two to three things I think will shake out:

We live in a kleptocracy, and this is yet another tool for ensuring rents are paid. This tool will actually be usable for shutting down media file sharing, i.e. copyright infringement. It could also be used to track large drug enterprises like cartels. It may be used for combating cyber-warfare; however, it may wind up just as powerful a weapon against those who gathered the data. You never know.

Unauthorized use by the people tasked with keeping the data will likely be rampant. Why not? It will be extremely useful to business to find out who their competitors are talking to, and who among their customers or competitors are the influencers. Some of these bureaucrats may be motivated by conscience to tell what they know to everyone (as we have just seen), but many more will be motivated by a buck to pass their corporate buyers information on anything or anyone. It will be a vast treasure of demographic data, and the corporations who share that information with the government will themselves have deniability when they decide to do their own snooping. We can also expect political reprisals, depending on which tribe is in charge, and the ubiquitous fuel to celebrity cult fires. Think British tabloid press-style "scandal", except on steroids and fueled by government-sanctioned actions.

But in general, this is the new military-industrial complex at work. Creating this surveillance state is as much Big Money as it is Big Data, and yet another pulsing vein for yet another breed to jab a huge blood-funnel into. This is the natural order of America: a huge market has been found, and it must be tapped. It WILL be tapped. Just like every last drop and BTU of fossil fuel WILL be withdrawn. There is no stopping any of it.

And it will be hacked....