anon_danu said in #5176 25h ago:
We must assume that a motivated attacker will be able to do bayesian-optimal stylometric de-anonymization, and that this will be far more accurate than you might expect. There is no free speech without anonymity, and there is no free thought without free speech, so weapons-grade anonymization tech is existential for any philosophical community that takes itself seriously. We students of philosophy therefore need to think through how strong this could get, and how it could be effectively countered.
https://x.com/alex_prompter/status/2026951395753213970
The paper claims:
>Users who post under persistent usernames should assume that adversaries can link their accounts to real identities.
Whether they claim to get results stronger than that I don't know. We should read the paper. The twitter hypester claims:
>Every throwaway account. Every anonymous forum post. Every “nobody will connect this to me” comment.
But note that's a breathless extrapolation, not directly quoted.
So let's review the full "de-anonymization kill chain":
1. you post enough information and stylometric signal, in linkable contexts, that a bayesian superintelligence could de-anonymize you.
2. the attacker can get access to the information in question.
3. what you posted in that context is enough to piss off and motivate an attacker.
4. the attacker can economically apply de-anonymization tech to identify you.
5. the attacker decides that the damage and deterrent effect of the attack will be worth its cost.
6. the attacker can build a case convincing enough, beyond mere allegation, to get you cancelled in a court of law or the court of public opinion.
These are conjunctive, so defeating any one of them defeats the attack. There are all kinds of tactics to play: flooding the zone with false positives, heresy normalization, decreasing your physical vulnerability to doxing, running stylometry attacks preemptively against yourself, and so on. But when I go through these systematically, I find one dominant tactic that injects difficulty at almost every step: fragment your corpus over more identities. You need over 9000 handles.
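Because the chain is conjunctive, you can model it as a product of per-step success probabilities. A minimal sketch, with made-up numbers purely for illustration:

```python
# Hypothetical per-step success probabilities for the six-step kill chain.
# All numbers are illustrative assumptions, not measurements.
steps = {
    "1. linkable posting":        0.9,
    "2. attacker gets the data":  0.8,
    "3. attacker is motivated":   0.5,
    "4. de-anon is economical":   0.6,
    "5. attack judged worth it":  0.4,
    "6. convincing public case":  0.3,
}

p_attack = 1.0
for p in steps.values():
    p_attack *= p

print(f"P(successful de-anon) = {p_attack:.4f}")  # ~0.026 with these numbers

# Driving any single step near zero collapses the whole chain:
p_defended = p_attack / steps["1. linkable posting"] * 0.01
print(f"with step 1 hardened:  {p_defended:.4f}")
```

The point of the sketch: you don't need to win every step, just to crater one factor in the product.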
The keyword is "linkable". If you have one big pseudonym and one big real name identity, it's almost trivial to link them together. The hypothesis space is relatively tiny, even if there are lots of such accounts to be cross-referenced with lots of real names. But by increasing the fragmentation level, you can explode the enemy's hypothesis space beyond feasible inference.
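One way to make "explode the hypothesis space" concrete: if the attacker must jointly decide which of k anonymous fragments share an author, the number of possible groupings is the Bell number B(k), which grows super-exponentially. The partition-counting framing here is my own assumption, not something from the paper:

```python
def bell_numbers(n):
    """Bell triangle: returns [B(0), ..., B(n)], where B(k) counts
    the ways to partition k fragments into same-author groups."""
    row = [1]
    bells = [1]  # B(0) = 1: the empty partition
    for _ in range(n):
        new = [row[-1]]  # each row starts with the previous row's last entry
        for x in row:
            new.append(new[-1] + x)
        row = new
        bells.append(row[0])
    return bells

# One fat pseudonym: roughly one hypothesis per candidate real name.
# Twenty fragments: B(20) = 51,724,158,235,372 same-author groupings,
# before the attacker even starts matching groups to real names.
print(bell_numbers(20)[-1])
```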
Suppose they attack by embedding each nym in high dimensional stylometry space and looking for clusters or near-neighbors. If error bars are small, they can de-anon. If they are large and overlap many people, they cannot. Error bars will fall with the amount of information available on each nym. A reddit account has orders of magnitude more information and smaller error bars than a 4chan post. By fragmenting across many non-linkable identities, you can drive the error bars up to the point of making attacks infeasible. You dissolve your corpus into impersonal clusters like /pol/, sofiechan, etc.
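The embedding attack above can be sketched with a toy character-trigram model. Real stylometry uses far richer features, so treat this as illustrative only:

```python
from collections import Counter
from math import sqrt

def trigram_profile(text):
    """Embed a text as character-trigram counts (a crude stylometric vector)."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())  # Counter returns 0 for missing keys
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# The attack: embed every nym, then link nearest neighbors.
# Long corpora -> stable profiles, tight error bars, easy linking.
# Short fragments -> noisy profiles, overlapping error bars, no link.
long_a = "we must assume a motivated attacker will do optimal inference " * 20
long_b = "we must assume a motivated attacker will do optimal inference " * 20
short_a = "lol no"
print(cosine(trigram_profile(long_a), trigram_profile(long_b)))  # ~1.0
print(cosine(trigram_profile(long_a), trigram_profile(short_a)))
```

The defensive move is exactly what the post describes: keep every fragment down in the "short, noisy profile" regime where neighbors overlap.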
Coefficients on the cost of attack may go to zero in the long run, but the cost of bayesian inference grows exponentially with the complexity of the inference: the size of the hypothesis space, the number of handles to be linked together, and so on. You can therefore drive the cost of attack arbitrarily high, effectively encrypting your philosophical radar cross-section. Anonymity, like cryptography, is defense-dominant given appropriate care and technology.
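A toy model of why falling per-hypothesis costs don't save the attacker (all numbers are assumptions for illustration):

```python
# Per-hypothesis inference cost falls 10x per tech generation ("coefficients
# go to zero"), but joint hypotheses grow like N**k with fragment count k.
N = 10_000           # candidate authors in the cross-reference pool (assumed)
BASE_COST = 1e-6     # dollars per hypothesis evaluated today (assumed)

def attack_cost(fragments, generations):
    """Dollars to exhaust the joint hypothesis space, under the toy model."""
    per_hypothesis = BASE_COST / 10 ** generations
    return per_hypothesis * N ** fragments

print(attack_cost(1, 0))   # one fat pseudonym, today: cheap
print(attack_cost(5, 3))   # five fragments, 1000x cheaper compute: still ruinous
```

Exponential growth in the exponent k beats any polynomial decline in the coefficient, which is the whole defense-dominance claim in miniature.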
All this to say: this is why we post on anonymous chans, not pseudonymous platforms. There is an orders-of-magnitude difference in the cost and feasibility of attack between disposable, fragmented anonymity and persistent pseudonymity. This warrants more careful analysis, but I would bet that robust anonymity remains achievable.