Sofie Channel

Anonymous 0x330
said (7mo ago #2050 ✔️ ✔️ 86% ✖️ ✖️ ):

Road Trip Week 4: The Soul of a New Taste Machine in San Francisco

We're still becalmed in San Francisco, hanging out with friends, working a bit more on Sofiechan, and this week experiencing life in "the tenderknob".

As usual, the observation must be made that SF would be truly world-beating if they could only keep the streets clean and the street characters charming instead of scary or disturbing. Apart from the filth, SF's civic plaza reminds me of Vienna. Some of the architecture is very nice.

Maybe it's a filter. If SF were clean, it would be way too desirable for entirely normal bougie reasons. Maybe it would kill the magic. That's probably cope though. What if good things are not secretly bad or trade-offs, but are just good, and we're just missing out? The reality is that it's just difficult to organize people politically to solve obvious public problems without getting mired in corruption, bloat, and incompetence. Occasionally, like in San Francisco, this gets quite bad. So it goes.

On a previous trip to SF, George Hotz told me that the parasitic professional class created by the bureaucratic mode of organization is the problem, and that we should try to route around them with AI. I dislike "AI" as a concept and prefer the term "statistics" or "software", but otherwise this has an interesting connection to what we're doing with sofiechan.

I want to scale governance by live player taste to a large community using statistical algorithms and crowdsourced signals of quality, without empowering any third class of oligarchic moderators. Good moderation without all the politics of a moderator class is the classic dream, and the core bet of sofiechan is that it's basically a technical problem. It probably has application beyond forum moderation if we get it working.

We're still in the research phase, getting the "taste machine" running reliably. We've been through a few iterations of the taste algorithms so far. It has been workable for current needs for a while, but not yet stable enough to turn loose on a large number of people and posts. Hence limited focus on growth around here.

One challenge has just been that the inference architecture was entangled with the recordkeeping and user interaction system. This made things inefficient, complex, untestable, error-prone, and hard to work on. As overall architecture needs have become clear, I'm building a new highly self-contained taste machine kernel that is much better on all these dimensions. That's what I've been working on this week.

The core engine of sofiechan is now an iterative Bayesian estimation of various quantities of interest, especially everyone's quality of judgement in voting, everyone's quality as a poster, and the quality of individual posts. These are all predicted from each other, which makes the problem somewhat circular. But iteratively refining the solution with robust estimators rapidly converges to identify true quality and taste with reasonable accuracy. At least on simulated data.
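To give a flavor of the circularity, here is a toy version of the loop. This is an illustrative sketch with made-up names, not the actual sofiechan kernel: post quality as a trust-weighted mean of votes, voter trust as agreement with the current consensus, iterated until they stabilize.

```python
def estimate(votes, n_iters=50):
    """Co-estimate post quality and voter judgement from raw votes.

    votes: list of (voter, post, value) triples, value in {+1, -1}.
    Post quality is a trust-weighted mean of its votes; voter trust is
    how well that voter's votes agree with the current quality
    estimates, clipped at zero so hostile voters simply stop counting.
    """
    voters = {v for v, _, _ in votes}
    posts = {p for _, p, _ in votes}
    trust = {v: 1.0 for v in voters}
    quality = {p: 0.0 for p in posts}
    for _ in range(n_iters):
        # Re-estimate each post from the votes of trusted voters.
        for p in posts:
            num = sum(trust[v] * x for v, q, x in votes if q == p)
            den = sum(trust[v] for v, q, x in votes if q == p)
            quality[p] = num / den if den else 0.0
        # Re-estimate each voter by agreement with the consensus.
        for v in voters:
            agree = [x * quality[p] for w, p, x in votes if w == v]
            trust[v] = max(0.0, sum(agree) / len(agree)) if agree else 0.0
    return quality, trust
```

On toy data (two honest voters plus one contrarian), the contrarian's trust collapses to zero within a couple of iterations and the honest consensus is recovered exactly.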

In reality there is no "true" quality, so the administrator's judgements and other signals will be used to uniquely identify which taste equilibrium we want out of the space of possibilities. As things grow, who trusts and vouches for whom, who has contributed in various ways, and many other signals will also be used to add information to the system and better determine the solution. And the whole thing can be tuned for controllability by the administrator's vision. The nice thing about Bayesian methods is that they very naturally encode and solve this kind of multi-domain "sensor fusion" inference problem.
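A minimal illustration of why conjugate Bayesian updates make this fusion easy (a toy Beta-Binomial sketch, not our actual model): with Beta evidence, fusing several signals is just adding pseudo-counts, and the admin can anchor the equilibrium by injecting a heavy prior.

```python
def fuse(signals, prior=(1.0, 1.0)):
    """Fuse independent pieces of Beta evidence about one quantity.

    Each signal is (pseudo_successes, pseudo_failures): vote tallies,
    vouches, admin judgements, etc., each already weighted by how much
    we trust it. "Sensor fusion" with conjugate Beta evidence reduces
    to summing pseudo-counts into one posterior.
    """
    a, b = prior
    for s, f in signals:
        a += s
        b += f
    return a / (a + b)  # posterior mean
```

For example, `fuse([(8, 2)])` mostly trusts the votes, while `fuse([(8, 2)], prior=(1, 40))` lets a strong admin judgement dominate them.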

Meanwhile, we'll be around in SF another week, then hitting the road up to Tahoe, Reno, and across the Nevada desert to Utah.

Anonymous 0x33c
said (7mo ago #2066 ✔️ ✔️ --- ✖️ ✖️ ), referenced by >>2067:

> Good moderation without all the politics of a moderator class is the classic dream, and the core bet of sofiechan is that it's basically a technical problem

Forgive me, but I'm skeptical of this framing. For a small enough community, it's easier to go through all the comments yourself. At scale, moderation-by-ML fails because it isn't designed to resist adversarial attacks (e.g., non-English content on Twitter and Facebook is effectively unmoderated). That leaves a range of roughly 2-3 orders of magnitude in community size where you can and should have real mods. Getting rid of them is probably a good idea, but surely there are much more consequential decisions and bets for sofiechan than moderation for communities of a certain size?

> The core engine of sofiechan is now an iterative bayesian estimation of various quantities of interest

So far it seems to work well enough, although it feels odd when your comment has <35% or >65% approval upon posting for no discernible reason. The question is how well it works when the true quality signal becomes increasingly sparse as the community grows. Intuitively, there should be some kind of 'Goldilocks zone' in community size where it saves you from reading everything, but still has enough true signal to work well.

Anonymous 0x330
said (6mo ago #2067 ✔️ ✔️ 81% ✖️ ✖️ ), referenced by >>2069:

>>2066
>At scale, moderation-by-ML fails as it's not designed to resist adversarial attacks (e.g., non-English content on Twitter and Facebook is effectively unmoderated).
Defense against adversarial trolls is a different issue from moderating foreign content.

For domestic moderation against adversarial content, there is no reason an ML or other software system can't be effectively designed to do a great deal of the lifting. The defining feature of unwanted content is that it is unwanted, and people won't be willing to go out of their way to click the little "I like this" button on it. They might even click the "make this go away" button. You don't need very many such clicks to start the system learning to predict what the good stuff is and what isn't.
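As a toy illustration of how little signal such a learner needs, here is a generic online logistic-regression step, one weight update per click. All names are hypothetical; this is not our actual algorithm, just the standard textbook move.

```python
import math

def click_update(w, features, liked, lr=0.1):
    """One online logistic-regression step from a single click signal:
    liked=1 for "I like this", liked=0 for "make this go away".
    features is a sparse {name: value} dict describing the post;
    w is the weight dict being learned. Mutates and returns w.
    """
    z = sum(w.get(k, 0.0) * v for k, v in features.items())
    p = 1.0 / (1.0 + math.exp(-z))   # predicted P(liked)
    g = liked - p                    # gradient of the log-likelihood
    for k, v in features.items():
        w[k] = w.get(k, 0.0) + lr * g * v
    return w
```

A few dozen clicks are enough for the weights to start separating wanted from unwanted features.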

There are two immediate problems with democracy and therefore a naive interpretation of this information: the first is low taste and/or hostile voters, who may even be imported specifically to manipulate the vote. The second is "what if what the people want is wrong"? These are closely related: the question is who gets to decide who has power. At sofiechan, we have a simple answer: monarchy. The administrator has ultimate authority on who and what is legitimate.

This transforms the problem. The voter is not exercising a right, but sending a signal. The system simply interprets that signal for its information value. The goal of the learning algorithm is to predict what content and people we (the royal we) want around here, from those signals. This is a very well defined problem of the type algorithmic statistics excels at. It reduces to this in practice: low quality and hostile content will be identified and de-emphasized because people that the algorithm trusts will tell the algorithm they don't like it. If that trust is misplaced, it will be recomputed retroactively after being corrected by those closer to the core of the trust network (in the limit, the monarch). The only question is how data efficient this can get, and therefore how far it can scale. I suspect the "goldilocks zone" where there's still enough signal is quite large.
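A toy sketch of the trust-recomputation idea: a hypothetical damped breadth-first walk from the administrator outward through the vouch graph (not the actual recomputation we run, just the shape of it). Re-running it after the admin cuts or adds a vouch is the "retroactive recomputation" in miniature.

```python
from collections import deque

def propagate_trust(vouches, root, decay=0.5):
    """Damped breadth-first trust propagation from `root` (the admin)
    through a who-vouches-for-whom graph. Each hop multiplies trust by
    `decay`; a user keeps the highest trust any vouching path grants,
    so cycles and longer paths can never inflate anyone's standing.
    """
    trust = {root: 1.0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in vouches.get(u, []):
            t = trust[u] * decay
            if t > trust.get(v, 0.0):
                trust[v] = t
                queue.append(v)
    return trust
```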

The alternatives I've heard of all amount to expensive ways to avoid embracing this core political problem: who gets to decide, and who do they trust? Usually this is some combination of ML algorithms hobbled by having to be supposedly neutral and objective, and hired armies of underpaid or unpaid (and therefore low quality) professional moderators who are trust-managed using crude bureaucratic methods. All the critiques I've heard of algorithmic moderation assume this failed paradigm.

With respect to foreign language issues, e.g. Facebook's problems with foreigners organizing genocides or whatever on their platform, ask this: do we (as a self-governing political community of English-speaking Americans and Europeans) have any real business defining what goes on in other languages and political communities? Not really. Discourse moderation is political. To moderate their discourse would be to wade into their politics in a real way. Maybe we want to do that, but it's beyond the scope of social media algorithms.

>surely there are much more consequential decisions and bets for sofiechan than just moderation for communities of a certain size?
You are right. Ultimately our aim is to provision a trustworthy worldview and social network to our political community. There are a lot more consequential decisions involved in that than just moderation. But I think getting moderation, i.e. governance, right is generally underestimated in importance.

> it feels odd when your comment has <35% or >65% approval upon posting for no discernible reason.
Forgive the squeaks and rattles in the prototype taste machine: it guesses, based on how much it trusts you, how good your post is going to be. Newer users get lower ratings, as do people whose other posts don't do well. The next version will be more accurate.
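The flavor of that guess can be sketched as a smoothed track-record estimate (a toy Beta-prior sketch, not the real formula):

```python
def predicted_approval(past_up, past_down, prior=(2.0, 2.0)):
    """Guess a new post's approval before anyone has voted on it,
    from the poster's past up/down record, smoothed toward 50% for
    new users by `prior` pseudo-votes (a Beta prior)."""
    a, b = prior
    return (a + past_up) / (a + b + past_up + past_down)
```

A brand-new user starts at exactly 50%, while a user with a 2-up, 18-down history starts under 35%, which is roughly the behavior you noticed.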

Anonymous 0x33c
said (6mo ago #2069 ✔️ ✔️ 80% ✖️ ✖️ ), referenced by >>2073:

>>2067
Thank you for elaborating on your vision for sofiechan; a firm hand at the helm is indeed a must for such a project.

> The goal of the learning algorithm is to predict what content and people we (the royal we) want around here, from those signals

I was lucky to work on an initiative within a big tech firm that paralleled your implementation of moderation in some ways, with positive yet inconclusive results. The company was in the Q&A business and perhaps a hundred times smaller than Google, but let me use Google as the example, since I can't speak about my employer directly and I know that a similar initiative was undertaken at Google a bit earlier.

In the past, Google used a formula for ranking web pages for a given query: page relevance + page quality + page usability. This worked reasonably well, but employees consistently complained about certain issues: why were some pages, suggesting treatments for cancer with baking soda, ranked highly? Why did StackOverflow often rank higher than official documentation for programming queries? Why were there numerous conspiracy theories appearing in results for political queries?

Despite constant complaints over the years, no action was taken for a long time - according to the formula, these search results were no worse than others, and asking underpaid moderators to remove low-quality content only led to minor improvements. Eventually, an ambitious VP decided to at least fix medical queries, hiring highly-credentialed doctors as moderators and incorporating their judgments into the ranking formula for medical websites.

It was an overnight success, allowing us to reliably distinguish medical science from crystal healing. Unfortunately, this couldn't be replicated for most queries - few fields had widely accepted authorities. After a lot of hand-wringing, one daring analyst suggested instead ranking users from fringe cranks to respected experts and increasing the rankings of sites that were frequented by the latter.

As a result, the overall quality of search results visibly improved, but all metrics barely changed, which was unheard of for such a dramatic change. Nonetheless the new approach was celebrated and officially added to the formula as the "authoritativeness" of sites. Personally, I liked the new search better, but couldn't shake the feeling that we had just made search results more appealing to ourselves by downranking anything the proles touched.

Anonymous 0x330
said (6mo ago #2073 ✔️ ✔️ --- ✖️ ✖️ ), referenced by >>2077 >>2097:

>>2069
That's very interesting. Indeed the way to think about it is that it just made the search rankings more like what the median higher class citizen wanted to see. I'm not too surprised a demographic-epistemic trade-off like this wouldn't impact bulk metrics too much, but it would be interesting to hear more about that.

The issue with Google's idea of authoritativeness (and Wikipedia's, etc.) is that it becomes the consensus of the current regime and its partisans. That's all fine and good in the abstract, except that the current regime's perspective is failing. We have seen this blatantly over the past few years with COVID messaging, Biden's mental health, etc., besides the deeper issues of what it means to have a viable society. When people complain about Google or YouTube rankings letting CNN and official lies crowd out people talking to each other about their authentic experiences and thoughts, it's this political consensus they are complaining about.

The current regime is probably still higher quality in its epistemics than the median prole alternative facts peddler, which is a decent reason why you might prefer it. But you do lose a lot of gems by hiding away everything the median MSNBC-believer doesn't want to see. This is what smart people are complaining about when they say it no longer turns up interesting theories on obscure forums.

Again, moderation is fundamentally political and the political is necessarily fractious. The brief moment in history where it seemed like it was possible to produce a universal objective neutrality with free speech or at least authoritative truth was simply the hubristic zenith of the global American empire. What this means practically is that google et al are just offering one perspective in a world that demands many, and this will become an increasingly obvious opportunity.

As for us, we owe no epistemic allegiance to this failing regime. We want high quality authoritative information, but more specifically we want one that works for us. We want *our* high quality authoritative information. Being free of allegiance means we are free to produce a perspective that is just plain better than what is currently on tap, but it also means we have an awful lot of work to do to figure out what's true and good without help from the official sources. One of the major aims of this project is to build a community and epistemic infrastructure to make this sort of thing possible. Doing so would be of historical significance.

Again I'll repeat that moderation (which is to say information governance) is the core of this problem, and much underrated. Specifically, the claim sofiechan believes that no one else seems to believe is that good forum moderation is regime-complete.

Anonymous 0x342
said (6mo ago #2077 ✔️ ✔️ --- ✖️ ✖️ ), referenced by >>2081:

>>2073
> the claim sofiechan believes that no one else seems to believe is that good forum moderation is regime-complete.

This is key. Another way to pose the matter is: what is a good information-governance protocol for a regime in the 21st century? It's not "free speech" as in the U.S. of the 1990s. Nor is it "manual censorship" as in the Soviet Union.

There is a space of mostly-automated protocols; within that space, we seek one that supports truth-seeking via discussion, with full recognition that truth must ultimately be judged.

Anonymous 0x330
said (6mo ago #2081 ✔️ ✔️ --- ✖️ ✖️ ), referenced by >>2083:

>>2077
>what is a good information-governance protocol for a regime in the 21st century? It's not "free speech" as in the U.S. of the 1990s. Nor is it "manual censorship" as in the Soviet Union.
Good question. I'm fairly confident sofiechan is on the right track with all this cybernetic direct democracy and crowdsourced cyber-monarchy business. At least in concept, it coherently integrates a lot of the virtues of elite theory, monarchy, and democracy. It's not lying about anything, it could be way more responsive to the people's needs than any kind of simple dictatorship, and it has a theory of organization of power. But two important questions remain:

1. Who is the king? On an internet forum, the founder-administrator-monarch principle is well enough established, but suppose we take this to its political conclusion as a system of government for, e.g., America. Who's the boss? The president? Maybe, maybe. That would be a different world indeed. Ultimately the king is appointed by God and God-given custom, and we basically have neither right now. Does America need a king? Will we get one? If not a king, how would it work?

2. What are the rules? It's all well and good to propose all this mechanism for subtle speech rule enforcement, but we still have to actually decide all the detailed rules. Overall I favor a "common law" approach where rather than try to fiat up some perfect discourse rules, we respond to things by deciding cases and using those results to build up a living body of custom. Of course for discourse it's all a lot more subjective and soft than can be handled by actual law, so take it as an analogy. Theoretically, the sofiechan democratic cyber-monarchy system should be able to do this. But here we are actually in the thing. At some point we have to hit the object level.

Let's take our friends over in the anti-indian racism thread as a test case. Is that appropriate, or counterproductive? One view is that we should just have the discussion and fight it out with all the embarrassing stupidity implied in the idea of a racism thread, as long as everyone stays civilized. Another is that certain topics are out of scope of public discourse no matter how civilized, because they are too hard to have in a productive manner without just causing social trouble. How should racism (not) be done?

Generally my view is that having discussion of many currently taboo topics is actually really important, even if we don't endorse all the positions represented, because the taboos tend to make all of discourse and social relation worse if they are never addressed. It's like price fixing. Better to allow price discovery even if it's socially disruptive, because the alternative is worse. But I can also see the value of some suppression to prevent the socially disruptive topics from getting out of control or becoming the "official" focus. That would be bad.

I haven't hidden the thread, but I don't disagree with its currently low rating either. In any case I'm more interested in this question of how to properly discuss or not discuss such matters than I am in the matter itself.

Anonymous 0x347
said (6mo ago #2083 ✔️ ✔️ 85% ✖️ ✖️ ), referenced by >>2084:

>>2081
>Generally my view is that having discussion of many currently taboo topics is actually really important, even if we don't endorse all the positions represented, because the taboos tend to make all of discourse and social relation worse if they are never addressed. ... But I can also see the value of some suppression to prevent the socially disruptive topics from getting out of control or becoming the "official" focus. That would be bad.

It's a tricky balance. Discussion of taboos is tricky because, on the one hand, if you’ve got politically load-bearing taboos you *can’t* discuss, then you are not intellectually sovereign, and you lose. On the other hand, if you discuss it a lot, then you’ll draw a crowd that *just wants to say the taboo thing*, rather than to participate in your larger project. (This is especially, but not uniquely, true for racism.) If that’s allowed to get enough momentum, then the best people will get bored and leave, and you lose. I suspect that the *frequency* of the discussion matters more than the *content* for this process. In the absence of a general solution, "the mod suppresses this stuff whenever he thinks it's getting to be a bit much" seems fine at our size.

Anonymous 0x330
said (6mo ago #2084 ✔️ ✔️ --- ✖️ ✖️ ):

>>2083
I'm sympathetic to the frequency rule. Taboo breaking is not what we're about, but it is important to be able to discuss. Therefore it should be suppressed to a nonzero level. Enough to establish that we assert the right, to blow the steam off, and to see what's there, not so much that it gets in the way of what we're actually interested in.

The way the algorithm works, "the mod" is all of us. Of course the admin can adjust things, but it all amounts to hiding threads and posts that aren't the right thing, and endorsing those that are.

Anonymous 0x33c
said (6mo ago #2097 ✔️ ✔️ 80% ✖️ ✖️ ), referenced by >>2098 >>2101:

>>2073
> I'm not too surprised a demographic-epistemic trade-off like this wouldn't impact bulk metrics too much, but it would be interesting to hear more about that.

It seems obvious when you put it like that, but the initial expectation was that we would lose double-digit percentages of traffic by downranking outright woo for medical queries. Instead, we shifted more than half of the views from the likes of crystal healing to something at least mildly plausible, and users didn't care. This is even more surprising considering the kind of users who landed on such content - there was usually a clear pattern in their behavior beforehand, suggesting distrust toward medical authorities and a tendency toward very dubious medical practices. For me, the fact that ranking crystal healing lower caused most of those people to switch to something else is genuinely shocking.

Now, I don't want to overstate the case - perhaps weak moderation had already driven most of the true believers off the platform, or perhaps it eventually dawned on users that we had tipped the scales and they only reacted after our several months of thorough A/B testing were over. It's possible, but my takeaway is that a lot of people, even skeptics, don't really have any fixed beliefs.

The second surprise, which allowed this practice to scale from just medical content to everything else, was the strong correlation between all high-class markers. We tried ranking users in different ways - by their propensity toward alternative healing methods, inferred income, credentials, writing and reading more thoughtful (longer) content, and even attempted to rank them based on the quality of restaurants they visited (avg review for a given avg menu price). All of these attempts yielded correlations above 0.9, meaning that it essentially didn't matter which method we chose.

Of course, there were some wrinkles in all of this - small business owners earned more than their suggested class, and some hipsters who wrote the best content were actually dirt poor. However, on balance, the connection was strong enough to drown out such outliers and ubiquitous enough to explain away any political bias.

In hindsight, the entire experience made me question whether free speech or universal objective neutrality could exist at all. After all, it only took a distaste for pseudoscience to discover that some perspectives are better than others.

Anonymous 0x330
said (6mo ago #2098 ✔️ ✔️ --- ✖️ ✖️ ):

>>2097
This is great stuff. It's interesting but not surprising that most people just watch what's on the TV, especially those with weaker epistemics. Somewhat surprising, though, that they weren't seeking it out, or that their habits are that flexible. I guess it means the hardcore alternative set is too small. I suspect a lot of these epistemic stances are more about what's easy for them than what they believe deeply and independently. They hated mainstream medicine because the alt stuff was easier to get.

Related to that hypothesis is something I've said before which is that sanity is a collective logistical problem. You can't be sane as an individual, you need good sources who have painstakingly curated methods and content of good thought. If it's not available, you don't get there. I fear much like the industrial base, America has lost its sanity supply chains, too. Or rather corrupted them to the point where they don't work for increasing numbers of people.

One of my motivations for being interested in sofiechan is the possibility of building out higher quality independent sanity supply chains.

Also interesting that everything correlates heavily via class. I've said this for years. Very important fact for a project like this that hopes to discriminate along a bunch of those dimensions.

Anonymous 0x342
said (6mo ago #2101 ✔️ ✔️ --- ✖️ ✖️ ):

>>2097
> ... the strong correlation between all high-class markers. We tried ranking users in different ways ... All of these attempts yielded correlations above 0.9, meaning that it essentially didn't matter which method we chose.

It would be interesting to know how the use of these high-class markers played out during the COVID pandemic. Did they downgrade critical examination of vaccine efficacy and safety? Did they upgrade content on the benefits of lockdowns?
