add section on content-centric filtering

2025-07-11 04:04:17 +00:00 · 2019-07-18 16:47:54 -04:00 · 2019-07-18 16:47:54 -04:00 · f04ac292d4
commit f04ac292d4
parent 15fc950a42
1 changed files with 28 additions and 0 deletions
--- a/README.org
+++ b/README.org
@@ -568,6 +568,34 @@ things that ACLs could never safely do... because [[http://waterken.sourceforge.

 *** Content-centric filtering

+When spam first became a serious problem for email, Paul Graham
+wrote a famous essay called [[http://www.paulgraham.com/spam.html][A Plan for Spam]].
+The general idea was to use content filtering, specifically Bayesian
+algorithms, to detect spam.
+At the time of this article's release, this worked surprisingly well,
+with the delightful property that spammers' own messages would
+themselves train the systems.
+
+Fast forward many years and the same fundamental idea of content
+filtering has gotten much more advanced, but so have the attacks
+against it.
+Neural networks can catch patterns, but can also increasingly
+generate hard-to-detect forms of those same patterns, even generating
+[[https://openai.com/blog/better-language-models/][semi-plausible stories]] from short prompts.
+While most spam sent today is sent using what we might call "amateur"
+methods, the sophisticated attacks that are now possible keep getting
+harder to defend against.
+
+To add to this problem, false positives from these systems can be
+disastrous.
+[[https://www.nytimes.com/2017/03/20/technology/youtube-lgbt-videos.html][YouTube has marked non-sexual LGBT+ videos as "sensitive"]], and
+many machine learning systems have been found to pick up
+[[https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing][racist assumptions]] from their surrounding environment.
+
+This isn't to say that content filtering can't be a useful complement;
+if a user doesn't want to look at some content with certain words,
+they should absolutely be free to filter on them.
+But content filtering shouldn't be the foundation of our systems.
+
 *** Reputation scoring

 *** Going back to centralization