add section on content-centric filtering

Christopher Lemmer Webber 2019-07-18 16:47:54 -04:00
parent 15fc950a42
commit f04ac292d4

@ -568,6 +568,34 @@ things that ACLs could never safely do... because [[http://waterken.sourceforge.
*** Content-centric filtering
When spam began to become a serious problem for email, Paul Graham
wrote a famous essay called [[http://www.paulgraham.com/spam.html][A Plan for Spam]].
The general idea was to use content filtering, specifically Bayesian
algorithms, to detect spam.
At the time of this article's release, this worked surprisingly well,
with the delightful property that spammers' own messages would
themselves train the systems.
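
As a rough illustration of that self-training property, here is a minimal sketch of a word-count naive Bayes filter.
It is not the exact scoring scheme from Graham's essay; it is a plain multinomial naive Bayes with add-one smoothing, and the names (=NaiveBayesFilter=, =train=, =spam_probability=) and the tiny training corpus are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical toy filter: multinomial naive Bayes over word counts."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labeled message; label is "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text) from the counts seen so far."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            # Log prior, smoothed so an untrained filter does not divide by zero.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            # Add-one smoothing; the extra +1 leaves room for unseen words.
            denom = sum(self.word_counts[label].values()) + len(vocab) + 1
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(scores["ham"] - scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("win money now click here", "spam")
spam_filter.train("meeting notes attached for tomorrow", "ham")
spam_filter.train("lunch tomorrow at noon", "ham")

print(spam_filter.spam_probability("buy cheap pills"))
print(spam_filter.spam_probability("notes for the meeting tomorrow"))
```

Every message a user marks as spam goes back through =train=, so each new spam campaign raises the score of its own vocabulary.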
Fast forward many years and the same fundamental idea of content
filtering has gotten much more advanced, but so have the attacks
against it.
Neural networks can catch patterns, but can also increasingly
generate hard-to-detect forms of those same patterns, even generating
[[https://openai.com/blog/better-language-models/][semi-plausible stories]] from short prompts.
While most spam today is sent using what we might call "amateur"
methods, the sophisticated attacks that are now possible keep getting worse.
To add to this problem, false positives from these systems can be
disastrous.
[[https://www.nytimes.com/2017/03/20/technology/youtube-lgbt-videos.html][YouTube has marked non-sexual LGBT+ videos as "sensitive"]], and
many machine learning systems have been found to pick up
[[https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing][racist assumptions]] from their surrounding environment.
This isn't to say that content filtering can't be a useful complement;
if a user doesn't want to look at content containing certain words,
they should absolutely be free to filter on them.
But content filtering shouldn't be the foundation of our systems.
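
To make the "useful complement" case concrete, here is a minimal sketch of a user-controlled word filter applied on the reader's side.
The function name =make_word_filter=, the muted words, and the sample timeline are all hypothetical; a real client would likely also handle phrases, stemming, and content warnings.

```python
import re


def make_word_filter(muted_words):
    """Return a predicate that is True when a post contains no muted word."""
    muted = {word.lower() for word in muted_words}

    def is_visible(post_text):
        # Compare whole lowercase words so "Election" matches "election".
        words = set(re.findall(r"[a-z0-9']+", post_text.lower()))
        return words.isdisjoint(muted)

    return is_visible


# The muted list belongs to the user, not to the server or the network.
is_visible = make_word_filter(["spoilers", "election"])

timeline = [
    "Finale spoilers ahead!",
    "Photos from my hike",
    "ELECTION results thread",
]
visible = [post for post in timeline if is_visible(post)]
print(visible)
```

The point of the sketch is where the decision lives: the filter only hides posts from the person who configured it, and nothing else in the system depends on it.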
*** Reputation scoring
*** Going back to centralization