# -*- mode: python; indent-tabs-mode: nil; py-indent-offset: 4; coding: utf-8 -*-
# https://github.com/nusenu/noContactInfo_Exit_Excluder
# https://github.com/TheSmashy/TorExitRelayExclude
"""
This extends nusenu's basic idea of using the stem library to
dynamically exclude nodes that are likely to be bad by putting them
on the ExcludeNodes or ExcludeExitNodes setting of a running Tor.
* https://github.com/nusenu/noContactInfo_Exit_Excluder
* https://github.com/TheSmashy/TorExitRelayExclude
The basic idea is to exclude Exit nodes that do not have ContactInfo:
* https://github.com/nusenu/ContactInfo-Information-Sharing-Specification
That can be extended to relays that do not have an email in the contact,
or to relays that do not have ContactInfo that is verified to include them.
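The core idea can be sketched as a pure helper (a minimal sketch; `fingerprints_to_exclude` is a hypothetical name, and the live-controller calls in the comment are stem's documented API, assuming a control socket at /run/tor/control):
```python
# Minimal sketch: collect the fingerprints of relays whose descriptor
# carries no ContactInfo, so they can be excluded. Kept as a pure
# function so it can be tested without a running tor.
def fingerprints_to_exclude(descriptors):
    # descriptors: objects with .fingerprint and .contact
    # (contact is None or empty when the relay sets no ContactInfo)
    return [d.fingerprint for d in descriptors if not d.contact]

# With a live tor, the result would be fed to the controller, e.g.:
#   from stem.control import Controller
#   with Controller.from_socket_file('/run/tor/control') as c:
#       c.authenticate()
#       bad = fingerprints_to_exclude(c.get_server_descriptors())
#       c.set_conf('ExcludeExitNodes', ','.join(bad))
```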
"""
__prolog__ = __doc__

sGOOD_NODES = """
---
GoodNodes:
  EntryNodes: []
  Relays:
    # ExitNodes will be overwritten by this program
    ExitNodes: []
    IntroductionPoints: []
  # use the Onions section to list onion services whose Introduction
  # Points you want whitelisted - these points may change daily.
  # Look in tor's notice.log for 'Every introduction point for service'
  Onions: []
  # use the Services list to list relays you want whitelisted.
  # Look in tor's notice.log for 'Wanted to contact directory mirror'
  Services: []
"""
sBAD_NODES = """
BadNodes:
  # list the internet domains you know are bad so you don't
  # waste time trying to download contacts from them.
  ExcludeDomains: []
  ExcludeNodes:
    # BadExit will be overwritten by this program
    BadExit: []
    # list MyBadExit in --bad_sections if you want it used to exclude nodes,
    # or add other sections as a comma(,) separated list
    MyBadExit: []
"""
__doc__ += f"""But there's a problem, and your Tor notice.log will tell you about it:
you could exclude the relays needed to access hidden services or mirror
directories. So we need to add the concept of a whitelist to the process.
In addition, we may have our own blacklist of nodes we want to exclude,
or want to use these lists for other applications like selektor.

So we make two files that are structured in YAML:
```
/etc/tor/yaml/torrc-goodnodes.yaml
{sGOOD_NODES}

/etc/tor/yaml/torrc-badnodes.yaml
{sBAD_NODES}
```
By default all sections of the goodnodes.yaml are used as a whitelist.

Use the GoodNodes/Onions list to list onion services whose Introduction
Points you want whitelisted - these points may change daily.
Look in tor's notice.log for warnings of 'Every introduction point for service'.
That part requires [PyYAML](https://pyyaml.org/wiki/PyYAML)
(https://github.com/yaml/pyyaml/) or ```ruamel```: do
```pip3 install ruamel``` or ```pip3 install PyYAML```;
the advantage of the former is that it preserves comments.
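The import fallback can be sketched like this (a sketch of the pattern, not the program's exact code):
```python
# Prefer ruamel (round-trip mode keeps YAML comments on rewrite);
# fall back to PyYAML; disable YAML support if neither is installed.
try:
    from ruamel.yaml import YAML
    _yaml = YAML(typ='rt')
    safe_load = _yaml.load
except ImportError:
    try:
        import yaml
        safe_load = yaml.safe_load
    except ImportError:
        safe_load = None
```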

(You may have to run this as the Tor user to get RW access to
/run/tor/control, in which case the directory for the YAML files must
be writeable by the Tor group, and its parent directories group r-x.)
Because you don't want to exclude the introduction points to any onion
you want to connect to, ```--white_onions``` should whitelist the
introduction points of a comma-separated list of onions; we fixed stem
to do this:
* https://github.com/torproject/stem/issues/96
* https://gitlab.torproject.org/legacy/trac/-/issues/25417

Use the GoodNodes/Onions list in goodnodes.yaml to list onion services
whose Introduction Points you want whitelisted - these points may change
daily. Look in tor's notice.log for 'Every introduction point for service'.

```notice_log``` will parse the notice log for warnings about relays and
services, which will then be whitelisted.
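That scan can be sketched as follows (a sketch only; the exact wording of tor's log message is an assumption here):
```python
import re

# tor logs lines like
#   'Every introduction point for service <onion> is unreachable ...'
# collect the onion addresses so their relays can be whitelisted
def onions_from_notice_log(lines):
    pat = re.compile('Every introduction point for service ([a-z2-7]+)')
    found = []
    for line in lines:
        m = pat.search(line)
        if m and m.group(1) not in found:
            found.append(m.group(1))
    return found
```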
```--torrc_output``` will write the torrc ExcludeNodes configuration to a file.
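The written line is ordinary torrc syntax; a minimal sketch (```torrc_exclude_line``` is a hypothetical helper, not a function of this program):
```python
def torrc_exclude_line(fingerprints):
    # torrc takes one comma-separated ExcludeNodes line
    return 'ExcludeNodes ' + ','.join(fingerprints)
```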
```--good_contacts``` will write the contact info as a ciiss dictionary
to a YAML file. If the proof is uri-rsa, the well-known file of
fingerprints is downloaded and the fingerprints are added to a 'fps'
field that we create in that fingerprint's entry of the YAML dictionary.
This file is read at the beginning of the program to seed a trust
database, and only contact info from new relays is added to the
dictionary.
Now for the final part: we look up the Contact info of every relay
that is currently in our Tor, and check for the existence of the
well-known file that lists the fingerprints of the relays it runs.
If it fails to provide the well-known URL, we assume it's a bad
relay and add it to a list of nodes that goes on ```ExcludeNodes```
(not just ```ExcludeExitNodes```). If the Contact info is good, we add the
list of fingerprints to ```ExitNodes```, a whitelist of relays to use as exits.

```--bad_on``` offers users 3 levels of cleaning:
1. clean relays that have no contact ```=Empty```
2. clean relays that don't have an email in the contact (implies 1)
   ```=Empty,NoEmail```
3. clean relays that don't have "good" ContactInfo (implies 1)
   ```=Empty,NoEmail,NotGood```

The default is ```Empty,NoEmail,NotGood```; ```NoEmail``` is inherently
imperfect in that many contact-as-an-email fields are obfuscated, but we
try anyway.
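De-obfuscation amounts to undoing the common at/dot disguises; a minimal sketch (the program's own sCleanEmail handles many more variants):
```python
def clean_email(s):
    # undo a few common 'at'/'dot' obfuscations, then tidy spacing
    s = s.lower()
    for rep in ('(at)', '[at]', ' at '):
        s = s.replace(rep, '@')
    for rep in ('(dot)', '[dot]', ' dot '):
        s = s.replace(rep, '.')
    return s.replace(' @ ', '@').replace(' . ', '.')
```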
To be "good" the ContactInfo must:
1. have a URL for the well-known file to be fetched
2. have a file that can be fetched at that URL
3. support fetching the file with a valid SSL cert from a recognized authority
4. (not in the spec but added by Python) use TLS > v1
5. have a fingerprint list in the file
6. include, in that fingerprint list, the FP that led us to the ContactInfo
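Steps 5 and 6 can be sketched as follows (a sketch of the uri-rsa check, assuming one fingerprint per line and '#' comment lines):
```python
def parse_fingerprints(text):
    # one 40-hex-char fingerprint per line; '#' lines are comments
    out = []
    for line in text.upper().splitlines():
        line = line.strip()
        if line and len(line) == 40 and not line.startswith('#'):
            out.append(line)
    return out

def proves(text, fp):
    # step 6: the prover's own FP must appear in the fetched list
    return fp.upper() in parse_fingerprints(text)
```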

For usage, do ```python3 exclude_badExits.py --help```
"""

# https://github.com/nusenu/trustor-example-trust-config/blob/main/trust_config
# https://github.com/nusenu/tor-relay-operator-ids-trust-information

import argparse
import json
import os
import re
import sys
import tempfile
import time
from io import StringIO

import stem
from stem import InvalidRequest
from stem.connection import IncorrectPassword
from stem.util.tor_tools import is_valid_fingerprint
import urllib3
from urllib3.util.ssl_match_hostname import CertificateError

# list(ipaddress._find_address_range(ipaddress.IPv4Network('172.16.0.0/12')))
try:
    from ruamel.yaml import YAML
    yaml = YAML(typ='rt')
    yaml.indent(mapping=2, sequence=2)
    safe_load = yaml.load
except ImportError:
    yaml = None
if yaml is None:
    try:
        import yaml
        safe_load = yaml.safe_load
    except ImportError:
        yaml = None
try:
    from unbound import RR_CLASS_IN, RR_TYPE_TXT, ub_ctx
except ImportError:
    ub_ctx = RR_TYPE_TXT = RR_CLASS_IN = None
from support_onions import (bAreWeConnected, icheck_torrc, lIntroductionPoints,
oGetStemController, vwait_for_controller,
yKNOWN_NODNS, zResolveDomain)
from trustor_poc import TrustorError, idns_validate

try:
    # the httpx/asyncio path is intentionally disabled by the bogus
    # module name; rename it to 'httpx' to experiment with it
    import xxxhttpx as httpx
    import asyncio
    from trustor_poc import oDownloadUrlHttpx
except ImportError:
    httpx = None
    from trustor_poc import oDownloadUrlUrllib3Socks as oDownloadUrl

import logging
import warnings

warnings.filterwarnings('ignore')

LOG = logging.getLogger()

try:
    from torcontactinfo import TorContactInfoParser
    oPARSER = TorContactInfoParser()
except ImportError:
    oPARSER = None

oCONTACT_RE = re.compile(r'([^:]*)(\s+)(email|url|proof|ciissversion|abuse|gpg):')

ETC_DIR = '/usr/local/etc/tor/yaml'
aGOOD_CONTACTS_DB = {}
aGOOD_CONTACTS_FPS = {}
aBAD_CONTACTS_DB = {}
aRELAYS_DB = {}
aRELAYS_DB_INDEX = {}
aFP_EMAIL = {}
aDOMAIN_FPS = {}

sDETAILS_URL = "https://metrics.torproject.org/rs.html#details/"
# You can call this while bootstrapping
sEXCLUDE_EXIT_GROUP = 'ExcludeNodes'
sINCLUDE_EXIT_KEY = 'ExitNodes'

oBAD_ROOT = 'BadNodes'
aBAD_NODES = safe_load(sBAD_NODES)
sGOOD_ROOT = 'GoodNodes'
sINCLUDE_GUARD_KEY = 'EntryNodes'
sEXCLUDE_DOMAINS = 'ExcludeDomains'
aGOOD_NODES = safe_load(sGOOD_NODES)

lKNOWN_NODNS = []
tMAYBE_NODNS = set()

def lYamlBadNodes(sFile,
                  section=sEXCLUDE_EXIT_GROUP,
                  tWanted=None):
    global aBAD_NODES
    global lKNOWN_NODNS
    global tMAYBE_NODNS

    l = []
    if tWanted is None: tWanted = {'BadExit'}
    if not yaml:
        return l
    if os.path.exists(sFile):
        with open(sFile, 'rt') as oFd:
            aBAD_NODES = safe_load(oFd)

    # for elt in o[oBAD_ROOT][root][section].keys():
    #     if tWanted and elt not in tWanted: continue
    #     l += o[oBAD_ROOT][root][section][elt]
    for sub in tWanted:
        l += aBAD_NODES[oBAD_ROOT][sEXCLUDE_EXIT_GROUP][sub]

    tMAYBE_NODNS = set(safe_load(StringIO(yKNOWN_NODNS)))
    if sEXCLUDE_DOMAINS in aBAD_NODES[oBAD_ROOT] and aBAD_NODES[oBAD_ROOT][sEXCLUDE_DOMAINS]:
        tMAYBE_NODNS.update(set(aBAD_NODES[oBAD_ROOT][sEXCLUDE_DOMAINS]))
    return l

def lYamlGoodNodes(sFile='/etc/tor/torrc-goodnodes.yaml'):
    global aGOOD_NODES

    l = []
    if not yaml: return l
    if os.path.exists(sFile):
        with open(sFile, 'rt') as oFd:
            o = safe_load(oFd)
        aGOOD_NODES = o
        if 'EntryNodes' in o[sGOOD_ROOT].keys():
            l = o[sGOOD_ROOT]['EntryNodes']
    # yq '.Nodes.IntroductionPoints|.[]' < /etc/tor/torrc-goodnodes.yaml
    return l

def bdomain_is_bad(domain, fp):
    global lKNOWN_NODNS
    if domain in lKNOWN_NODNS: return True
    if domain in tMAYBE_NODNS:
        ip = zResolveDomain(domain)
        if ip == '':
            LOG.debug(f"{fp} {domain} does not resolve")
            lKNOWN_NODNS.append(domain)
            tMAYBE_NODNS.remove(domain)
            return True
    for elt in '@(){}$!':
        if elt in domain:
            LOG.warn(f"{elt} in domain {domain}")
            return True
    return False

tBAD_URLS = set()
lAT_REPS = ['[]', ' at ', '(at)', '[at]', '<at>', '(att)', '_at_',
            '~at~', '.at.', '!at!', '<a>t', '<(a)>', '|__at-|', '<:at:>',
            '[__at ]', '"a t"', 'removeme at ', ' a7 ', '{at-}',
            '[at}', 'atsign', '-at-', '(at_sign)', 'a.t',
            'atsignhere', ' _a_ ', ' (at-sign) ', "'at sign'",
            '(a)', ' atsign ', '(at symbol)', ' anat ', '=at=',
            '-at-', '-dot-', ' [a] ', '(at)', '<a-t<>', '[at sign]',
            '"at"', '{at}', '-----symbol for email----', '[at@]',
            '(at sign here)', '==at', '|=dot|', '/\t',
            ]
lDOT_REPS = [' point ', ' dot ', '[dot]', '(dot)', '_dot_', '!dot!', '<.>',
             '<:dot:>', '|dot--|', ' d07 ', '<dot=>', '(dot]', '{dot)',
             'd.t', "'dot'", '(d)', '-dot-', ' adot ',
             '(d)', ' . ', '[punto]', '(point)', '"dot"', '{.}',
             '--separator--', '|=dot|', ' period ', ')dot(',
             ]

lNO_EMAIL = [
    '<nobody at example dot com>',
    '<nobody at none of your business xyz>',
    '<not-set@example.com>',
    '@snowden',
    'ano ano@fu.dk',
    'anonymous',
    'anonymous@buzzzz.com',
    'check http://highwaytohoell.de',
    'no-spam@tor.org',
    'no@no.no',
    'noreply@bytor.com',
    'not a person <nomail at yet dot com>',
    'not@needed.com',
    'not@re.al',
    'nothanks',
    'nottellingyou@mail.info',
    'ur@mom.com',
    'your@e-mail',
    'your@email.com',
    r'<nothing/at\\mail.de>',
    ]
lMORONS = ['hoster:Quintex Alliance Consulting ']

def sCleanEmail(s):
    s = s.lower()
    for elt in lAT_REPS:
        if not elt.startswith(' '):
            s = s.replace(' ' + elt + ' ', '@')
        s = s.replace(elt, '@')
    for elt in lDOT_REPS:
        if not elt.startswith(' '):
            s = s.replace(' ' + elt + ' ', '.')
        s = s.replace(elt, '.')
    s = s.replace('(dash)', '-')
    s = s.replace('hyphen ', '-')
    for elt in lNO_EMAIL:
        s = s.replace(elt, '?')
    return s

lEMAILS = ['abuse', 'email']
lINTS = ['ciissversion', 'uplinkbw', 'signingkeylifetime', 'memory']
lBOOLS = ['dnssec', 'dnsqname', 'aesni', 'autoupdate', 'dnslocalrootzone',
          'sandbox', 'offlinemasterkey']

def aCleanContact(a):
    # cleanups
    for elt in lINTS:
        if elt in a:
            a[elt] = int(a[elt])
    for elt in lBOOLS:
        if elt not in a: continue
        if a[elt] in ['y', 'yes', 'true', 'True']:
            a[elt] = True
        else:
            a[elt] = False
    for elt in lEMAILS:
        if elt not in a: continue
        a[elt] = sCleanEmail(a[elt])
    if 'url' in a.keys():
        a['url'] = a['url'].rstrip('/')
        if a['url'].startswith('http://'):
            domain = a['url'].replace('http://', '')
        elif a['url'].startswith('https://'):
            domain = a['url'].replace('https://', '')
        else:
            domain = a['url']
        a['url'] = 'https://' + domain
    a.update({'fps': []})
    return a

def bVerifyContact(a=None, fp=None, https_cafile=None):
    global aFP_EMAIL
    global tBAD_URLS
    global lKNOWN_NODNS
    global aGOOD_CONTACTS_DB
    global aGOOD_CONTACTS_FPS
    assert a
    assert fp
    assert https_cafile

    keys = list(a.keys())
    a = aCleanContact(a)
    a['fp'] = fp
    if 'email' not in keys:
        a['email'] = ''
    if 'ciissversion' not in keys:
        aFP_EMAIL[fp] = a['email']
        LOG.warn(f"{fp} 'ciissversion' not in {keys}")
        return a

    # test the url for fps and add it to the array
    if 'proof' not in keys:
        aFP_EMAIL[fp] = a['email']
        LOG.warn(f"{fp} 'proof' not in {keys}")
        return a

    if aGOOD_CONTACTS_FPS and fp in aGOOD_CONTACTS_FPS.keys():
        aCachedContact = aGOOD_CONTACTS_FPS[fp]
        if aCachedContact['email'] == a['email']:
            LOG.info(f"{fp} in aGOOD_CONTACTS_FPS")
            return aCachedContact

    if 'url' not in keys:
        if 'uri' not in keys:
            a['url'] = ''
            aFP_EMAIL[fp] = a['email']
            LOG.warn(f"{fp} url and uri not in {keys}")
            return a
        a['url'] = a['uri']
        aFP_EMAIL[fp] = a['email']
        LOG.debug(f"{fp} 'uri' but not 'url' in {keys}")
        # drop through

    domain = a['url'].replace('https://', '').replace('http://', '')
    # domain should be a unique key for contacts?
    if bdomain_is_bad(domain, fp):
        LOG.warn(f"{domain} is bad - {a['url']}")
        LOG.debug(f"{fp} is bad from {a}")
        return a

    ip = zResolveDomain(domain)
    if ip == '':
        aFP_EMAIL[fp] = a['email']
        LOG.debug(f"{fp} {domain} does not resolve")
        lKNOWN_NODNS.append(domain)
        return a

    return True

def oVerifyUrl(url, domain, fp=None, https_cafile=None, timeout=20, host='127.0.0.1', port=9050, oargs=None):
    if bAreWeConnected() is False:
        raise SystemExit("we are not connected")
    if url in tBAD_URLS:
        LOG.debug(f"BC Known bad url from {domain} for {fp}")
        return None
    o = None
    try:
        # the httpx and urllib3 paths currently use the same downloader
        LOG.debug(f"Downloading from {domain} for {fp}")
        # await
        o = oDownloadUrl(url, https_cafile,
                         timeout=timeout, host=host, port=port,
                         content_type='text/plain')
        # requests response: text "reason", "status_code"
    except AttributeError as e:
        LOG.exception(f"BC AttributeError downloading from {domain} {e}")
        tBAD_URLS.add(url)
    except CertificateError as e:
        LOG.warn(f"BC CertificateError downloading from {domain} {e}")
        tBAD_URLS.add(url)
    except TrustorError as e:
        if e.args == "HTTP Errorcode 404":
            # the contact dict is not in scope here; record the url
            aFP_EMAIL[fp] = url
            LOG.warn(f"BC TrustorError 404 from {domain} {e.args}")
        else:
            LOG.warn(f"BC TrustorError downloading from {domain} {e.args}")
        tBAD_URLS.add(url)
    except (urllib3.exceptions.MaxRetryError, urllib3.exceptions.ProtocolError,) as e: # noqa
        # maybe offline - not bad
        LOG.warn(f"BC MaxRetryError downloading from {domain} {e}")
    except (BaseException) as e:
        LOG.error(f"BC Exception {type(e)} downloading from {domain} {e}")
    else:
        return o
    return None
# async
# If we keep a cache of FPs that we have gotten by downloading a URL
# we can avoid re-downloading the URL for other FPs in the list of relays.
# If we parallelize the gathering of the URLs, we may have simultaneous
# gathers of the same URL from different relays, defeating the advantage
# of going parallel. The cache is global aDOMAIN_FPS.
def aVerifyContact(a=None, fp=None, https_cafile=None, timeout=20, host='127.0.0.1', port=9050, oargs=None):
    global aFP_EMAIL
    global tBAD_URLS
    global lKNOWN_NODNS
    global aDOMAIN_FPS
    global aBAD_CONTACTS_DB
    assert a
    assert fp
    assert https_cafile
    domain = a['url'].replace('https://', '').replace('http://', '').rstrip('/')
    a['url'] = 'https://' + domain
    if domain in aDOMAIN_FPS.keys():
        a['fps'] = aDOMAIN_FPS[domain]
        return a
    r = bVerifyContact(a=a, fp=fp, https_cafile=https_cafile)
    if r is not True:
        return r
    if a['url'] in tBAD_URLS:
        a['fps'] = []
        return a
    if a['proof'] == 'dns-rsa':
        if ub_ctx:
            fp_domain = fp + '.' + domain
            if idns_validate(fp_domain,
                             libunbound_resolv_file='resolv.conf',
                             dnssec_DS_file='dnssec-root-trust',
                             ) == 0:
                LOG.warn(f"{fp} proof={a['proof']} - validated good")
                a['fps'] = [fp]
                aGOOD_CONTACTS_FPS[fp] = a
            else:
                a['fps'] = []
            return a
        # only test url for now - drop through
        url = a['url']
    else:
        url = a['url'] + "/.well-known/tor-relay/rsa-fingerprint.txt"
    o = oVerifyUrl(url, domain, fp=fp, https_cafile=https_cafile, timeout=timeout, host=host, port=port, oargs=oargs)
    if not o:
        LOG.warn(f"BC Failed Download from {url} ")
        a['fps'] = []
        tBAD_URLS.add(url)
        aBAD_CONTACTS_DB[fp] = a
    elif a['proof'] == 'dns-rsa':
        # well, let the test of the URL be enough for now
        LOG.debug(f"Downloaded from {url} ")
        a['fps'] = [fp]
        aDOMAIN_FPS[domain] = a['fps']
    elif a['proof'] == 'uri-rsa':
        a = aContactFps(oargs, a, o, domain)
        if a['fps']:
            LOG.debug(f"Downloaded from {url} {len(a['fps'])} FPs for {fp}")
        else:
            aBAD_CONTACTS_DB[fp] = a
            LOG.debug(f"BC Downloaded from {url} NO FPs for {fp}")
    aDOMAIN_FPS[domain] = a['fps']
    return a

def aContactFps(oargs, a, o, domain):
    global aFP_EMAIL
    global tBAD_URLS
    global aDOMAIN_FPS

    fp = a.get('fp')  # set by bVerifyContact
    if hasattr(o, 'status'):
        status_code = o.status
    else:
        status_code = o.status_code
    if status_code >= 300:
        aFP_EMAIL[fp] = a['email']
        LOG.warn(f"Error from {domain} {status_code} {o.reason}")
        # any reason to retry?
        tBAD_URLS.add(a['url'])
        return a
    if hasattr(o, 'text'):
        data = o.text
    else:
        data = str(o.data, 'UTF-8')
    l = data.upper().strip().split('\n')
    LOG.debug(f"Downloaded from {domain} {len(l)} lines {len(data)} bytes")
    if oargs.wellknown_output:
        sdir = os.path.join(oargs.wellknown_output, domain,
                            '.well-known', 'tor-relay')
        sfile = os.path.join(sdir, "rsa-fingerprint.txt")
        try:
            if not os.path.isdir(sdir):
                os.makedirs(sdir)
            with open(sfile, 'wt') as oFd:
                oFd.write(data)
        except Exception as e:
            LOG.warn(f"Error writing {sfile} {e}")
    a['modified'] = int(time.time())
    if not l:
        LOG.warn(f"Downloaded from {domain} empty for {fp}")
    else:
        a['fps'] = [elt.strip() for elt in l if elt
                    and len(elt) == 40
                    and not elt.startswith('#')]
        LOG.info(f"Downloaded from {domain} {len(a['fps'])} FPs")
    return a

def aParseContact(contact, fp):
    """
    See the Tor ContactInfo Information Sharing Specification v2
    https://nusenu.github.io/ContactInfo-Information-Sharing-Specification/
    """
    a = {}
    if not contact:
        LOG.warn(f"BC null contact for {fp}")
        LOG.debug(f"{fp} {contact}")
        return {}
    contact = contact.split(r'\n')[0]
    for elt in lMORONS:
        contact = contact.replace(elt, '')
    m = oCONTACT_RE.match(contact)
    # 450 matches!
    if m and m.groups and len(m.groups(0)) > 2 and m.span()[1] > 0:
        i = len(m.groups(0)[0]) + len(m.groups(0)[1])
        contact = contact[i:]

    # shlex?
    lelts = contact.split(' ')
    if not lelts:
        LOG.warn(f"BC empty contact for {fp}")
        LOG.debug(f"{fp} {contact}")
        return {}

    for elt in lelts:
        if ':' not in elt:
            # hoster:Quintex Alliance Consulting
            LOG.warn(f"BC no : in {elt} for {contact} in {fp}")
            # try going with what we have
            break
        (key, val,) = elt.split(':', 1)
        if key == '':
            continue
        key = key.rstrip(':')
        a[key] = val
    a = aCleanContact(a)
    return a

def aParseContactYaml(contact, fp):
    """
    See the Tor ContactInfo Information Sharing Specification v2
    https://nusenu.github.io/ContactInfo-Information-Sharing-Specification/
    """
    l = [line for line in contact.strip().replace('"', '').split(' ')
         if ':' in line]
    LOG.debug(f"{fp} {len(l)} fields")
    s = f'"{fp}":\n'
    s += '\n'.join([f"  {line}\"".replace(':', ': \"', 1)
                    for line in l])
    oFd = StringIO(s)
    a = safe_load(oFd)
    return a

def oMainArgparser(_=None):
    try:
        from OpenSSL import SSL
        lCAfs = SSL._CERTIFICATE_FILE_LOCATIONS
    except ImportError:
        lCAfs = []
    CAfs = []
    for elt in lCAfs:
        if os.path.exists(elt):
            CAfs.append(elt)
    if not CAfs:
        CAfs = ['']

    parser = argparse.ArgumentParser(add_help=True,
                                     epilog=__prolog__)
    parser.add_argument('--https_cafile', type=str,
                        help="Certificate Authority file (in PEM)",
                        default=CAfs[0])
    parser.add_argument('--proxy_host', '--proxy-host', type=str,
                        default='127.0.0.1',
                        help='proxy host')
    parser.add_argument('--proxy_port', '--proxy-port', default=9050, type=int,
                        help='proxy socks port')
    parser.add_argument('--proxy_ctl', '--proxy-ctl',
                        default='/run/tor/control' if os.path.exists('/run/tor/control') else '9051',
                        type=str,
                        help='control socket - or port')
    parser.add_argument('--torrc',
                        default='/etc/tor/torrc-defaults',
                        type=str,
                        help='torrc to check for suggestions')
    parser.add_argument('--timeout', default=60, type=int,
                        help='proxy download connect timeout')
    parser.add_argument('--good_nodes', type=str,
                        default=os.path.join(ETC_DIR, 'goodnodes.yaml'),
                        help="Yaml file of good info that should not be excluded")
    parser.add_argument('--bad_nodes', type=str,
                        default=os.path.join(ETC_DIR, 'badnodes.yaml'),
                        help="Yaml file of bad nodes that should also be excluded")
    parser.add_argument('--bad_on', type=str, default='Empty,NoEmail,NotGood',
                        help="comma sep list of conditions - Empty,NoEmail,NotGood")
    parser.add_argument('--bad_contacts', type=str,
                        default=os.path.join(ETC_DIR, 'badcontacts.yaml'),
                        help="Yaml file of bad contacts that bad FPs are using")
    parser.add_argument('--saved_only', default=False,
                        action='store_true',
                        help="Just use the info in the last *.yaml files without querying the Tor controller")
    parser.add_argument('--strict_nodes', type=str, default='0',
                        choices=['0', '1'],
                        help="Set StrictNodes: 1 is less anonymous but more secure, although some onion sites may be unreachable")
    parser.add_argument('--wait_boot', type=int, default=120,
                        help="Seconds to wait for Tor to bootstrap")
    parser.add_argument('--points_timeout', type=int, default=0,
                        help="Timeout for getting introduction points - must be long >120sec. 0 means disabled looking for IPs")
    parser.add_argument('--log_level', type=int, default=20,
                        help="10=debug 20=info 30=warn 40=error")
    parser.add_argument('--bad_sections', type=str,
                        default='BadExit',
                        help="sections of the badnodes.yaml to use, in addition to BadExit, comma separated")
    parser.add_argument('--white_onions', type=str,
                        default='',
                        help="comma sep. list of onions to whitelist their introduction points - BROKEN")
    parser.add_argument('--torrc_output', type=str,
                        default=os.path.join(ETC_DIR, 'torrc.new'),
                        help="Write the torrc configuration to a file")
    parser.add_argument('--hs_dir', type=str,
                        default='/var/lib/tor',
                        help="Parse the files named 'hostname' below this dir to find Hidden Services to whitelist")
    parser.add_argument('--notice_log', type=str,
                        default='',
                        help="Parse the notice log for relays and services")
    parser.add_argument('--relays_output', type=str,
                        default=os.path.join(ETC_DIR, 'relays.json'),
                        help="Write the downloaded relays in json to a file")
    parser.add_argument('--wellknown_output', type=str,
                        default=os.path.join(ETC_DIR, 'https'),
                        help="Write the well-known files to a directory")
    parser.add_argument('--good_contacts', type=str, default=os.path.join(ETC_DIR, 'goodcontacts.yaml'),
                        help="Write the proof data of the included nodes to a YAML file")
    return parser

def vwrite_good_contacts(oargs):
    global aGOOD_CONTACTS_DB
    good_contacts_tmp = oargs.good_contacts + '.tmp'
    with open(good_contacts_tmp, 'wt') as oFYaml:
        yaml.dump(aGOOD_CONTACTS_DB, oFYaml)
    if os.path.exists(oargs.good_contacts):
        bak = oargs.good_contacts + '.bak'
        os.rename(oargs.good_contacts, bak)
    os.rename(good_contacts_tmp, oargs.good_contacts)
    LOG.info(f"Wrote {len(list(aGOOD_CONTACTS_DB.keys()))} good contact details to {oargs.good_contacts}")
    bad_contacts_tmp = good_contacts_tmp.replace('.tmp', '.bad')
    with open(bad_contacts_tmp, 'wt') as oFYaml:
        yaml.dump(aBAD_CONTACTS_DB, oFYaml)

def vwrite_badnodes(oargs, aBAD_NODES, slen, stag):
    if not aBAD_NODES: return
    tmp = oargs.bad_nodes + '.tmp'
    bak = oargs.bad_nodes + '.bak'
    with open(tmp, 'wt') as oFYaml:
        yaml.dump(aBAD_NODES, oFYaml)
    LOG.info(f"Wrote {slen} to {stag} in {oargs.bad_nodes}")
    if os.path.exists(oargs.bad_nodes):
        os.rename(oargs.bad_nodes, bak)
    os.rename(tmp, oargs.bad_nodes)

def vwrite_goodnodes(oargs, aGOOD_NODES, ilen):
    tmp = oargs.good_nodes +'.tmp'
    bak = oargs.good_nodes +'.bak'
    with open(tmp, 'wt') as oFYaml:
        yaml.dump(aGOOD_NODES, oFYaml)
    LOG.info(f"Wrote {ilen} good relays to {oargs.good_nodes}")
    oFYaml.close()
    if os.path.exists(oargs.good_nodes):
        os.rename(oargs.good_nodes, bak)
    os.rename(tmp, oargs.good_nodes)
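
# Note: the .tmp/.bak dance used by the vwrite_* helpers above means a crash
# mid-write never truncates the live file - the new content goes to
# <file>.tmp, the previous copy is kept as <file>.bak, and the final
# os.rename() is an atomic replace on POSIX filesystems.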
def lget_onionoo_relays(oargs):
import requests
adata = {}
    if oargs.relays_output and os.path.exists(oargs.relays_output):
        # and less than a day old?
        LOG.info(f"Getting OO relays from {oargs.relays_output}")
        try:
            with open(oargs.relays_output, 'rt') as ofd:
                sdata = ofd.read()
            adata = json.loads(sdata)
        except Exception as e:
            LOG.error(f"Error reading relays from {oargs.relays_output} {e}")
            adata = {}
if not adata:
surl = "https://onionoo.torproject.org/details"
LOG.info(f"Getting OO relays from {surl}")
sCAfile = oargs.https_cafile
assert os.path.exists(sCAfile), sCAfile
if True:
try:
o = oDownloadUrl(surl, sCAfile,
timeout=oargs.timeout,
host=oargs.proxy_host,
port=oargs.proxy_port,
content_type='')
                if hasattr(o, 'text'):
                    sdata = o.text
                else:
                    sdata = str(o.data, 'UTF-8')
except Exception as e:
# simplejson.errors.JSONDecodeError
# urllib3.exceptions import ConnectTimeoutError, NewConnectionError
# (urllib3.exceptions.MaxRetryError, urllib3.exceptions.ProtocolError,)
LOG.exception(f"JSON error {e}")
return []
else:
LOG.debug(f"Downloaded {surl} {len(sdata)} bytes")
adata = json.loads(sdata)
else:
odata = requests.get(surl, verify=sCAfile)
try:
adata = odata.json()
except Exception as e:
# simplejson.errors.JSONDecodeError
LOG.exception(f"JSON error {e}")
return []
else:
LOG.debug(f"Downloaded {surl} {len(adata)} relays")
            sdata = json.dumps(adata)  # repr() output could not be re-read with json.loads above
if oargs.relays_output:
try:
with open(oargs.relays_output, 'wt') as ofd:
ofd.write(sdata)
except Exception as e:
LOG.warn(f"Error {oargs.relays_output} {e}")
else:
LOG.debug(f"Wrote {oargs.relays_output} {len(sdata)} bytes")
lonionoo_relays = [r for r in adata["relays"] if 'fingerprint' in r.keys()]
return lonionoo_relays
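
# The onionoo 'details' document parsed above is JSON shaped roughly like
#   {"version": "...", "relays": [{"fingerprint": "...", "contact": "...", ...}, ...],
#    "bridges": [...]}
# which is why lget_onionoo_relays() filters on entries carrying 'fingerprint'.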
def vsetup_logging(log_level, logfile='', stream=sys.stdout):
global LOG
add = True
try:
if 'COLOREDLOGS_LEVEL_STYLES' not in os.environ:
os.environ['COLOREDLOGS_LEVEL_STYLES'] = 'spam=22;debug=28;verbose=34;notice=220;warning=202;success=118,bold;error=124;critical=background=red'
# https://pypi.org/project/coloredlogs/
import coloredlogs
except ImportError:
coloredlogs = False
    # stem interferes with the root logger's configuration
# from stem.util import log
logging.getLogger('stem').setLevel(30)
logging._defaultFormatter = logging.Formatter(datefmt='%m-%d %H:%M:%S')
logging._defaultFormatter.default_time_format = '%m-%d %H:%M:%S'
logging._defaultFormatter.default_msec_format = ''
kwargs = dict(level=log_level,
force=True,
format='%(levelname)s %(message)s')
if logfile:
add = logfile.startswith('+')
sub = logfile.startswith('-')
if add or sub:
logfile = logfile[1:]
kwargs['filename'] = logfile
if coloredlogs:
# https://pypi.org/project/coloredlogs/
aKw = dict(level=log_level,
logger=LOG,
stream=stream,
fmt='%(levelname)s %(message)s'
)
coloredlogs.install(**aKw)
if logfile:
oHandler = logging.FileHandler(logfile)
LOG.addHandler(oHandler)
LOG.info(f"CSetting log_level to {log_level} {stream}")
else:
logging.basicConfig(**kwargs)
if add and logfile:
oHandler = logging.StreamHandler(stream)
LOG.addHandler(oHandler)
LOG.info(f"SSetting log_level to {log_level!s}")
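
# Example usage (sketch): vsetup_logging(10) logs DEBUG to stdout; passing
# logfile='+debug.log' (note the leading '+') adds a second handler so the
# messages go to both the stream and the file.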

def vwritefinale(oargs):
    global lNOT_IN_RELAYS_DB
    if len(lNOT_IN_RELAYS_DB):
        LOG.warn(f"{len(lNOT_IN_RELAYS_DB)} relays from stem were not in onionoo.torproject.org")
    LOG.info(f"For info on a FP, use: https://nusenu.github.io/OrNetStats/w/relay/<FP>.html")
    LOG.info(f"For info on relays, try: https://onionoo.torproject.org/details")

def bProcessContact(b, texclude_set, aBadContacts, iFakeContact=0):
    global aGOOD_CONTACTS_DB
    global aGOOD_CONTACTS_FPS
    sofar = ''
    # need to skip urllib3.exceptions.MaxRetryError
    if not b:
        # nothing came back from the verifier - we cannot even name the fp
        return None
    fp = b['fp']
    if 'fps' not in b or not b['fps'] or not b['url']:
        LOG.warn(f"{fp} did NOT VERIFY {sofar}")
        LOG.debug(f"{fp} {b} {sofar}")
        # If it's giving contact info that doesn't check out
        # it could be a bad exit with fake contact info
        texclude_set.add(fp)
        aBadContacts[fp] = b
        return None
if fp not in b['fps']:
LOG.warn(f"{fp} the FP IS NOT in the list of fps {sofar}")
# assume a fp is using a bogus contact
texclude_set.add(fp)
aBadContacts[fp] = b
return False
LOG.info(f"{fp} GOOD {b['url']} {sofar}")
# add our contact info to the trustdb
    aGOOD_CONTACTS_DB[fp] = b
    for elt in b['fps']:
        aGOOD_CONTACTS_FPS[elt] = b
    return True

def bCheckFp(relay, sofar, lConds, texclude_set):
    global aGOOD_CONTACTS_DB
    global aGOOD_CONTACTS_FPS
    global lNOT_IN_RELAYS_DB
    if not is_valid_fingerprint(relay.fingerprint):
        LOG.warn('Invalid Fingerprint: %s' % relay.fingerprint)
        return None
    fp = relay.fingerprint
    if aRELAYS_DB and fp not in aRELAYS_DB.keys():
        LOG.warn(f"{fp} not in aRELAYS_DB")
        lNOT_IN_RELAYS_DB += [fp]
    if not relay.exit_policy.is_exiting_allowed():
        # not an exit; only worth a quiet note when we are excluding exits anyway
        if sEXCLUDE_EXIT_GROUP == 'ExcludeExitNodes':
            pass # LOG.debug(f"{fp} not an exit {sofar}")
        else:
            pass # LOG.warn(f"{fp} not an exit {sofar}")
        # return None
    # great contact had good fps and we are in them
    if fp in aGOOD_CONTACTS_FPS.keys():
        # a cached entry
        return None
    if isinstance(relay.contact, bytes):
        # stem sometimes hands back bytes rather than str
        relay.contact = str(relay.contact, 'UTF-8')
    # fail if the contact is empty
    if ('Empty' in lConds and not relay.contact):
        LOG.info(f"{fp} skipping empty contact - Empty {sofar}")
        texclude_set.add(fp)
        return None
    contact = sCleanEmail(relay.contact)
    # fail if the contact has no email - unreliable
    if 'NoEmail' in lConds and relay.contact and ('@' not in contact):
LOG.info(f"{fp} skipping contact - NoEmail {contact} {sofar}")
LOG.debug(f"{fp} {relay.contact} {sofar}")
texclude_set.add(fp)
return None
# fail if the contact does not pass
if ('NotGood' in lConds and relay.contact and
('ciissversion:' not in relay.contact)):
LOG.info(f"{fp} skipping no ciissversion in contact {sofar}")
LOG.debug(f"{fp} {relay.contact} {sofar}")
texclude_set.add(fp)
return None
# fail if the contact does not have url: to pass
if relay.contact and 'url' not in relay.contact:
LOG.info(f"{fp} skipping unfetchable contact - no url {sofar}")
LOG.debug(f"{fp} {relay.contact} {sofar}")
if ('NotGood' in lConds): texclude_set.add(fp)
return None
return True
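
# Illustrative only: a ContactInfo string in the CIISS format that would pass
# every check in bCheckFp() looks like (field names per nusenu's ContactInfo
# spec; the address is made up):
#   email:tor[]example.org url:https://example.org proof:uri-rsa ciissversion:2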

def oMainPreamble(lArgs):
    global aGOOD_CONTACTS_DB
    global aGOOD_CONTACTS_FPS
    parser = oMainArgparser()
    oargs = parser.parse_args(lArgs)
    vsetup_logging(oargs.log_level)
    if bAreWeConnected() is False:
        raise SystemExit("we are not connected")
    sFile = oargs.torrc
    if sFile and os.path.exists(sFile):
        icheck_torrc(sFile, oargs)
    sFile = oargs.good_contacts
    if sFile and os.path.exists(sFile):
        try:
            with open(sFile, 'rt') as oFd:
                aGOOD_CONTACTS_DB = safe_load(oFd)
            LOG.info(f"{len(aGOOD_CONTACTS_DB.keys())} trusted contacts from {sFile}")
            # reverse lookup of fps to contacts
            # but...
            for (k, v,) in aGOOD_CONTACTS_DB.items():
                if 'modified' not in v.keys():
                    v['modified'] = int(time.time())
                aGOOD_CONTACTS_FPS[k] = v
                if 'fps' in aGOOD_CONTACTS_DB[k].keys():
                    for fp in aGOOD_CONTACTS_DB[k]['fps']:
                        if fp in aGOOD_CONTACTS_FPS:
                            continue
                        aGOOD_CONTACTS_FPS[fp] = v
            LOG.info(f"{len(aGOOD_CONTACTS_FPS.keys())} good relays from {sFile}")
        except Exception as e:
            LOG.exception(f"Error reading YAML TrustDB {sFile} {e}")
return oargs
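
# As inferred from the loop above, goodcontacts.yaml is a mapping of
# fingerprint -> contact record, e.g. (values made up):
#   "<40-hex-FP>":
#     url: "https://example.org"
#     fps: ["<40-hex-FP>"]
#     modified: 1669700000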

def oStemController(oargs):
    if os.path.exists(oargs.proxy_ctl):
        controller = oGetStemController(log_level=oargs.log_level, sock_or_pair=oargs.proxy_ctl)
    else:
        port = int(oargs.proxy_ctl)
        controller = oGetStemController(log_level=oargs.log_level, sock_or_pair=port)
    vwait_for_controller(controller, oargs.wait_boot)
    elt = controller.get_conf('UseMicrodescriptors')
    if elt != '0':
        LOG.error('"UseMicrodescriptors 0" is required in your /etc/tor/torrc. Exiting.')
        controller.set_conf('UseMicrodescriptors', 0)
        # does it work dynamically?
        return 2
    elt = controller.get_conf(sEXCLUDE_EXIT_GROUP)
    if elt and elt != '{??}':
        LOG.warn(f"{sEXCLUDE_EXIT_GROUP} is in use already")
    return controller

def tWhitelistSet(oargs, controller):
    twhitelist_set = set()
    twhitelist_set.update(set(lYamlGoodNodes(oargs.good_nodes)))
    LOG.info(f"lYamlGoodNodes {len(twhitelist_set)} EntryNodes from {oargs.good_nodes}")
    t = set()
    if 'IntroductionPoints' in aGOOD_NODES[sGOOD_ROOT]['Relays'].keys():
        t = set(aGOOD_NODES[sGOOD_ROOT]['Relays']['IntroductionPoints'])
    if oargs.hs_dir and os.path.exists(oargs.hs_dir):
        for (dirpath, dirnames, filenames,) in os.walk(oargs.hs_dir):
            for f in filenames:
                if f != 'hostname': continue
                with open(os.path.join(dirpath, f), 'rt') as oFd:
                    son = oFd.read().strip()
                t.add(son)  # add the whole onion address, not its characters
                LOG.info(f"Added {son} to the list for Introduction Points")
    if oargs.notice_log and os.path.exists(oargs.notice_log):
        tmp = tempfile.mktemp()
        i = os.system(f"grep 'Every introduction point for service' {oargs.notice_log} |sed -e 's/.* service //' -e 's/ is .*//'|sort -u |sed -e '/ /d' > {tmp}")
        if i == 0:  # os.system returns the exit status; grep exits 0 on a match
            with open(tmp, 'rt') as oFd:
                tnew = {elt.strip() for elt in oFd.readlines()}
            t.update(tnew)
            LOG.info(f"Whitelist {len(tnew)} services from {oargs.notice_log}")
        os.remove(tmp)
    w = set()
    if sGOOD_ROOT in aGOOD_NODES and 'Services' in aGOOD_NODES[sGOOD_ROOT].keys():
        w = set(aGOOD_NODES[sGOOD_ROOT]['Services'])
    if len(w) > 0:
        LOG.info(f"Whitelist {len(w)} relays from {sGOOD_ROOT}/Services")
    if oargs.notice_log and os.path.exists(oargs.notice_log):
        tmp = tempfile.mktemp()
        i = os.system(f"grep 'Wanted to contact directory mirror \$' {oargs.notice_log}|sed -e 's/.* \$//' -e 's/[~ ].*//'|sort -u > {tmp}")
        if i == 0:  # grep exits 0 only when it matched something
            with open(tmp, 'rt') as oFd:
                lnew = oFd.readlines()
            w.update({elt.strip() for elt in lnew})
            LOG.info(f"Whitelist {len(lnew)} relays from {oargs.notice_log}")
        os.remove(tmp)
    twhitelist_set.update(w)
    w = set()
    if 'Onions' in aGOOD_NODES[sGOOD_ROOT].keys():
        # Provides the descriptor for a hidden service. The **address** is the
        # '.onion' address of the hidden service
        w = set(aGOOD_NODES[sGOOD_ROOT]['Onions'])
    if oargs.white_onions:
        w.update(oargs.white_onions.split(','))
    if oargs.points_timeout > 0:
        LOG.info(f"{len(w)} services will be checked from IntroductionPoints")
        t.update(lIntroductionPoints(controller, w, itimeout=oargs.points_timeout))
    if len(t) > 0:
        LOG.info(f"IntroductionPoints {len(t)} relays from {len(w)} IPs for onions")
        twhitelist_set.update(t)
    return twhitelist_set

def tExcludeSet(oargs):
    texclude_set = set()
    sections = {'BadExit'}
    if oargs.bad_nodes and os.path.exists(oargs.bad_nodes):
        if oargs.bad_sections:
            sections.update(oargs.bad_sections.split(','))
        texclude_set = set(lYamlBadNodes(oargs.bad_nodes,
                                         tWanted=sections,
                                         section=sEXCLUDE_EXIT_GROUP))
        LOG.info(f"Preloaded {len(texclude_set)} bad fps")
    return texclude_set
# async
def iMain(lArgs):
    global aGOOD_CONTACTS_DB
    global aGOOD_CONTACTS_FPS
    global aBAD_CONTACTS_DB
    global aBAD_NODES
    global aGOOD_NODES
    global lKNOWN_NODNS
    global aRELAYS_DB
    global aRELAYS_DB_INDEX
    global tBAD_URLS
    global lNOT_IN_RELAYS_DB
    oargs = oMainPreamble(lArgs)
    controller = oStemController(oargs)
    twhitelist_set = tWhitelistSet(oargs, controller)
    texclude_set = tExcludeSet(oargs)
    ttrust_db_index = aGOOD_CONTACTS_FPS.keys()
    iFakeContact = 0
    iTotalContacts = 0
    aBadContacts = {}
    lNOT_IN_RELAYS_DB = []
    iR = 0
    relays = controller.get_server_descriptors()
    lqueue = []
    socksu = f"socks5://{oargs.proxy_host}:{oargs.proxy_port}"
    if oargs.saved_only:
        relays = []
    for relay in relays:
        iR += 1
        fp = relay.fingerprint = relay.fingerprint.upper()
        sofar = f"G:{len(aGOOD_CONTACTS_DB.keys())} F:{iFakeContact} BF:{len(texclude_set)} GF:{len(ttrust_db_index)} TC:{iTotalContacts} #{iR}"
        lConds = oargs.bad_on.split(',')
        r = bCheckFp(relay, sofar, lConds, texclude_set)
        if r is not True: continue
        # if it has a ciissversion in contact we count it in total
        iTotalContacts += 1
        # only do the expensive contact verification when 'NotGood' is requested
        if 'NotGood' not in lConds:
            continue
        # fail if the contact does not have url: to pass
        a = aParseContact(relay.contact, fp)
        if not a:
            LOG.warn(f"{fp} BC contact did not parse {sofar}")
            texclude_set.add(fp)
            aBAD_CONTACTS_DB[fp] = a
            continue
        domain = ''
        if 'url' in a and a['url']:
            # fail if the contact uses a url we already know is bad
            if a['url'] in tBAD_URLS:
                LOG.info(f"{fp} skipping in tBAD_URLS {a['url']} {sofar}")
                LOG.debug(f"{fp} {a} {sofar}")
                texclude_set.add(fp)
                continue
            domain = a['url'].replace('https://', '').replace('http://', '')
            # fail if the contact uses a domain we already know does not resolve
            if domain in lKNOWN_NODNS:
                # The fp is using a contact with a URL we know is bogus
                LOG.info(f"{fp} BC skipping in lKNOWN_NODNS {a} {sofar}")
                LOG.debug(f"{fp} {relay} {sofar}")
                texclude_set.add(fp)
                aBAD_CONTACTS_DB[fp] = a
                continue
        # drop through
        if 'proof' in a and a['proof'] in ['uri-rsa', 'dns-rsa']:
            if domain in aDOMAIN_FPS.keys(): continue
            if httpx:
                a['fp'] = fp
lqueue.append(asyncio.create_task(
aVerifyContact(a=a,
fp=fp,
https_cafile=oargs.https_cafile,
timeout=oargs.timeout,
host=oargs.proxy_host,
port=oargs.proxy_port,
oargs=oargs)))
else:
b = aVerifyContact(a=a,
fp=fp,
https_cafile=oargs.https_cafile,
timeout=oargs.timeout,
host=oargs.proxy_host,
port=oargs.proxy_port,
oargs=oargs)
r = bProcessContact(b, texclude_set, aBadContacts, iFakeContact)
if r is False:
iFakeContact += 1
    if httpx:
        # for b in asyncio.as_completed(lqueue):
        for b in lqueue:
            # r = await b
            r = b
            r = bProcessContact(r, texclude_set, aBadContacts, iFakeContact)
            if r is False:
                iFakeContact += 1
            elif r is True:
                # iGoodContact += 1
                pass
    LOG.info(f"Filtered {len(twhitelist_set)} whitelisted relays")
    texclude_set = texclude_set.difference(twhitelist_set)
    LOG.info(f"{len(list(aGOOD_CONTACTS_DB.keys()))} good contacts out of {iTotalContacts}")
    if oargs.torrc_output and texclude_set:
        with open(oargs.torrc_output, 'wt') as oFTorrc:
            oFTorrc.write(f"{sEXCLUDE_EXIT_GROUP} {','.join(texclude_set)}\n")
            oFTorrc.write(f"{sINCLUDE_EXIT_KEY} {','.join(aGOOD_CONTACTS_FPS.keys())}\n")
            oFTorrc.write(f"{sINCLUDE_GUARD_KEY} {','.join(aGOOD_NODES[sGOOD_ROOT]['EntryNodes'])}\n")
        LOG.info(f"Wrote tor configuration to {oargs.torrc_output}")
        oFTorrc.close()
    if oargs.bad_contacts and aBadContacts:
        # for later analysis
        with open(oargs.bad_contacts, 'wt') as oFYaml:
            yaml.dump(aBadContacts, oFYaml)
        oFYaml.close()
    if oargs.good_contacts != '' and aGOOD_CONTACTS_DB:
        vwrite_good_contacts(oargs)
    aBAD_NODES[oBAD_ROOT][sEXCLUDE_EXIT_GROUP]['BadExit'] = list(texclude_set)
    aBAD_NODES[oBAD_ROOT][sEXCLUDE_DOMAINS] = lKNOWN_NODNS
    if oargs.bad_nodes:
        stag = sEXCLUDE_EXIT_GROUP + '/BadExit'
        vwrite_badnodes(oargs, aBAD_NODES, str(len(texclude_set)), stag)
    aGOOD_NODES['GoodNodes']['Relays']['ExitNodes'] = list(aGOOD_CONTACTS_FPS.keys())
    # EntryNodes are read-only
    if oargs.good_nodes:
        vwrite_goodnodes(oargs, aGOOD_NODES, len(aGOOD_CONTACTS_FPS.keys()))
    vwritefinale(oargs)
    retval = 0
    try:
        logging.getLogger('stem').setLevel(30)
        if texclude_set:
            try:
                LOG.info(f"controller {sEXCLUDE_EXIT_GROUP} {len(texclude_set)} net bad relays")
                controller.set_conf(sEXCLUDE_EXIT_GROUP, list(texclude_set))
            except (Exception, stem.InvalidRequest, stem.SocketClosed,) as e: # noqa
                LOG.error(f"Failed setting {sEXCLUDE_EXIT_GROUP} bad exit relays in Tor {e}")
                LOG.debug(repr(texclude_set))
                retval += 1
        if aGOOD_CONTACTS_FPS.keys():
            l = [elt for elt in aGOOD_CONTACTS_FPS.keys() if len(elt) == 40]
            try:
                LOG.info(f"controller {sINCLUDE_EXIT_KEY} {len(l)} good relays")
                controller.set_conf(sINCLUDE_EXIT_KEY, l)
            except (Exception, stem.InvalidRequest, stem.SocketClosed) as e: # noqa
                LOG.error(f"Failed setting {sINCLUDE_EXIT_KEY} good exit nodes in Tor {e}")
                LOG.debug(repr(l))
                retval += 1
        if 'EntryNodes' in aGOOD_NODES[sGOOD_ROOT].keys():
            try:
                LOG.info(f"{sINCLUDE_GUARD_KEY} {len(aGOOD_NODES[sGOOD_ROOT]['EntryNodes'])} guard nodes")
                # FixMe for now override StrictNodes it may be unusable otherwise
                controller.set_conf(sINCLUDE_GUARD_KEY,
                                    aGOOD_NODES[sGOOD_ROOT]['EntryNodes'])
            except (Exception, stem.InvalidRequest, stem.SocketClosed,) as e: # noqa
                LOG.error(f"Failed setting {sINCLUDE_GUARD_KEY} guard nodes in Tor {e}")
                LOG.debug(repr(list(aGOOD_NODES[sGOOD_ROOT]['EntryNodes'])))
                retval += 1
        cur = controller.get_conf('StrictNodes')
        if oargs.strict_nodes and int(cur) != oargs.strict_nodes:
            controller.set_conf('StrictNodes', oargs.strict_nodes)
            cur = controller.get_conf('StrictNodes')
            if int(cur) != oargs.strict_nodes:
                LOG.warn(f"OVERRIDING StrictNodes NOT {oargs.strict_nodes}")
            else:
                LOG.info(f"OVERRODE StrictNodes to {oargs.strict_nodes}")
        else:
            LOG.info(f"StrictNodes is set to {cur}")
    except KeyboardInterrupt:
        return 0
    except Exception as e:
        LOG.exception(str(e))
        retval = 2
    finally:
        # weird: we are getting stem errors during the final return
        # with a traceback that doesn't correspond to any real flow
        # File "/usr/lib/python3.9/site-packages/stem/control.py", line 2474, in set_conf
        #   self.set_options({param: value}, False)
        logging.getLogger('stem').setLevel(40)
        try:
            for elt in controller._event_listeners:
                controller.remove_event_listener(elt)
            controller.close()
        except Exception as e:
            LOG.warn(str(e))
    return retval

if __name__ == '__main__':
    try:
        # i = asyncio.run(iMain(sys.argv[1:]))
        i = iMain(sys.argv[1:])
    except IncorrectPassword as e:
        LOG.error(e)
        i = 1
    except KeyboardInterrupt:
        i = 0
    except Exception as e:
        LOG.exception(e)
        i = 2
    sys.exit(i)