Related to the recent Haystack hubbub, here's a basic overview of censorship resistance tools, of which Haystack was an example (unfortunately a fairly broken one).
For the purposes of these notes, by 'censorship resistance tools', I'll be referring to ones for browsing the web from inside of countrywide firewalls which are trying to limit access, such as Freegate, Ultrasurf, and the like. Obviously there are other forms of censorship and resistance to it, but that's what's being discussed for now.
The usage pattern for censorship resistance tools goes something like this:
- system sends information about proxies to users
- users use proxies to browse the web freely
- firewall operator finds out IPs of proxies and blocks them by IP
- go back to step 1
It's an ongoing cat and mouse game involving cycling through a lot of IPs and a lot of careful secrecy.
An attacker might also, instead of outright blocking an IP, artificially create a very high packet loss rate going to it, which might make users conclude that the anti-censorship system doesn't work very well and give up on it. That could be countered by trying to guess when there's an artificially high packet loss rate, but that's potentially an insidious game - the attacker might, for example, determine where the machines developers use for testing are, and not artificially drop packets to those.
There's considerable concern about the threat model of the censor finding out which users are using the proxies and doing bad things to them. I'll just cut to the chase on that issue - the resistance to attacks of that form is inherently weak. The censor can simply record the destinations of all outgoing connections, and retroactively correlate them to discovered proxies, unveiling the IP of a user. This is a vicious attack which can't be completely eliminated. Possession of the tool might also be incriminating.
High level methods of avoiding detection include:
- Have lots of cover traffic - that is, lots of users, so attacking them all is impractical. This is probably the ultimate solution, because a tool which doesn't have enough users to provide cover traffic isn't successful, and a successful tool implicitly provides lots of cover traffic.
- Have user use shared/ephemeral IPs. This is a low tech approach having little to do with the protocol.
- Use no software, that is, http/https proxies. This makes the user have no recurring evidence, but can expose what the user is doing to snooping.
- Use ephemeral or easy to dispose of software. This is a good idea, but the techiques for doing it are tricky or rely on physical security.
- Run proxies on web sites running other services which are also used by users within the target area. This is a great approach, but requires cooperation of a web site which has the willingness to be (or confidence it won't be) blocked.
- Use actual skype connections. This is an interesting approach which has the benefit of lots of cover traffic, but suffers from limitations on the bandwidth skype intermediaries will provide, and could be attacked by an attacker running lots of high quality skype nodes and noticing the very suspicious traffic.
- Dial down the level of paranoia. In the end a certain amount of this may be necessary.
Censors have multiple ways of finding IP addresses which are used by the anti-censorship system:
- Use the same methods as the software. This is a very insidious approach, putting the anti-censorship system in a position of trying to simultaneously publish new IPs and keep their distribution limited.
- Correlation attacks on existing known IPs. This is also a very insidious attack - the attacker simply takes IPs which are known to use the anti-censorship tool, and looks for otherwise unpopular IPs which a lot of those are connecting to.
- Probing - an attacker can connect to suspected proxies and try to get them to give themselves away by doing a handshake. Depending on the type of proxy connection used, this can be very effective, sometimes in combination with reverse DNS.
- Trick proxy users into hitting a web site and observe what IPs the connections come from, observing the IPs of the proxies directly.
- Deep packet inspection and traffic pattern analysis, including packet sizes, connection number and duration, etc. These can be extremely effective, but can be extremely expensive for an anti-anti-censor to set up. Connection number and duration are probably the most telling pieces of information, and the cheapest to implement, as well as the easiest for the anti-censor to manipulate.
There are several ways for an anti-censor to make it hard to find their IPs:
- Use lots of IPs. If each user can be given their own dedicated IP then the system is extremely hard to attack. Problem is, this approach requires procument of lots of IPs, which isn't easy.
- Limit how many users info is given to. This is a good idea, but difficult to do.
- Encrypt info with not widely circulated keys. This moves the problem to key distribution and management, which is a good idea.
- Distribute fake IPs including stuff the censor would regret blocking. I think this is kind of fun.
- Have clients only connect to one IP. This is a very good idea! Should be followed as closely as possible.
- Make traffic go through more than one hop, masking the IPs of proxies to connections on the outgoing side. While clearly a good idea, this doubles the bandwidth used, which kind of sucks.
- Rely on deep packet inspection being hard. Less unreasonable than you might imagine - deep packet inspection systems are very expensive and take a while to upgrade, and intelligence on what the deep packet inspection can do is sometimes available.
- Steganographically encode connections to proxies - this obviously must be done, although it isn't obvious what the best approach is.
There are several things proxy connections could be made to look like -
- HTTP - while there's plenty of cover traffic for HTTP, deep packet inspection and probing can probably be very effective in recognizing patterns in it, making it not very appealing for stego connections
- SSL/TLS - there's a decent amount of cover traffic for TLS connections in the form of HTTPS, and using the HTTPS port is probably a good approach, especially since the traffic patterns are going to match http anyway, since that's what it is. There's some concern that man in the middle attacks might be launched, although those are difficult, and an attacker might get suspicious if reverse DNS doesn't return believable information. Still, this may be the best option, and is certainly the simplest to implement.
- BitTorrent - BitTorrent has lots of cover traffic, and the obfuscated version of the protocol looks fairly generic, although its traffic patterns are very distinctive and wouldn't be closely matched by anti-censorship web browsing.
- utp - utp is a udp-based TCP-alike originally designed for BitTorrent. It has the advantage that some deep packet inspection systems just plain don't support UDP, and it's easy to use as a swap-in replacement for TCP. It has some of the same cover traffic problems as regular BitTorrent.
- SSH - while tunneling over SSH is not uncommon, making using SSH connections no more suspicious than having long-lived high-throughput SSH connections is to begin with, that's already a high level of suspiciousness, so this probably isn't a great approach.
- skype - skype traffic has good cover traffic, but is a very poor match in terms of usage patterns.
- noise - a TCP connection which has just plain garbage going over it is a surprisingly reasonable approach. Lots of weird miscellaneous things on the internet are hard to classify, and obfuscated BitTorrent provides a decent amount of cover.
There are several methods a censorship resistance system can use to get IP addresses out -
- offline - this is the most secure way, but it's very slow and expensive
- spam cannon - a spam blast can be sent out containing addresses of proxies. This works but is moderately slow and moderately expensive. It's also potentially very easy to intercept.
- to existing users - client software can be sent IPs of failback proxies when it makes a proxy connection. This works and is fast, but has the problem that an attacker can run client software and use it to find proxies as well.
- via web stego - this technique hasn't been used yet, but IPs could be encoded steganographically in real web traffic. Given the tremendous popularity of censorship resistance tools in the west, it might be possible to enlist the help of lots of web sites, and make it essentially impossible to filter them all out. I'm working on technology for this.