Analyzing China's Blocking of Unpublished Tor Bridges
Note that our packet captures are not included for privacy reasons. However, direct exports of filters applied to these captures are supplied in CSVs.
The complete list of files is contained in this directory, or available in a compressed ZIP file.
We include descriptions of each file:
files.zip
Contains all included directories and files, compressed for download.
Code
Code/ping.py
This code creates processes to ping each IP found in the CSV containing public relay IPs. It prints each result. You can export this to a file through a pipe.
Code/plots.r
Plotting code used to create all plots shown in our paper, as well as some unused ones.
Data
Data/analysis-only-SYN.csv
Formatted export of packet capture. It contains TCP packets, filtered to remove packets sent from our Chinese client to the relay. It also removes noise, or any other packets not sent from China to the US. So, it only contains SYN packets sent from Chinese scanners.
Data/analysis-with-streams.csv
Similar to the previous one, except that packets are not limited to those with the SYN flag, and the TCP stream index is included in the data for each packet.
Data/userstats-bridge-combined-cn-2012-01-01-2018-05-28.csv
Contains the number of Chinese Tor bridge users per protocol (default, meek, and obfs4) for each day from 01/01/2012 to 05/28/2018.
Data/userstats-relay-country-2011-01-01-cn-2018-05-29-off.csv
Contains the total number of Chinese Tor relay users per day from 01/01/2011 to 05/29/2018.
Data/PingResults.txt
Contains results from Code/ping.py
, showing whether each pinged relay IP was up or down.
Plots
All plots are generated using the exported capture results. You can therefore assume “packets” to mean packets contained in those results.
Plots/cdf_ttl.pdf
CDF of TTLs for packets.
Plots/china_tor_users.pdf
Double line plot showing the number of Tor users over time per meek and obfs4, through daily peaks.
Plots/ipid_ttl_time.pdf
A scatter plot with two axes for showing each packet’s IPID and TTL. This was to determine any IPID patterns.
Plots/probe_map.pdf
A map of China using rworldmap, meant to display the wide variety in geo-locations of scanner IPs.
Plots/scans_per_ip.pdf
Bar plot using geom_histogram with the stream-limited dataset described in Code/plots.r
, to show the number of scans from each IP.
Plots/syn_count_dist.pdf
Density plot for the number of SYN packets seen from each IP.
Plots/syn_times.pdf
An experimental plot for showing which times SYN packets were received / not received. Each vertical blue line represents a received packet.
Plots/syn_times_2.pdf
A more readable version of the previous plot for showing times that SYN packets were received. This shows the number of packets received each hour through a line plot.
Plots/ttl_by_mss.pdf
TTLs for packets, categorized by MSS value in a bar plot.
Resources
Resources/GeoLite2-City.mmdb
MaxMind GeoLite2 City database for geolocation.
Resources/PUBLIC_RELAYS_STRIPPED-04-23-2018.csv
A list of all Tor relay IPs on 04/23/2018, obtained through scraping the public Tor consensus.
Resources/tcis_run
A compiled binary of tcis.