Building an Anti-Pollution DNS
DNS is a fundamental Internet service, yet its importance is often underestimated. For example, in August 2013 the .cn top-level domain servers came under a DDoS attack, making .cn domains unreachable; on January 21, 2014, the root servers were polluted by a well-known firewall, making international domains unreachable. Many internationally renowned websites cannot be accessed from mainland China, partly because they suffer from DNS pollution: a wrong IP address is returned for the domain name.
Building an anti-pollution DNS is not as simple as resolving every domain name through a VPN. There are two main problems:
Why distinguish between domestic and foreign resolution
For some large websites, a query from the Telecom network returns a Telecom IP, a query from the Education Network returns an Education Network IP, and a query from abroad returns a foreign IP. The DNS server replies differently depending on the source IP of the resolution request. bind9, the best-known authoritative DNS server software, has a “view” feature that lets requesters with different IPs see different views.
Large websites generally use Content Delivery Networks (CDN) to distribute static content to servers all over the world. Some also run enterprise-level Virtual Private Networks (VPN), so that content which cannot be statically distributed can still be served from distributed databases or routed quickly to central nodes for processing. Regardless of how the website is implemented internally, for users it is generally best to access the IP that is nearest on the network.
If you resolve all domain names through a foreign VPN, whether via the recursive DNS service provided by the VPN operator or via your own recursive DNS server such as bind9, the website’s authoritative DNS server sees a foreign source IP. It replies either with a foreign IP or with the domestic IP that the website considers fastest to reach from abroad. For example, querying mirrors.ustc.edu.cn from abroad returns China Mobile’s 202.141.176.110. But for an Education Network user, accessing this IP may be slow, because the interconnection between domestic operators is known to be poor.
How can we prevent DNS pollution and still resolve domestic websites to an IP on our own operator’s network? Consider the process of recursively resolving example.com starting from the root servers (schematic):
- The root server says that .com is managed by x.gtld-servers.net; this step has no regional resolution
- x.gtld-servers.net says that example.com is managed by dns.example.net; this step has no regional resolution
- Look up the IP of dns.example.net (omitted here)
- dns.example.net returns the IP address of example.com; this step has regional resolution
That is to say, if example.com is domestic, we only need to use a domestic IP when sending a DNS query to dns.example.net. Generally speaking, the DNS servers of domestic websites are also in China, so as long as all DNS requests to domestic IPs go through the operator’s line instead of the VPN, you can resolve domestic websites to your operator’s IP. Of course, for domestic websites that use CDN services provided by international CDN providers (such as Amazon, Edgecast), the above assumptions may not hold, because the DNS servers of these international CDN providers are generally abroad. I haven’t thought of a good way to solve this problem.
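The delegation walk described above can be observed with dig. The following is a minimal sketch, not part of the original setup: the function names are made up for illustration, dns.example.net is the placeholder name used in the schematic, and the DIG variable is an assumption added only so that the command line can be inspected without sending queries.

```bash
#!/bin/bash
# DIG may be overridden (e.g. DIG=echo) to print the arguments instead of querying.

# Follow the delegation chain from the root servers down to the final answer.
# $1 = domain to look up
trace_delegation() {
    "${DIG:-dig}" +trace +nodnssec "$1"
}

# Send a non-recursive A query straight to one authoritative server.
# This is the only step whose answer depends on the source IP of the query.
# $1 = authoritative server name or IP, $2 = domain to look up
ask_authoritative() {
    "${DIG:-dig}" +norecurse +short "@$1" "$2" A
}

# Example (needs network access; dns.example.net is a placeholder):
#   trace_delegation example.com
#   ask_authoritative dns.example.net example.com
```

Running ask_authoritative once over the operator's line and once over the VPN against the same authoritative server is one way to see whether a given website uses regional resolution.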
Will this practice leave some domain names polluted? I don’t think so. If a domain’s DNS servers are in China, there is no need to trouble a certain firewall; resolution of the domain can simply be ordered to stop. Therefore, the authoritative DNS servers of polluted domain names should all be abroad. Since only traffic to domestic IPs goes through the operator’s line, DNS queries to those servers all go through the VPN and should not be polluted.
Implementation of domestic and foreign resolution distinction
To send traffic for domestic IPs through the operator’s line and all other traffic through the VPN, just modify the routing table. To generate a list of domestic IPs in CIDR format, download the publicly distributed IP address allocation data from APNIC, select the mainland China entries, and apply some simple string processing:
#!/bin/bash
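As an illustration of the string processing involved, here is a minimal sketch rather than the original script. It assumes the pipe-separated APNIC delegation format (registry|cc|type|start|value|date|status), assumes that IPv4 address counts are powers of two, and uses a download URL and file names that are assumptions; the function name apnic_to_cidr is made up for this example.

```bash
#!/bin/bash
# Convert APNIC delegation records on stdin into CIDR blocks on stdout.
# $1 = country code (e.g. CN), $2 = address family (ipv4 or ipv6)
apnic_to_cidr() {
    awk -F'|' -v cc="$1" -v fam="$2" '
        $2 == cc && $3 == fam {
            if (fam == "ipv4") {
                # For IPv4 the value field is an address count, assumed
                # here to be a power of two; turn it into a prefix length.
                len = 32
                for (n = $5; n > 1; n /= 2) len--
                print $4 "/" len
            } else {
                # For IPv6 the value field already is the prefix length.
                print $4 "/" $5
            }
        }'
}

# Example (assumed URL and file names; needs network access):
#   curl -s http://ftp.apnic.net/apnic/stats/apnic/delegated-apnic-latest > delegated.txt
#   apnic_to_cidr CN ipv4 < delegated.txt > cn_ipv4.txt
#   apnic_to_cidr CN ipv6 < delegated.txt > cn_ipv6.txt
```

A record such as apnic|CN|ipv4|1.0.1.0|256|20110414|allocated becomes 1.0.1.0/24, since 256 addresses correspond to a 24-bit prefix.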
Setting up routing is also a very simple matter:
#!/bin/bash
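A routing setup along these lines might look like the sketch below. It is not the original script: the function name gen_route_batch, the gateway 192.0.2.1, the interfaces eth0 and tun0, and the list file cn_ipv4.txt are all placeholders. The function only prints route commands, so the result can be reviewed before it is applied.

```bash
#!/bin/bash
# Read CIDR blocks on stdin and print one "route replace" line per block,
# in the batch format accepted by "ip -batch".
# $1 = operator gateway IP, $2 = operator-facing interface
gen_route_batch() {
    local gw="$1" dev="$2" cidr
    while read -r cidr; do
        [ -n "$cidr" ] || continue          # skip blank lines
        echo "route replace $cidr via $gw dev $dev"
    done
}

# Example (run as root; addresses, interfaces and file names are placeholders):
#   gen_route_batch 192.0.2.1 eth0 < cn_ipv4.txt | ip -batch -
#   # Everything else goes through the VPN. The VPN server itself must stay
#   # reachable via the operator gateway (VPN clients usually add that route).
#   ip route replace default dev tun0
#   # Operator has IPv6 but the VPN does not: make foreign IPv6 unreachable.
#   # More specific domestic IPv6 routes still take precedence.
#   ip -6 route replace unreachable 2000::/3
```

Using replace instead of add keeps the script safe to run again after the IP list is refreshed.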
If your network does not support IPv6 at all, just ignore the IPv6 part of the configuration above. If your operator’s network supports IPv6 but the VPN does not, you must actively block foreign IPv6; otherwise domain names resolved over IPv6 may be polluted. By the way, if you don’t want bind9 to use IPv6 when resolving domain names, add -4 to its command-line parameters. On Debian, edit /etc/default/bind9 and change OPTIONS="-u bind" to OPTIONS="-u bind -4".
I recommend using bind9 to build the recursive DNS server. In /etc/bind/named.conf.options (the path on Debian), comment out the entire forwarders block, so that bind9 does not use anyone else’s recursive DNS service and resolves everything itself, level by level from the root. By default, bind9 does not allow the external network to use it as a recursive DNS server. That is fine if you are the only user; if you want others to use it, modify the allow-query and allow-recursion settings.
Improving VPN Stability with Load Balancing
A VPN connection may experience intermittent packet loss and could be blocked at any time. For example, the anti-pollution DNS I built with the method above showed the following DNS response times over one day (all queries were for foreign domain names, and the DNS cache was not used during testing); 2.3% of DNS queries failed.
As can be seen from the above figure, DNS query failures (due to VPN packet loss or intermittent blocking) are evenly distributed throughout the day, and can almost be considered as random events.
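The article does not show how the failure rate was measured. One way to obtain a comparable number is sketched below, under stated assumptions: the function name count_failures is made up, a query counts as failed when dig exits with an error or returns no answer, the 2-second timeout is arbitrary, and the DIG variable exists only so that the logic can be exercised without a network.

```bash
#!/bin/bash
# Query each domain once against a resolver and print "failed/total".
# $1 = resolver IP, remaining arguments = domain names to query
count_failures() {
    local resolver="$1" total=0 failed=0 name out
    shift
    for name in "$@"; do
        total=$((total + 1))
        # +tries=1 disables retries, so every lost packet is counted.
        # A non-zero exit status or an empty answer counts as a failure.
        if ! out=$("${DIG:-dig}" +short +tries=1 +time=2 "@$resolver" "$name" A) \
            || [ -z "$out" ]; then
            failed=$((failed + 1))
        fi
    done
    echo "$failed/$total"
}

# Example (resolver address is a placeholder; use names that are not cached):
#   count_failures 192.0.2.53 example.com example.org
```

Running such a check periodically, for instance from cron, and logging the result with a timestamp gives the kind of per-day distribution discussed above.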
When we build a web server, we all know that in order to improve availability, we set up multiple backend servers, and the frontend is just a proxy for load balancing. Why can’t DNS do this?
I set up three virtual machines. The first binds to the public IP and provides the DNS resolution service; the other two have only internal IPs (allocated by the virtual machine management tool) and access the public network through SNAT. Virtual machines 2 and 3 each connect to a different VPN, so the failures of the two VPNs can basically be considered mutually independent.
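As a rough illustration of why independence matters, suppose (this is an assumption, not a measurement) that each backend fails a query with the 2.3% rate observed earlier, written here as p_2 = p_3 = 0.023, and that the frontend tries the other backend when one fails. A query is then lost only when both backends fail:

```latex
P(\text{both fail}) = p_2 \, p_3 \approx 0.023 \times 0.023 \approx 5.3 \times 10^{-4} \approx 0.053\%
```

Under the same assumptions, n backends with a common failure rate p would give an estimate of p^n.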
In virtual machine 1, configure bind9’s /etc/bind/named.conf.options, use virtual machines 2 and 3 as backends, and specify that this bind9 does not resolve domain names itself (otherwise, when both backends fail to resolve, bind9 will try to resolve itself, and the result may be a polluted IP).
forwarders { $VM2_IP; $VM3_IP; };
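Only the forwarders line of the configuration is shown above. In BIND, the behavior described here (never resolving by itself) corresponds to the "forward only" option; whether the original configuration used exactly this statement is an assumption. The sketch below prints both statements for a given set of backends. The function name print_forward_only and the addresses in the example are placeholders.

```bash
#!/bin/bash
# Print forwarding statements for the options { ... } block of the frontend.
# Each argument is the IP address of one backend resolver.
print_forward_only() {
    local list="" ip
    for ip in "$@"; do
        list+="$ip; "
    done
    printf 'forwarders { %s};\n' "$list"
    # "forward only" makes bind9 rely solely on the forwarders instead of
    # falling back to its own recursive resolution when they all fail.
    printf 'forward only;\n'
}

# Example (placeholder backend addresses):
#   print_forward_only 10.0.0.2 10.0.0.3
```

The printed statements belong inside the options block of /etc/bind/named.conf.options on virtual machine 1.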
In virtual machines 2 and 3, deploy bind9 according to the method mentioned above, and start the VPN. Don’t forget the settings for allow-query and allow-recursion.
The effect of load balancing:
(Because the measurement methods differ, the numbers in the figure below should not be compared directly with those from before load balancing, which is why the two curves are not drawn in one figure. The average response time should be roughly halved, not reduced as dramatically as the figure suggests. The main thing to look at is the trend of the curve: the extreme values have been greatly reduced.)
Of course, if your two VPNs are not very stable, you can add a third, fourth… This load balancing method is universal.
If you have any better methods to build anti-pollution DNS, or if you have any doubts about some of the practices in this article, you are welcome to discuss.