I recently helped a friend set up port mapping and, not having touched iptables for a few months, fell into two pitfalls. I’d like to share them with you.

For ease of description, call the server facing the public network G (for Gateway); its port 80 is mapped to port 80 of the internal server B (for Backend). G has two access lines from different ISPs, connected to the network cards eth0 and eth1, with IPs 1.1.1.1 and 2.2.2.2 respectively. B’s internal IP is 192.168.0.2, G’s internal IP is 192.168.0.1, and G connects to the internal network through eth2.

Sometimes SNAT is needed after DNAT

The initial incorrect configuration on gateway G was as follows:

iptables -t nat -A PREROUTING -d 1.1.1.1/32 -i eth0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.0.2
iptables -t nat -A PREROUTING -d 2.2.2.2/32 -i eth1 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.0.2

Neither 1.1.1.1:80 nor 2.2.2.2:80 accepted connections. A packet capture on gateway G showed that only the destination IP had been rewritten; the source IP was still the user’s. Backend B also has its own direct line to the Internet that bypasses the gateway, so its replies went straight out over that line instead of back through G. The user therefore received response packets from an IP other than the one it had connected to, and naturally discarded them.
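The capture described above can be reproduced roughly as follows. This is a minimal sketch, not the exact commands used at the time: the `run` and `capture` helpers are hypothetical names, and by default the script only prints the tcpdump invocation instead of executing it.

```shell
#!/bin/sh
# Sketch: capture port-80 traffic on one interface of gateway G.
# DRY_RUN=1 (the default) only prints the command; set DRY_RUN=0 and
# run as root to actually capture.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# -n disables name resolution, so the raw source and destination IPs are shown.
capture() {
    run tcpdump -n -i "$1" 'tcp port 80'
}

# Interface defaults to eth2 (the internal side); pass eth0 or eth1 for a public line.
capture "${1:-eth2}"
```

On eth2 one would expect to see the forwarded packets still carrying the user’s source IP, with no replies from 192.168.0.2 coming back through G.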

The solution is to add SNAT after DNAT. Note that for a new incoming connection, POSTROUTING runs after PREROUTING has already rewritten the destination IP, so the POSTROUTING rule must match the mapped internal IP, not the public one. The corrected rules are as follows:

iptables -t nat -A POSTROUTING -d 192.168.0.2/32 -o eth2 -p tcp -m tcp --dport 80 -j SNAT --to-source 192.168.0.1
iptables -t nat -A PREROUTING -d 1.1.1.1/32 -i eth0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.0.2
iptables -t nat -A PREROUTING -d 2.2.2.2/32 -i eth1 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.0.2

Most online tutorials give the impression that “port mapping = DNAT”, because in most setups the backend server’s response packets have to pass through the gateway anyway. When that assumption does not hold, that is, when packets sent by the backend server do not necessarily go through the gateway, watch out for this pitfall and add SNAT after DNAT.

Tagging connections to choose the correct exit

With the configuration above, only one of the two public IPs was reachable. tcpdump showed that backend B did return a TCP response packet, but gateway G sent it out of the wrong public interface. Yet the source IP of the packet that left G was clearly correct, so the pre-configured policy routing should not have gone wrong. What’s going on?

When Linux forwards a packet, it goes through the PREROUTING, routing, and POSTROUTING stages in turn. Which network card a packet should go out from is determined at the routing stage. For packets initiated by users, DNAT rules are matched and the destination IP is modified at the PREROUTING stage, and SNAT rules are matched and the source IP is modified at the POSTROUTING stage. For the reply packets returned by the backend, the destination IP is modified at the PREROUTING stage by matching the entries in the NAT table established by the SNAT rules, and the source IP is modified at the POSTROUTING stage by matching the entries in the NAT table established by the DNAT rules.

Consider the TCP connection from user 100.100.100.100 to 1.1.1.1. After DNAT and SNAT, the TCP SYN packet that backend B receives is (192.168.0.1 => 192.168.0.2), and the SYN+ACK it returns is (192.168.0.2 => 192.168.0.1). When that reply reaches gateway G, the PREROUTING stage rewrites its destination IP to 100.100.100.100. At the routing stage its source IP is still 192.168.0.2, so policy routing has nothing useful to match on. Only after routing has finished and the outgoing network card has been selected does the POSTROUTING stage rewrite the source IP to 1.1.1.1. What to do?
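The walk-through above can be traced with a toy script. It is only an illustration of the order of the rewrites, not real netfilter behavior: the variable names and the `trace_reply` function are made up for this sketch, and the addresses are the ones from the example.

```shell
#!/bin/sh
# Toy model: follow the (src => dst) pair of one reply packet through gateway G.
USER_IP=100.100.100.100   # the client
PUBLIC_IP=1.1.1.1         # the address the client connected to
GW_LAN_IP=192.168.0.1     # G on the internal network
BACKEND_IP=192.168.0.2    # B

trace_reply() {
    src=$BACKEND_IP
    dst=$GW_LAN_IP
    echo "arrives on eth2: $src => $dst"
    # PREROUTING: the SNAT is undone, restoring the real client as destination.
    dst=$USER_IP
    echo "after PREROUTING: $src => $dst"
    # The routing decision happens here, while the source is still B's address.
    echo "routing sees source: $src"
    # POSTROUTING: the DNAT is undone, restoring the public IP as source.
    src=$PUBLIC_IP
    echo "after POSTROUTING: $src => $dst"
}

trace_reply
```

The third line of output is the crux: when the exit is chosen, nothing in the packet itself says whether the connection came in via 1.1.1.1 or 2.2.2.2.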

During the routing stage we need to know which public line the packet should leave through. The content of a single packet cannot tell us; only the connection the packet belongs to can. Fortunately, iptables can mark connections, and policy routing can select an exit based on marks.

After adding the following commands, port mapping works normally on both public IPs.

iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t mangle -A PREROUTING -m mark ! --mark 0 -j ACCEPT
iptables -t mangle -A PREROUTING -d 1.1.1.1/32 -i eth0 -p tcp -m tcp -j MARK --set-mark 1
iptables -t mangle -A PREROUTING -d 2.2.2.2/32 -i eth1 -p tcp -m tcp -j MARK --set-mark 2
iptables -t mangle -A PREROUTING -j CONNMARK --save-mark

ip route replace default via 1.1.1.254 dev eth0 table 1000
ip rule add fwmark 1 lookup 1000
ip route replace default via 2.2.2.254 dev eth1 table 1001
ip rule add fwmark 2 lookup 1001

Explanation of the iptables part:

  1. The tag used by policy routing is on the packet, and we now need to use a connection-based tag. Therefore, in the PREROUTING stage, the tag of the connection needs to be copied to the packet.

  2. If the packet now carries a tag (restored from its connection), accept it and skip the remaining rules. This is a shortcut, and removing it does not affect correctness.

  3. Tag the TCP connection arriving at 1.1.1.1 with 1.

  4. Tag the TCP connection arriving at 2.2.2.2 with 2.

  5. Save the tag on the packet to the TCP connection.

Explanation of the policy routing part:

  6. Set a default route in the 1000 routing table, pointing to the gateway of the eth0 line (this gateway is given to us by the ISP).

  7. Let the packets tagged with 1 go through the 1000 routing table. Since there is only one default route, it must match, that is, go out of the eth0 exit.

  8. Set a default route in the 1001 routing table, pointing to the gateway of the eth1 line (this gateway is given to us by the ISP).

  9. Let the packets tagged with 2 go out of the eth1 exit.
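Once the rules are in place, they can be checked with something like the following. This is a hedged sketch rather than part of the original setup: `run` and `verify_marks` are hypothetical helpers, the `conntrack` command assumes the conntrack-tools package is installed, and by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: inspect the marks and routing tables on gateway G.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 and
# run as root to execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

verify_marks() {
    run iptables -t mangle -L PREROUTING -n -v   # per-rule packet counters
    run ip rule show                             # look for the fwmark 1 and 2 rules
    run ip route show table 1000                 # default via 1.1.1.254 dev eth0
    run ip route show table 1001                 # default via 2.2.2.254 dev eth1
    run ip route get 100.100.100.100 mark 2      # should resolve via eth1
    run conntrack -L --mark 1                    # connections that arrived on 1.1.1.1
}

verify_marks
```

If `ip route get ... mark 2` reports eth0 instead of eth1, the `ip rule` entries are the first place to look.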

Summary

This article discusses two common pitfalls in Linux port mapping: the case where the backend server’s reply packets may bypass the gateway, and the case where the gateway has multiple public lines. I had fallen into both pits before and spent a long time climbing out, yet only a few months later I had “forgotten the pain once the scar healed”. I am recording them here for future reference.

In addition, tcpdump is to network debugging what gdb is to program debugging. Time spent learning debugging tools today keeps paying off long into the future.

(Note: Readers of Renren may find the code messy. This is because this article is posted on my USTC blog and then automatically synchronized to Renren, and Renren “reloads” the style of the HTML pre tag. The original link of this article: https://ring0.me/2014/02/port-mapping-fallacies/)
