Disable TCP Congestion Control for Tunnel Connections to Improve Transmission Efficiency
When building a cross-regional server network, such as the VLESS connection used in the article “Building a Three-Layer Tunnel with Full US IP, No Manual Proxy Setup Required,” we often encounter an efficiency issue: the congestion control mechanism of the TCP protocol itself. Although TCP congestion control is crucial for the public internet, in tunnel scenarios where the application layer protocol is already encapsulated (and may have its own flow control or congestion handling), the outer layer TCP congestion control becomes a burden.
Why Disable TCP Congestion Control and Nagle in Tunnels?
TCP-over-TCP Problem: When you transmit data of one TCP connection (e.g., VLESS over TCP) inside another TCP connection, the so-called “TCP-over-TCP” problem arises. Both the inner and outer TCP have their own congestion control and retransmission mechanisms. When packet loss occurs, both layers of TCP will attempt retransmission and reduce the congestion window. This dual processing is not only redundant but can also lead to a sharp decline in performance, especially on high-latency, high-packet-loss international links. The retransmission timer of the inner TCP may trigger prematurely due to the delay and retransmission of the outer TCP, and vice versa, forming a vicious cycle. Additionally, TCP-over-TCP can cause severe Head-of-Line Blocking issues: a lost packet in the outer TCP will block all data of the inner connections it contains, even if these inner connections are completely unrelated. This means that a connection issue of one user may affect other users sharing the same tunnel.
Application Layer Flow Control: The application layer protocol transmitted in the tunnel may already have its own flow control and reliability mechanisms. In this case, the congestion control of the underlying TCP is completely redundant, and it will only interfere with the normal operation of the upper-layer protocol, limiting its performance potential.
Nagle Algorithm Delay: The Nagle algorithm reduces the number of small packets on the network: while previously sent data is still unacknowledged, it holds back small writes so that they can be coalesced into one larger segment, which improves network utilization. In tunnel scenarios, however, we usually want data to pass through the tunnel as quickly as possible, especially for interactive applications (like SSH) or applications with high real-time requirements, and the delay introduced by Nagle hurts them. Disabling Nagle (via the TCP_NODELAY socket option) lets small segments be sent immediately, reducing latency.
UDP’s Dilemma on the Public Internet: You might wonder, if TCP has so many issues, why not use UDP to establish tunnel connections directly? Unfortunately, UDP on the public internet, especially on international links, is often subject to ISP QoS (Quality of Service) policies, has lower priority, and is more likely to be dropped or throttled, leading to unstable connections. Therefore, in many cases, we have to choose TCP as the tunnel transport layer protocol, which requires us to find ways to optimize TCP’s behavior.
Therefore, for tunnel connections between servers (especially cross-regional connections), disabling the outer layer TCP’s congestion control and Nagle algorithm can significantly improve the tunnel’s throughput and response speed.
Solution: A Script
To solve this problem, we can use a script that specifically disables congestion control and the Nagle algorithm for TCP connections to a specific target IP (e.g., the tunnel’s remote server), while maintaining the system’s default TCP behavior for other connections.
Usage: Execute this script on both ends of the tunnel:
Download the script: Download the script file from the link above.
Grant executable permission: chmod +x bypass-congestion-control.sh.
Run the script: Run this script using the IP address or domain name of the opposite end server: sudo ./bypass-congestion-control.sh <target_ip/target_domain>
Use the address over which the tunnel connection is actually made: if the two servers connect through a VPC intranet, pass the intranet IP; if they connect over the public internet, pass the public IP.
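Because the script accepts either an IP address or a domain name, it has to tell the two apart. The sketch below is one possible check, not the script's own code: a hypothetical helper, is_ipv4, that accepts only dotted quads whose octets fall within 0..255. It is written in portable sh and leaves the variables old_ifs and octet set in the calling shell.

```shell
# Return 0 if $1 is a dotted-quad IPv4 address with every octet in 0..255.
# Hypothetical helper for illustration; not part of the original script.
is_ipv4() {
    # Reject empty input, non-digit characters, and misplaced dots.
    case "$1" in
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    # Split the argument on dots into the positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Usage:
#   if is_ipv4 "$1"; then echo "IP address"; else echo "treat as domain"; fi
```

A check like this rejects inputs such as 999.1.1.1, which a digits-and-dots pattern alone would accept as an address.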
Why run the script on both ends of the tunnel? Because tunnel traffic flows in both directions and congestion control acts on the sending side: it limits the sender's window, and the receiver cannot make the sender send faster. When the domestic server sends data to the US server, the congestion control on the domestic server determines the sending rate; when the US server returns data, the congestion control on the US server does. Congestion control therefore needs to be disabled on both ends.
Let’s analyze the working principle of this script.
Detailed Explanation of the Script’s Principle
The core idea of this script is simple: through a series of Linux networking tricks, we can make TCP connections between specific IPs behave more like “bare connections,” free from congestion control restrictions.
Specifically, the script combines iptables traffic marking, policy routing, and custom kernel modules to precisely control traffic to specific servers. We cleverly use Linux’s routing features to open a “fast lane” for tunnel traffic and disable TCP’s two major performance killers: congestion control and the Nagle algorithm.
Although this cannot completely solve all TCP-over-TCP issues, it significantly improves the throughput and response latency of cross-regional three-layer tunnel connections while maintaining TCP protocol compatibility.
Mark Target Traffic (iptables)
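The script uses the mangle table of iptables to mark (MARK) the TCP traffic exchanged with the target IP address: outbound packets are marked directly, and inbound connections receive a connection mark (CONNMARK) so that their outbound replies can be marked as well.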
iptables -t mangle -A OUTPUT -p tcp -d $TARGET_IP -j MARK --set-mark 1: Marks outbound packets to the target IP.
iptables -t mangle -A INPUT -p tcp -s $TARGET_IP -j CONNMARK --set-mark 1: Marks inbound connections from the target IP.
iptables -t mangle -A OUTPUT -p tcp -m connmark --mark 1 -j MARK --set-mark 1: Marks outbound packets belonging to marked connections (e.g., responses to inbound connections).
In this way, all TCP traffic related to the target IP is marked with “1.”
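The three rules above can be installed so that repeated runs do not add duplicates, using iptables -C to test for a rule before appending it with -A. This is a minimal sketch under stated assumptions: the function names ensure_mangle_rule and mark_tunnel_traffic and the IPTABLES override variable are hypothetical and do not come from the script.

```shell
# IPTABLES may be overridden (for example with a stub) for dry runs.
IPTABLES=${IPTABLES:-iptables}

# Append a rule to a mangle-table chain only if an identical rule is absent.
# Arguments: chain name followed by the rule specification.
ensure_mangle_rule() {
    "$IPTABLES" -t mangle -C "$@" 2>/dev/null || "$IPTABLES" -t mangle -A "$@"
}

# Install the three marking rules for one peer address ($1).
mark_tunnel_traffic() {
    peer=$1
    ensure_mangle_rule OUTPUT -p tcp -d "$peer" -j MARK --set-mark 1
    ensure_mangle_rule INPUT -p tcp -s "$peer" -j CONNMARK --set-mark 1
    ensure_mangle_rule OUTPUT -p tcp -m connmark --mark 1 -j MARK --set-mark 1
}

# Usage (as root):
#   mark_tunnel_traffic 203.0.113.7
```

Checking before appending keeps the mangle table free of duplicate rules without flushing rules that belong to other software.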
Policy Routing (ip rule and ip route)
The script creates a separate routing table named nocongestion, registered with table ID 200 in /etc/iproute2/rt_tables.
ip rule add fwmark 1 table nocongestion: Adds a policy routing rule specifying that all packets marked as “1” by iptables should query the nocongestion routing table.
ip route add default via $DEFAULT_GATEWAY dev $DEFAULT_INTERFACE table nocongestion: Adds a default route in the nocongestion table pointing to the system’s default gateway. This means that marked traffic still leaves through the normal path, but it is matched by a route in the nocongestion table, to which per-route attributes can be attached.
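The gateway and interface for that route have to be read from the system's existing default route. The sketch below shows one way to do so by looking for the via and dev keywords instead of relying on fixed field positions, which differ when a default route has no gateway. The function name parse_default_route is hypothetical.

```shell
# Read `ip route` output on stdin and print "<gateway> <interface>" for the
# first default route. The gateway is printed as "-" when the route has no
# "via" keyword (for example on a point-to-point link).
parse_default_route() {
    awk '$1 == "default" {
        gw = "-"; dev = ""
        for (i = 2; i < NF; i++) {
            if ($i == "via") gw = $(i + 1)
            if ($i == "dev") dev = $(i + 1)
        }
        print gw, dev
        exit
    }'
}

# Usage:
#   ip route show default | parse_default_route
```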
Disable Congestion Control (congctl none)
Check Availability: The script first checks whether the kernel offers a TCP congestion control algorithm named none (grep -q "none" /proc/sys/net/ipv4/tcp_available_congestion_control). Stock mainline kernels do not ship an algorithm by this name, so unless the distribution or an earlier run of this script has added one, the check fails and the fallback below applies.
Apply to Routing Table: If the none algorithm is available, the script applies it to the default route in the nocongestion routing table: ip route change default ... table nocongestion congctl none. congctl is a per-route attribute that selects the congestion control algorithm for TCP connections using that route; selecting none here is meant to let marked traffic be sent as quickly as possible.
Compile Custom Kernel Module (Fallback): If the system does not have the none algorithm, the script automatically compiles and loads a custom Linux kernel module named tcp_none. This module implements a very simple congestion control logic: it always sets the congestion window (cwnd) to a very large value (UINT_MAX / 2) and ignores all congestion events (such as packet loss). The compilation process requires the system to have build-essential and the corresponding linux-headers installed. Once loaded, the none algorithm becomes available, and the script continues to apply it to the routing table as in the previous step.
Maintain Global Settings: The cleverness of this method lies in the fact that it only affects traffic marked and routed to the nocongestion table. The system’s global default congestion control algorithm (such as cubic or bbr) remains unchanged and is used for all other network connections.
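The availability check can be made stricter than a plain substring search, which would also match any algorithm whose name merely contains "none". The following is a small sketch of a whole-word test; the function name cc_listed is hypothetical.

```shell
# Return 0 if algorithm $1 appears as a whole word in the space-separated
# list $2, and 1 otherwise.
cc_listed() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system:
#   cc_listed none "$(cat /proc/sys/net/ipv4/tcp_available_congestion_control)"
```

The same helper can confirm that the global default reported in /proc/sys/net/ipv4/tcp_congestion_control is still cubic or bbr after the script has run.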
Disable Nagle Algorithm (TCP_NODELAY)
Compile Shared Library (disable_nagle.so): The script compiles a small C shared library (/usr/local/lib/disable_nagle.so). This library uses dlsym and RTLD_NEXT to interpose on the C library's connect function, the wrapper through which dynamically linked programs issue the connect system call.
Inject setsockopt: Inside the intercepted connect function, it first calls the original connect function and then, before returning, calls setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(int)) to set the TCP_NODELAY option for the established socket, thereby disabling the Nagle algorithm.
Use via LD_PRELOAD: To disable Nagle for a specific application’s connections, you need to preload this shared library when starting the application: LD_PRELOAD=/usr/local/lib/disable_nagle.so your_application. For example, if you use xray as the VLESS client, you can modify its systemd service file by adding Environment="LD_PRELOAD=/usr/local/lib/disable_nagle.so" before ExecStart. The preload only takes effect in programs that reach connect through the dynamically linked C library. Statically linked binaries, including typical Go builds, issue system calls directly and will not pick it up; Go's net package already enables TCP_NODELAY on TCP connections by default.
enforce_no_congestion.sh Attempt (Experimental): The script also creates a background service script (enforce_no_congestion.sh) that periodically checks the iptables and routing rules. It also looks up processes connected to the target IP with the intention of setting TCP_NODELAY on their sockets via strace (if installed). In the listing below, this part only logs the sockets it finds and never changes a socket option, and strace is a tracing tool rather than a way to set options on another process's sockets, so it should be treated as ineffective. Use the LD_PRELOAD method to disable Nagle.
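Instead of editing a service file in place, the LD_PRELOAD setting can be supplied through a systemd drop-in. This is a minimal sketch under stated assumptions: the function name write_preload_dropin and the file name nagle.conf are hypothetical, and the unit name xray.service in the usage comment is assumed from the example above.

```shell
# Write a systemd drop-in that preloads a shared library for one unit.
# $1: drop-in directory, e.g. /etc/systemd/system/xray.service.d
# $2: absolute path to the shared library
write_preload_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/nagle.conf" <<EOF
[Service]
Environment="LD_PRELOAD=$2"
EOF
}

# Usage (as root), followed by a reload and restart of the unit:
#   write_preload_dropin /etc/systemd/system/xray.service.d /usr/local/lib/disable_nagle.so
#   systemctl daemon-reload
#   systemctl restart xray
```

A drop-in survives package upgrades that replace the main unit file, which an in-place edit may not.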
Persistence (systemd)
To ensure that iptables rules and routing settings remain effective after a system reboot and can automatically recover after certain network events, the script creates and enables a systemd service (enforce-no-congestion.service). This service runs the enforce_no_congestion.sh script in the background, periodically checking and reapplying the necessary iptables and ip route commands.
Notes and Risks
Root Privileges: The script requires root privileges to run because it needs to modify network settings, load kernel modules, and manage services.
Kernel Module Compilation: The script depends on the system having the correct build tools (build-essential, gcc, make) and headers matching the currently running kernel version (linux-headers-$(uname -r)). If the environment is not met, the compilation will fail.
Kernel Module Risks: Loading custom kernel modules always carries some risk. Although this module is simple, it may still cause system instability or crashes. Be sure to try it in a test environment first.
Connection Interruptions: The --update-module option attempts to interrupt connections using the none algorithm, which may cause service disruptions. Some aggressive connection termination methods (such as taking the network card offline) are highly risky.
Nagle Disabling Method: The method used in enforce_no_congestion.sh to disable Nagle with strace may be ineffective or unreliable. LD_PRELOAD is a more recommended approach.
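The build prerequisites listed above can be checked before anything is compiled. The sketch below is one possible pre-flight check, not part of the script: the function name can_build_module is hypothetical, and the second parameter exists only so the check can be pointed at a different directory for testing.

```shell
# Return 0 when the kernel build tree and the basic build tools are present.
# $1: kernel release (defaults to the running kernel)
# $2: modules root (defaults to /lib/modules)
can_build_module() {
    release=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    [ -d "$root/$release/build" ] || {
        echo "missing kernel headers for $release" >&2
        return 1
    }
    command -v make >/dev/null 2>&1 || { echo "make not found" >&2; return 1; }
    command -v gcc >/dev/null 2>&1 || { echo "gcc not found" >&2; return 1; }
}

# Usage:
#   can_build_module || echo "install build tools and matching headers first"
```

The directory it tests, /lib/modules/<release>/build, is the same path the module's Makefile passes to make -C.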
Limitations of the Technical Solution
Redundant Retransmission in TCP-in-TCP: It should be noted that while this method disables congestion control, it cannot completely avoid the redundant retransmission problem in TCP-in-TCP scenarios. When the outer TCP experiences packet loss, it will still retransmit; at the same time, if the inner TCP detects a timeout, it will independently initiate retransmission. This dual retransmission mechanism is an inherent flaw of the TCP-in-TCP solution, and this script can only mitigate performance degradation by disabling congestion control, but cannot fundamentally solve the protocol redundancy issue.
Head-of-Line Blocking: This method also cannot solve the inherent head-of-line blocking problem of TCP-in-TCP. When the outer TCP connection experiences packet loss, all inner traffic passing through that connection will be blocked until the outer TCP completes retransmission. This means that even unrelated connections can affect each other by sharing the same TCP tunnel, especially in high packet loss environments where performance degradation is more pronounced. Only switching to a UDP-based transport protocol (such as WireGuard) or a new generation protocol that supports multiplexing (such as QUIC/HTTP3) can fundamentally solve these issues.
# Script to disable TCP congestion control and Nagle's algorithm for a specific IP address or domain
# Automatically installs dependencies and compiles a kernel module if needed

# Check if script is run as root
if [ "$(id -u)" -ne 0 ]; then
    echo "This script must be run as root" >&2
    exit 1
fi
# Function to install dependencies
install_dependencies() {
    echo "Installing dependencies..."
    if command -v apt-get &> /dev/null; then
        apt-get update
        apt-get install -y build-essential linux-headers-$(uname -r) iptables dnsutils
    elif command -v yum &> /dev/null; then
        yum install -y kernel-devel gcc make iptables bind-utils
    elif command -v dnf &> /dev/null; then
        dnf install -y kernel-devel gcc make iptables bind-utils
    elif command -v pacman &> /dev/null; then
        pacman -Sy --noconfirm base-devel linux-headers iptables bind-tools
    else
        echo "Unsupported package manager. Please install build-essential, linux headers, and iptables manually."
        exit 1
    fi
}
# Function to resolve domain to IP
# Status messages go to stderr so that a caller capturing stdout with $(...)
# receives only the resolved IP address.
resolve_domain() {
    local domain="$1"
    local resolved_ip
    echo "Resolving domain $domain to IP address..." >&2
    if command -v dig &> /dev/null; then
        resolved_ip=$(dig +short "$domain" | grep -v "\.$" | head -n 1)
    elif command -v host &> /dev/null; then
        resolved_ip=$(host "$domain" | grep "has address" | head -n 1 | awk '{print $NF}')
    elif command -v nslookup &> /dev/null; then
        resolved_ip=$(nslookup "$domain" | grep -A2 "Name:" | grep "Address:" | head -n 1 | awk '{print $2}')
    else
        echo "No DNS resolution tools found. Please install dig, host, or nslookup." >&2
        exit 1
    fi
    if [ -z "$resolved_ip" ]; then
        echo "Failed to resolve domain $domain to IP address." >&2
        exit 1
    fi
    echo "Domain $domain resolved to IP: $resolved_ip" >&2
    echo "$resolved_ip"
}
# Function to update the tcp_none module when it's in use
update_tcp_none_module() {
    echo "Updating tcp_none module with unlimited cwnd..."

    # Check if module is currently loaded
    if lsmod | grep -q "tcp_none"; then
        # Identify connections using the 'none' congestion control
        active_connections=$(ss -ti | grep none)
        if [ -n "$active_connections" ]; then
            echo "Active connections found using tcp_none module:"
            echo "$active_connections"
            echo "These connections will be terminated to update the module."
            read -p "Press Enter to continue or Ctrl+C to abort..." </dev/tty

            # More aggressive connection termination approach
            # 1. First try to kill processes
            echo "Finding and terminating processes with active connections to $TARGET_IP..."
            TARGET_CONNS=$(ss -tnp | grep "$TARGET_IP")
            if [ -n "$TARGET_CONNS" ]; then
                # Extract PIDs of processes using these connections
                PIDS=$(echo "$TARGET_CONNS" | grep -oP 'pid=\K\d+' | sort -u)
                if [ -n "$PIDS" ]; then
                    for PID in $PIDS; do
                        echo "Terminating process $PID"
                        kill -9 $PID 2>/dev/null || echo "Could not kill process $PID"
                    done
                fi
            fi

            # 2. Create firewall rules to reset all connections to/from the target IP
            echo "Adding firewall rules to reset connections to $TARGET_IP..."
            iptables -F
            iptables -A INPUT -p tcp -s $TARGET_IP -j REJECT --reject-with tcp-reset
            iptables -A OUTPUT -p tcp -d $TARGET_IP -j REJECT --reject-with tcp-reset

            # 3. Enable aggressive TCP TIME_WAIT connection handling
            echo "Enabling aggressive TCP connection cleanup..."
            echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle 2>/dev/null || true
            echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse 2>/dev/null || true
            echo 1 > /proc/sys/net/ipv4/tcp_abort_on_overflow 2>/dev/null || true
            echo 10 > /proc/sys/net/ipv4/tcp_fin_timeout 2>/dev/null || true

            # 4. Handle specific connections directly using ss output
            echo "Directly targeting connections using 'none' congestion control..."
            ss -tn state established | grep "$TARGET_IP" | awk '{print $4}' | while read local_addr; do
                if [[ "$local_addr" == *:* ]]; then
                    local_ip=$(echo $local_addr | cut -d: -f1)
                    local_port=$(echo $local_addr | cut -d: -f2)
                    echo "Closing connection on local port $local_port"
                    iptables -A INPUT -p tcp --dport $local_port -j REJECT --reject-with tcp-reset
                    iptables -A OUTPUT -p tcp --sport $local_port -j REJECT --reject-with tcp-reset
                fi
            done

            # 5. Wait and check again
            echo "Waiting for connections to terminate..."
            sleep 5

            # 6. Clean up the temporary iptables rules
            iptables -F
        fi
        # Try to unload the module
        echo "Attempting to unload tcp_none module..."
        rmmod tcp_none 2>/dev/null

        # If module is still loaded, try more aggressive approach
        if lsmod | grep -q "tcp_none"; then
            echo "Module still in use. Trying system-wide connection reset..."
            # System-wide approach (DANGEROUS but effective)
            echo "WARNING: This will reset ALL TCP connections on the system!"
            read -p "Continue? [y/N] " confirm </dev/tty
            if [[ "$confirm" == "y" || "$confirm" == "Y" ]]; then
                # Reset all connections by toggling the network interface
                DEFAULT_INTERFACE=$(ip route | grep default | head -n 1 | awk '{print $5}')
                if [ -n "$DEFAULT_INTERFACE" ]; then
                    echo "Temporarily disabling interface $DEFAULT_INTERFACE..."
                    ip link set $DEFAULT_INTERFACE down
                    sleep 2
                    ip link set $DEFAULT_INTERFACE up
                    sleep 2
                fi
                # Try to unload again
                rmmod tcp_none 2>/dev/null
            fi

            # Final check if module is unloaded
            if lsmod | grep -q "tcp_none"; then
                echo "Could not unload module. You may need to reboot the system."
                echo "Continuing anyway to try compiling the new module..."
            else
                echo "Successfully unloaded tcp_none module."
            fi
        else
            echo "Successfully unloaded tcp_none module."
        fi
    fi

    # Navigate to module directory or create a new one
    MODULE_DIR=$(mktemp -d)
    cd "$MODULE_DIR"

    # Create updated Makefile and module source
    echo "Creating updated tcp_none module with unlimited cwnd..."

    # Create Makefile (same as before)
    cat > Makefile << 'EOF'
obj-m += tcp_none.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
EOF
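    # Create module source
    # (the heredoc opener, the includes, and the registration code are assumed;
    # the listing as extracted omits them)
    cat > tcp_none.c << 'EOF'
#include <linux/module.h>
#include <net/tcp.h>

static void tcp_none_init(struct sock *sk)
{
    /* Set congestion window to maximum possible value */
    struct tcp_sock *tp = tcp_sk(sk);
    tp->snd_cwnd = UINT_MAX / 2;       /* Use a very large value (near max of u32) */
    tp->snd_cwnd_clamp = UINT_MAX / 2; /* Also set the clamp to a very high value */
}

static void tcp_none_cong_avoid(struct sock *sk, u32 ack, u32 acked)
{
    /* Always keep the congestion window maximized */
    struct tcp_sock *tp = tcp_sk(sk);
    /* If cwnd somehow gets reduced, set it back to maximum */
    if (tp->snd_cwnd < UINT_MAX / 2) {
        tp->snd_cwnd = UINT_MAX / 2;
    }
}

static u32 tcp_none_ssthresh(struct sock *sk)
{
    /* Return maximum value */
    return TCP_INFINITE_SSTHRESH;
}

static u32 tcp_none_undo_cwnd(struct sock *sk)
{
    /* Return maximum value instead of current CWND */
    return UINT_MAX / 2;
}

static void tcp_none_cwnd_event(struct sock *sk, enum tcp_ca_event event)
{
    /* Force cwnd to stay at maximum regardless of events */
    struct tcp_sock *tp = tcp_sk(sk);
    tp->snd_cwnd = UINT_MAX / 2;
}

/* Assumed registration code: exposes the callbacks under the name "none",
 * which is the name the script looks for. */
static struct tcp_congestion_ops tcp_none __read_mostly = {
    .init       = tcp_none_init,
    .ssthresh   = tcp_none_ssthresh,
    .cong_avoid = tcp_none_cong_avoid,
    .undo_cwnd  = tcp_none_undo_cwnd,
    .cwnd_event = tcp_none_cwnd_event,
    .owner      = THIS_MODULE,
    .name       = "none",
};

static int __init tcp_none_register(void)
{
    return tcp_register_congestion_control(&tcp_none);
}

static void __exit tcp_none_unregister(void)
{
    tcp_unregister_congestion_control(&tcp_none);
}

module_init(tcp_none_register);
module_exit(tcp_none_unregister);

MODULE_AUTHOR("Kernel Module Generator");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("TCP Congestion Control with No Algorithm and Unlimited CWND");
EOF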
    # Compile the module
    echo "Compiling updated kernel module..."
    make

    # Load the new module
    echo "Loading updated kernel module..."
    insmod tcp_none.ko

    # Verify module loaded
    if ! grep -q "none" /proc/sys/net/ipv4/tcp_available_congestion_control; then
        echo "Failed to load 'none' congestion control module."
        exit 1
    fi

    echo "Successfully updated and loaded 'none' congestion control module with unlimited cwnd."

    # Update routes to use the new module
    echo "Updating routes to use new module..."
    # Check if congctl is supported by ip route
    if ip route help 2>&1 | grep -q "congctl"; then
        ip route change default via "$DEFAULT_GATEWAY" dev "$DEFAULT_INTERFACE" table nocongestion congctl none
    else
        # Fallback if congctl is not supported
        echo "congctl option not supported on this system"
        ip route change default via "$DEFAULT_GATEWAY" dev "$DEFAULT_INTERFACE" table nocongestion
        # Try using sysctl to set congestion control for specific interfaces
        echo "Using alternative method to set congestion control"
        echo "none" > /proc/sys/net/ipv4/tcp_congestion_control
    fi

    cd - >/dev/null
}
# Check if arguments include an update flag
if [ "$1" == "--update-module" ]; then
    # Check if TARGET_IP was previously set
    if [ -f /var/lib/no-congestion-target ]; then
        TARGET_IP=$(cat /var/lib/no-congestion-target)
        echo "Updating module for existing target: $TARGET_IP"
        # Get default route interface and gateway
        DEFAULT_ROUTE=$(ip route | grep default | head -n 1)
        DEFAULT_INTERFACE=$(echo "$DEFAULT_ROUTE" | awk '{print $5}')
        DEFAULT_GATEWAY=$(echo "$DEFAULT_ROUTE" | awk '{print $3}')
        update_tcp_none_module
        exit 0
    else
        echo "No target IP found. Please run the script with a target IP first."
        exit 1
    fi
fi
# Check arguments
if [ $# -ne 1 ]; then
    echo "Usage: $0 <target_ip_or_domain>"
    echo "   or: $0 --update-module"
    exit 1
fi

# Check if input is an IP address or domain
if [[ "$1" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    # Input is an IP address
    TARGET_IP="$1"
    echo "Using IP address: $TARGET_IP"
else
    # Input is likely a domain name
    DOMAIN="$1"
    # Resolve domain to IP; stop here if resolution fails, because an exit
    # inside $(...) only ends the subshell
    TARGET_IP=$(resolve_domain "$DOMAIN") || exit 1
    echo "Using domain $DOMAIN with resolved IP: $TARGET_IP"
fi

echo "Setting up no congestion control for connections to $TARGET_IP"

# Install dependencies
install_dependencies
# Check if 'none' congestion control is available
if ! grep -q "none" /proc/sys/net/ipv4/tcp_available_congestion_control; then
    echo "'none' congestion control is not available. Creating kernel module..."

    # Create temporary directory for kernel module
    MODULE_DIR=$(mktemp -d)
    cd "$MODULE_DIR"

    # Create Makefile
    cat > Makefile << 'EOF'
obj-m += tcp_none.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
EOF
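    # Create module source
    # (the heredoc opener, the includes, and the registration code are assumed;
    # the listing as extracted omits them)
    cat > tcp_none.c << 'EOF'
#include <linux/module.h>
#include <net/tcp.h>

static void tcp_none_init(struct sock *sk)
{
    /* Set congestion window to maximum possible value */
    struct tcp_sock *tp = tcp_sk(sk);
    tp->snd_cwnd = UINT_MAX / 2;       /* Use a very large value (near max of u32) */
    tp->snd_cwnd_clamp = UINT_MAX / 2; /* Also set the clamp to a very high value */
}

static void tcp_none_cong_avoid(struct sock *sk, u32 ack, u32 acked)
{
    /* Always keep the congestion window maximized */
    struct tcp_sock *tp = tcp_sk(sk);
    /* If cwnd somehow gets reduced, set it back to maximum */
    if (tp->snd_cwnd < UINT_MAX / 2) {
        tp->snd_cwnd = UINT_MAX / 2;
    }
}

static u32 tcp_none_ssthresh(struct sock *sk)
{
    /* Return maximum value */
    return TCP_INFINITE_SSTHRESH;
}

static u32 tcp_none_undo_cwnd(struct sock *sk)
{
    /* Return maximum value instead of current CWND */
    return UINT_MAX / 2;
}

static void tcp_none_cwnd_event(struct sock *sk, enum tcp_ca_event event)
{
    /* Force cwnd to stay at maximum regardless of events */
    struct tcp_sock *tp = tcp_sk(sk);
    tp->snd_cwnd = UINT_MAX / 2;
}

/* Assumed registration code: exposes the callbacks under the name "none",
 * which is the name the script looks for. */
static struct tcp_congestion_ops tcp_none __read_mostly = {
    .init       = tcp_none_init,
    .ssthresh   = tcp_none_ssthresh,
    .cong_avoid = tcp_none_cong_avoid,
    .undo_cwnd  = tcp_none_undo_cwnd,
    .cwnd_event = tcp_none_cwnd_event,
    .owner      = THIS_MODULE,
    .name       = "none",
};

static int __init tcp_none_register(void)
{
    return tcp_register_congestion_control(&tcp_none);
}

static void __exit tcp_none_unregister(void)
{
    tcp_unregister_congestion_control(&tcp_none);
}

module_init(tcp_none_register);
module_exit(tcp_none_unregister);

MODULE_AUTHOR("Kernel Module Generator");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("TCP Congestion Control with No Algorithm and Unlimited CWND");
EOF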
    # Compile the module
    echo "Compiling kernel module..."
    make

    # Load the module
    echo "Loading kernel module..."
    insmod tcp_none.ko

    # Verify module loaded
    if ! grep -q "none" /proc/sys/net/ipv4/tcp_available_congestion_control; then
        echo "Failed to load 'none' congestion control module."
        exit 1
    fi

    echo "Successfully created and loaded 'none' congestion control module."
fi
# Create iptables rule to mark packets to the target IP
echo "Setting up iptables to mark packets to $TARGET_IP..."

# Cleanup existing rules first
iptables -t mangle -D OUTPUT -p tcp -d "$TARGET_IP" -j MARK --set-mark 1 2>/dev/null || true
iptables -t mangle -D INPUT -p tcp -s "$TARGET_IP" -j CONNMARK --set-mark 1 2>/dev/null || true
iptables -t mangle -D OUTPUT -p tcp -m connmark --mark 1 -j MARK --set-mark 1 2>/dev/null || true
iptables -t mangle -F

# Add new rules for outbound connections and responses to incoming connections
# 1. Mark outgoing packets to target IP
iptables -t mangle -A OUTPUT -p tcp -d "$TARGET_IP" -j MARK --set-mark 1
# 2. Mark incoming connections from target IP
iptables -t mangle -A INPUT -p tcp -s "$TARGET_IP" -j CONNMARK --set-mark 1
# 3. Mark outgoing packets that are part of connections marked in step 2
iptables -t mangle -A OUTPUT -p tcp -m connmark --mark 1 -j MARK --set-mark 1
# Clean up existing routing rules and tables if they exist
ip rule del fwmark 1 table nocongestion 2>/dev/null || true
ip route del table nocongestion 2>/dev/null || true

# Create a new routing table for marked packets
grep -q "^200 nocongestion" /etc/iproute2/rt_tables || echo "200 nocongestion" >> /etc/iproute2/rt_tables

# Get default route interface and gateway
DEFAULT_ROUTE=$(ip route | grep default | head -n 1)
DEFAULT_INTERFACE=$(echo "$DEFAULT_ROUTE" | awk '{print $5}')
DEFAULT_GATEWAY=$(echo "$DEFAULT_ROUTE" | awk '{print $3}')

# Add route to the new table
echo "Setting up routing for marked packets..."
# Force remove any existing default route in the nocongestion table
ip route del default table nocongestion 2>/dev/null || true
# Add the new route
ip route add default via "$DEFAULT_GATEWAY" dev "$DEFAULT_INTERFACE" table nocongestion 2>/dev/null || echo "Route already exists, continuing..."

# Add rule to use the new table for marked packets
ip rule show | grep -q "fwmark 1 lookup nocongestion" || ip rule add fwmark 1 table nocongestion

# Set the default congestion control to none for all new connections to target IP
echo "Setting congestion control algorithm to none specifically for $TARGET_IP..."
if grep -q "none" /proc/sys/net/ipv4/tcp_available_congestion_control; then
    # Set congestion control for specific route instead of globally
    ip route change default via "$DEFAULT_GATEWAY" dev "$DEFAULT_INTERFACE" table nocongestion congctl none
    # Keep the global congestion control untouched
    echo "Congestion control set to 'none' only for connections to $TARGET_IP"
    echo "Global congestion control remains: $(cat /proc/sys/net/ipv4/tcp_congestion_control)"
else
    echo "WARNING: 'none' congestion control is not available."
    echo "Using the default congestion control algorithm."
fi
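# Create a C library to disable Nagle's algorithm
cat > /tmp/disable_nagle.c << 'EOF'
#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
#include <dlfcn.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

// Override connect function to disable Nagle's algorithm
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen) {
    int (*original_connect)(int, const struct sockaddr *, socklen_t);
    original_connect = dlsym(RTLD_NEXT, "connect");
    int result = original_connect(sockfd, addr, addrlen);

    // Preserve connect's errno: setsockopt fails on non-TCP sockets and
    // would otherwise overwrite it (e.g. EINPROGRESS on a non-blocking socket)
    int saved_errno = errno;

    // Disable Nagle's algorithm by setting TCP_NODELAY
    int flag = 1;
    setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(int));

    errno = saved_errno;
    return result;
}
EOF

# Compile the shared library (this step is assumed from the description above;
# the listing as extracted omits it)
gcc -shared -fPIC -o /usr/local/lib/disable_nagle.so /tmp/disable_nagle.c -ldl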
# Create a script to continuously enforce congestion control
cat > /usr/local/bin/enforce_no_congestion.sh << EOF
#!/bin/bash
# Continuously enforce congestion control settings for $TARGET_IP
echo "Starting congestion control enforcement service for $TARGET_IP"

# Function to check and set congestion control
enforce_cc() {
    # Verify route settings are in place
    if ! ip route show table nocongestion | grep -q "congctl none"; then
        echo "Resetting route congestion control to none"
        ip route change default via "$DEFAULT_GATEWAY" dev "$DEFAULT_INTERFACE" table nocongestion congctl none
    fi

    # Verify iptables rules are in place
    if ! iptables -t mangle -C OUTPUT -p tcp -d "$TARGET_IP" -j MARK --set-mark 1 2>/dev/null; then
        echo "Restoring outbound iptables mark rule"
        iptables -t mangle -A OUTPUT -p tcp -d "$TARGET_IP" -j MARK --set-mark 1
    fi

    # Verify incoming connection mark rule
    if ! iptables -t mangle -C INPUT -p tcp -s "$TARGET_IP" -j CONNMARK --set-mark 1 2>/dev/null; then
        echo "Restoring incoming connection mark rule"
        iptables -t mangle -A INPUT -p tcp -s "$TARGET_IP" -j CONNMARK --set-mark 1
    fi

    # Verify connection mark transfer rule
    if ! iptables -t mangle -C OUTPUT -p tcp -m connmark --mark 1 -j MARK --set-mark 1 2>/dev/null; then
        echo "Restoring connection mark transfer rule"
        iptables -t mangle -A OUTPUT -p tcp -m connmark --mark 1 -j MARK --set-mark 1
    fi

    # Find active connections to target IP and disable Nagle
    connections=\$(ss -tnp | grep "$TARGET_IP" | grep -v LISTEN)
    if [ -n "\$connections" ]; then
        echo "Found active connections to $TARGET_IP"
        echo "\$connections" | while read -r conn; do
            pid=\$(echo "\$conn" | sed -n 's/.*pid=\([0-9]*\).*/\1/p')
            if [ -n "\$pid" ]; then
                echo "Connection found with PID: \$pid"
                # Use strace to call setsockopt on the process
                if command -v strace >/dev/null 2>&1; then
                    # Install strace if not available
                    if ! command -v strace >/dev/null 2>&1; then
                        apt-get update && apt-get install -y strace
                    fi
                    # Find all TCP sockets associated with this PID
                    for fd in /proc/\$pid/fd/*; do
                        if [ -S "\$fd" ]; then
                            echo "Setting TCP_NODELAY on socket \$fd"
                        fi
                    done
                fi
            fi
        done
    fi
}

# Main loop
while true; do
    # Enforce route-specific congestion control
    enforce_cc
    # Sleep briefly
    sleep 2
done
EOF
chmod +x /usr/local/bin/enforce_no_congestion.sh
# Create a service to run our script
cat > /etc/systemd/system/enforce-no-congestion.service << EOF
[Unit]
Description=Enforce no congestion control for target IP
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/enforce_no_congestion.sh
Restart=always

[Install]
WantedBy=multi-user.target
EOF

# Enable and start the service
systemctl daemon-reload
systemctl enable enforce-no-congestion
systemctl start enforce-no-congestion

# Remove the global sysctl setting
rm -f /etc/sysctl.d/99-no-congestion.conf
# Final verification steps
echo "Verifying setup:"
echo "1. Checking if 'none' congestion control is available:"
cat /proc/sys/net/ipv4/tcp_available_congestion_control
echo "2. Checking current congestion control algorithm:"
cat /proc/sys/net/ipv4/tcp_congestion_control
echo "3. Verifying iptables rules:"
echo "   - Outbound packets to target IP:"
iptables -t mangle -L OUTPUT -v | grep "$TARGET_IP"
echo "   - Incoming packets from target IP:"
iptables -t mangle -L INPUT -v | grep "$TARGET_IP"
echo "   - Responses to incoming connections:"
iptables -t mangle -L OUTPUT -v | grep "mark match 0x1"

echo ""
echo "Configuration complete!"
echo "TCP connections to AND from $TARGET_IP will now bypass congestion control and Nagle's algorithm."
if [ -n "$DOMAIN" ]; then
    echo "Domain name $DOMAIN (resolved to $TARGET_IP) has been configured."
fi
echo ""
echo "To run a specific application with Nagle's algorithm disabled, use:"
echo "LD_PRELOAD=/usr/local/lib/disable_nagle.so your_application"
echo ""
echo "To verify connections are using 'none' congestion control:"
echo "ss -ti | grep -A 5 $TARGET_IP"
echo ""
echo "NOTE: It may take a few seconds for existing connections to switch to 'none'."
echo "New connections should use 'none' immediately."

# Store the target IP for future updates
echo "$TARGET_IP" > /var/lib/no-congestion-target
if [ -n "$DOMAIN" ]; then
    # Also store domain name if provided
    echo "$DOMAIN" > /var/lib/no-congestion-domain
fi