
Tuesday, 23 July 2024

Optimizing OpenVPN Throughput

 

In the previous post, I talked about OpenVPN TCP and UDP tunnels and why you should not be using TCP. In this post, I’m going to talk about optimizing those tunnels to get the most out of them.
Believe it or not, the default OpenVPN configuration is likely not optimized for your link. It probably works, but its throughput can often be improved if you take the time to tune it.
A tunnel has two ends! Optimizing one end does not necessarily optimize the other. To properly optimize the link, both ends of the tunnel should be under your control. That means when you are running OpenVPN in server mode, serving clients you do not control, the best you can do is optimize your own end of the tunnel and pick sensible defaults that suit most clients.
Below are some techniques that can be used to optimize your OpenVPN tunnels.

Compression

In today’s world, where most connections are either encrypted or pre-compressed (and commonly both), you should think twice before enabling compression on top of your VPN tunnel.
While it can still be an effective way to shrink unencrypted, compressible traffic (e.g., unencrypted SQL connections or FTP transfers of text files), such traffic must be frequent enough to justify the use of compression.
Security complications can also arise when encryption and compression are used together.
OpenVPN compresses each packet individually. Furthermore, if you load the compression module on one side, it must be loaded on the other side as well.
Loading the compression module is not the same as enabling it. Enabling it can be done after the tunnel’s initialization and can even be pushed to the clients. The OpenVPN manual provides more info on the subject.
If you decide not to use compression at all, prevent loading it completely by not including the compress (comp-lzo for older versions) line. Depending on the algorithm used, merely loading the module can reduce your effective tun-mtu by 1.
Traditionally, OpenVPN used lzo as its compression algorithm, but as of OpenVPN v2.4, lz4-v2 (itself the successor of lz4) has replaced it.
Below, I will briefly cover each compression method:

lzo

While still supported in OpenVPN v2.4 (as a deprecated feature), the comp-lzo option will be removed in a future release. Backward compatibility with older OpenVPN clients is provided through the compress option. More info is available HERE.
lzo provides a slightly better compression ratio than the lz4 compression available in OpenVPN v2.4 and above. It is, however, considerably slower and uses more CPU, so you probably shouldn’t be using it except for backward-compatibility reasons.
In the worst case, lzo adds an extra byte of overhead to incompressible packets.
It is generally best to let OpenVPN decide whether lzo compression is worth enabling. We do this by setting the compress (or comp-lzo in older versions) option without any arguments.
This causes OpenVPN to periodically check the effectiveness of lzo compression and disable it if it does more harm than good (i.e., most packets on the sending side end up with an extra byte of overhead).
You can check the effectiveness of lzo compression yourself by sending the SIGUSR2 signal to the OpenVPN process (if it’s running as a daemon, the output goes to the syslog file). With such statistics you can decide whether compression is useful for your link or not.
It is also worth noting that the lzo algorithm is asymmetric: compressing a block requires noticeably more resources than decompressing it.
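As a rough illustration of the above (the process lookup and log path are assumptions that vary by system; pre-2.4 syntax is shown, newer setups would use the compress directive as described above):
# in the OpenVPN config; with no argument the mode defaults to adaptive
comp-lzo
# at runtime, ask the daemon to dump its statistics, then inspect the compression counters
kill -USR2 $(pidof openvpn)
grep -i "compress" /var/log/syslog | tail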

lz4

lz4 offers possibly faster compression, far faster decompression, and lower CPU usage compared to lzo. Its slightly lower compression ratio is negligible compared to these benefits.

lz4-v2

lz4-v2 is an optimized version of lz4 designed specifically for OpenVPN.

lz4 vs lz4-v2

The main difference between lz4 and lz4-v2 is a slight change in the algorithm, resulting in no overhead at all for an incompressible packet.
This means that the so-called adaptive compression (which was needed for lzo and even lz4) is no longer required with lz4-v2. In the worst case, some CPU time is wasted on an incompressible packet without adding any overhead to the packet size (which means packet alignment is preserved). Kudos to the OpenVPN team for this.1
Just like with lzo, there isn’t much use for lz4 in place of lz4-v2 except for compatibility with older clients.
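A minimal sketch of what that looks like in practice (these are excerpts, not complete configs; per the loading rule from the compression section, the module has to be present on the client side too):
# server.conf, OpenVPN 2.4+
compress lz4-v2
push "compress lz4-v2"
# client.conf
compress lz4-v2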

Cipher algorithm and size

Different ciphers have different speeds on different hardware (e.g., an AES-NI capable CPU). This is a hard topic to cover, as it is up to you to decide whether you want to sacrifice stronger encryption for a faster tunnel, or use a smaller key size to reduce the CPU load. There are countless articles about OpenSSL ciphers, their speed and their strength. Do a Google search and get yourself familiarized with the subject. As a side note, to compare cipher speeds on your platform, take a look at the openssl speed -h command.
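For example, to get a rough feel for relative cipher throughput on your own hardware (the cipher names below are just common choices, not recommendations):
openssl speed -evp aes-128-cbc
openssl speed -evp aes-256-cbc
openssl speed -evp aes-256-gcm
The matching OpenVPN directive would then be something like cipher AES-256-GCM.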

sndbuf and rcvbuf

These options control the send and receive buffer sizes of the tunnel’s socket. There have been reports of speed improvements in some circumstances when both values are set to 0, which leaves the buffer sizes up to the operating system 2.
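A minimal sketch of what that looks like (pushing the values is optional and only makes sense on the server side):
# let the operating system manage the socket buffers on this end
sndbuf 0
rcvbuf 0
# optionally push the same setting to clients
push "sndbuf 0"
push "rcvbuf 0"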

fast-io

This little flag, supported on non-Windows systems, reduces CPU usage by using non-blocking write operations on the socket. It only applies to UDP tunnels.
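In a config it is just one extra line next to your UDP transport setting (sketch only):
proto udp
fast-io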

float

The --float option adds an additional 3 bytes of overhead to the client’s packets. These 3 bytes contain the so-called peer-id, which is pushed to clients by the server at connection time. So if you don’t need the float functionality, don’t use it.
The problem, however, is that while according to the OpenVPN devs “float has no effect on multipoint-servers, and never had”, the server still pushes the peer-id to the clients in TLS mode (even when --float is not specified).
You can check if this is the case for you by adding the --verb 4 option to the server and then connect to it:
PUSH: Received control message: 'PUSH_REQUEST'
SENT CONTROL [client]: 'PUSH_REPLY,ifconfig 10.1.0.2 255.255.255.0,peer-id 0,cipher AES-256-GCM' (status=1)
OpenVPN devs had this to say about the behavior:
we have way too much conditional code, so we consciously decided “this feature is always-on”
it’s good for performance as well, as the extra 3 bytes make the rest of the packet properly 32bit-aligned = better crypto performance
(effectively: less CPU load, longer battery life)
It is up to you to decide on this, but it is possible to remove the added 3 bytes of overhead by making the client ignore the peer-id push at connection time:
--pull-filter ignore "peer-id"
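In context, a client config excerpt could look like this sketch (the hostname is a placeholder); if you actually rely on --float, you probably want to keep the peer-id:
client
remote vpn.example.com 1194 udp
pull-filter ignore "peer-id"   # discard the pushed peer-id, saving 3 bytes per data packet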

MTU adjustments

OpenVPN UDP packets should not be fragmented, so you need to ensure you’re not sending packets larger than your link’s MTU. In some instances, you may need to manually find the MTU of your link first.
TCP tunnels usually don’t require such adjustments.

link-mtu

The maximum size of the final UDP packet after encapsulation, minus the IP/UDP headers. So for example, if your link MTU is 1500, the correct value for link-mtu would be 1472 (1500 - 20 (IP header) - 8 (UDP header)). The OpenVPN manual says it’s best not to set this value directly. However, in my experience this is in fact the best way to adjust your tun/tap link MTU properly; the tun-mtu value (which we will discuss later) will be derived from it. The default value of link-mtu, however, is derived from tun-mtu and is bigger than 1500.
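Following the arithmetic above, a sketch for a plain 1500-byte Ethernet link would be:
# physical link MTU 1500: 1500 - 20 (IP header) - 8 (UDP header) = 1472
link-mtu 1472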

tun-mtu

The actual MTU of OpenVPN tun/tap device. This defaults to 1500.
You can only specify either link-mtu or tun-mtu and not both. The other one will be calculated internally by OpenVPN. One other thing to note is that link-mtu applies to final packets (after encryption and encapsulation) while tun-mtu applies to the unencrypted packets which are about to enter the tun/tap device.
A tun-mtu of 1500 is ideal if your physical link MTU can handle it, as it provides maximum compatibility with routers along the way. However, this is not always the case. OpenVPN is supposed to be able to discover this and act accordingly, but the whole thing collapses if you have a broken PMTUD.
In such a case, manual MTU adjustment is required. In another post I will talk about ways to find the correct MTU of a path, but assuming you already know the correct value, you subtract 28 bytes from it, set that as the link-mtu value, and let OpenVPN calculate the right tun-mtu for you. Again, remember that the calculated tun-mtu applies to packets before compression/encapsulation, and its size depends heavily on other factors like the cipher algorithm, key size, compression module, etc.
On very fast links, setting tun-mtu to a high value could potentially help 3.
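As an illustration only (the value is something to experiment with, not a recommendation, and it assumes every hop on the path supports frames that large):
# jumbo-frame experiment for a fast link
tun-mtu 9000
mssfix 0        # see the mssfix section below; 0 disables it
# leave "fragment" unset so no extra per-packet overhead is added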

fragment

This option should generally be avoided when possible. It adds 4 bytes of overhead to each packet, but it is there as a last resort when no other option works. With this option, OpenVPN internally fragments packets into chunks no bigger than the set value and sends them over the link. The other end receives and reassembles them to recreate the original packet. This is the only instance I know of where a single packet can result in more than one OpenVPN UDP packet being sent over the link.
You might also be interested in my Understanding Network IP Fragmentation post.
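If you really do need it, a sketch looks like this (1300 is an arbitrary illustrative value; fragment is commonly paired with mssfix so that TCP flows inside the tunnel rarely trigger internal fragmentation at all):
# last resort for a path whose MTU problems cannot be fixed any other way
fragment 1300   # never send a UDP datagram larger than roughly 1300 bytes; split and reassemble internally
mssfix 1300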

mssfix

This option only applies to TCP connections inside the tunnel. The Maximum Segment Size is yet another feature of TCP: it is negotiated between peers during the TCP handshake via the SYN packets, and it is the maximum payload size each TCP segment can carry. It does not take the IP and TCP header sizes into account. This option can be used on a link with broken PMTUD to at least make TCP connections possible.
Even though MSS itself is a TCP feature, this OpenVPN option targets the encapsulated UDP packets. Meaning, it rewrites the MSS value of the TCP connections inside the tunnel such that after encryption/encapsulation, the resulting UDP packet size (minus the IP/UDP headers) does not exceed the mssfix value.
So on an optimized link, mssfix is either disabled (set to 0) or its value is the same as link-mtu’s.
As a side note, mssfix applies to both sending AND receiving SYN packets so it is not an ideal solution for asymmetric links… but that’s for another post.
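Putting that rule of thumb into a sketch (values are illustrative and reuse the 1500-byte link example from the link-mtu section):
link-mtu 1472
mssfix 1472     # clamp TCP inside the tunnel so the encapsulated datagrams stay within the link MTU
# or, once you are confident nothing ever exceeds the link MTU:
# mssfix 0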

Note on TCP tunnels

Aside from the usual TCP tuning and the socket-flags TCP_NODELAY option, probably the best optimization is to get rid of TCP tunneling as a whole and use UDP instead.
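If you are stuck with TCP, the one cheap win is disabling Nagle’s algorithm on the tunnel socket. A sketch for the server side (the push line assumes your OpenVPN version allows pushing socket-flags):
proto tcp-server
socket-flags TCP_NODELAY
push "socket-flags TCP_NODELAY"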

1. From https://archive.is/AtUZK#selection-451.0-1107.2 ↩︎
2. From https://archive.is/nX7Zd ↩︎
3. More info could be found here: https://archive.is/UrvaU#selection-693.114-763.2 ↩︎
