by cryptostorm_ops » Tue Feb 04, 2014 8:19 am
Another thing I frequently see is technically unsophisticated VPN review articles that report test results for networks. Actually, I don't think I've ever seen a VPN review test done that is at all useful in tracking actual network performance.
What I have seen is some companies that pick settings for their servers that throw "good" test results with speedtest.net, but suck for actual network use. That's actually very common, and is really unfortunate because then people try to use these "fast" networks and find their actual performance is crap.
Without getting into too much boring detail, the kinds of data that flow across cryptostorm's network backbone are enormously variable. There's everything from low-bandwidth TCP sessions to massive-volume, state-free, UDP-based "sessions" involving thousands of simultaneous peer connections in a filesharing application. Plus, all of these come into cryptostorm from a wide range of local network configurations: some are already NATted through a residential router that is barely able to handle packet transit, whereas others are coming out of really well-administered academic or corporate network environments that might as well be their own bloody standalone ASes! And some members see sessions actively packet-shaped/throttled by ISPs who apparently feel it's ok to pinch down encrypted network traffic if they feel like it.
The net result is that a single "speed test" application cannot really give an accurate picture of network performance. In fact, most "speed test" apps are fairly simple: TCP-based tests that self-throttle as they see packets start to back up into source-device queues (otherwise they'd crash their own outbound machines if they kept shoveling packets into queue when their routers and/or NICs notice the queues are filling up for a given session). In a generic sense, that's fine - but when sessions come through cryptostorm, they are natively stripped down to the packet level, NATted through the kernel, and pushed off the physical NICs of our cluster servers as one big clump of packets (and the reverse, for inbound-from-member data - think of the topological model of the TUN interface mediating between the cryptostorm network daemon and the physical NIC's 'window to the world' layering). There's a series of packet (and socket assignment) queues that happen in this process - and a speed test app that doesn't know about that can throw very unreliable/inaccurate results through no fault of its own. It's just not designed to measure this kind of network topology.
Our in-house metrics for cluster performance are driven almost entirely by close attention to socket allocation overhead at the kernel level, as well as packet queues at the NIC. If sockets are assigning smoothly, and the NICs are able to onload/offload packets without having their buffers overflow into kernel space (and the attendant ring buffers involved), then our perf-tuning is successful. This is because, obviously, there's a huge amount of stochasticity in actual network traffic coming through an individual node/server: maybe there are 100 members connected, but not many using a lot of bandwidth... or perhaps there are only 20 connected, but several are sitting on 100 megabit local pipes & are pushing big files through the network. Or: some of our nodes are heavily provisioned, some much less so and rather are clustered together to loadbalance amongst themselves. So one machine might carry 100 sessions of high-traffic members just fine, whereas another would choke long before that.
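For anyone who wants to poke at the NIC side of this on their own Linux box, here is a minimal sketch. It is not cryptostorm's actual tooling (the post doesn't describe that); it only assumes the standard /proc/net/dev counter layout, and the function names and the 5-second sampling interval are made up for illustration. It samples the per-interface drop and FIFO-overrun counters twice and reports how much they grew.

```python
import time
from pathlib import Path

def parse_net_dev(text):
    """Parse the text of /proc/net/dev into {iface: {counter: value}}.

    Only the counters relevant to queue pressure are kept:
    packets, drops and FIFO overruns in each direction.
    """
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        if len(fields) < 16:  # 8 receive + 8 transmit columns expected
            continue
        stats[name.strip()] = {
            "rx_packets": fields[1],
            "rx_drop": fields[3],
            "rx_fifo": fields[4],
            "tx_packets": fields[9],
            "tx_drop": fields[11],
            "tx_fifo": fields[12],
        }
    return stats

def counter_deltas(before, after):
    """Return how much each counter grew between two parsed samples."""
    deltas = {}
    for iface, now in after.items():
        prev = before.get(iface)
        if prev is None:  # interface appeared between samples
            continue
        deltas[iface] = {key: now[key] - prev[key] for key in now}
    return deltas

if __name__ == "__main__":
    source = Path("/proc/net/dev")  # Linux only
    first = parse_net_dev(source.read_text())
    time.sleep(5)  # hypothetical sampling interval
    second = parse_net_dev(source.read_text())
    for iface, delta in counter_deltas(first, second).items():
        print(iface, delta)
```

Drop or FIFO counters that keep climbing while packet counts are modest are a hint that queues are overflowing somewhere, which is the kind of thing a browser speed test never sees.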
In the end, the best "test data" come from members who tell us whether the network is performing on par with "bareback" non-cryptostorm sessions. That's the real metric. You hear a lot of blabber about how "encryption slows down VPNs," and it's essentially all bullshit. I've yet to see kernel metrics on a VPN network that showed raw CPU bottlenecking as a result of simple application of symmetric crypto. Long before that, other areas of kernel performance are the cause of transit problems. These areas require much more experience and knowledge to diagnose and fine-tune, and so you see amateurs blame "crypto" when their servers are slow. That's nonsense. The OpenSSL libraries themselves, for all their flaws, are fast post-compile and work nicely with all the major chipset architectures at the binary level. They aren't where things slow down.
Perhaps the worst thing we see is some networks that, because their admins don't have any idea how to properly run machines, simply allow any one network session to effectively monopolise an entire server (or more often, VPS instance) during a "speed test." This is how you see really bad VPN networks show "good" speed test results. That one TCP session for a speedtest.net result has grabbed all the kernel's resources for packet transit, and is all but locking down the NIC as it pushes packets through. Sure, it shows 10 megabits/second download or whatever... but every other person logged into that machine just saw their sessions slow to a crawl or start dropping TCP packets entirely. Of course, the "test" doesn't report that - since those other people have no idea they've been crowded out by that speedtest.net session. And the review goes up on some clickbait blog somewhere: fast!
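To put a number on that crowding-out effect, here is a small illustrative sketch using Jain's fairness index, a standard textbook measure of how evenly a resource is shared. This is an assumption for illustration only: the post does not say cryptostorm uses this index, and the per-session rates below are invented, not measured.

```python
def jain_index(rates):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when every session gets the same rate and approaches
    1/n when a single session takes nearly everything.
    """
    if not rates:
        raise ValueError("need at least one session rate")
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 1.0  # everyone is equally idle
    return (total * total) / (len(rates) * squares)

if __name__ == "__main__":
    # Hypothetical per-session throughput in megabits/second.
    monopolised = [10.0, 0.1, 0.1, 0.1]  # one speed test hogging the box
    shared = [2.5, 2.5, 2.5, 2.5]        # same box, evenly shared
    print("monopolised:", round(jain_index(monopolised), 3))
    print("shared:     ", round(jain_index(shared), 3))
```

The "monopolised" box would post the better speedtest.net screenshot (10 megabits/second versus 2.5), yet it scores far lower on fairness. A single-session speed number can't tell those two machines apart in the way that matters to everyone else logged in.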
But if you're actually running a network to benefit all the network members, and not just to trick uneducated clickbait "reviewers" into saying your network is fast, then doing this is a terrible plan. You want everyone on the network to have consistently good network performance, whether they're pushing a big file across via TCP or whether they're gathering bits and pieces of obscure .torrents from a few hundred global swarm peers on ephemeral UDP connects. And you want the people streaming video to get reliable stream performance, plus those using videochat or other realtime apps to have non-glitchy sessions. That's a big-picture challenge that is NOT reflected by clicking an icon on speedtest.net and posting a screenshot of the result.
We know we have a performance problem when we get messages via our support folks that "the network seems slow" from a chunk of actual network members. We go into high-gear when that happens, as it's always "real" as compared to nonexistent "performance issues" that come when one person somewhere clicks on a speed test and worries that the numbers don't look high enough. Sometimes, that's a sign of a problem - but usually not. We're monitoring our machines closely enough, 24/7, that a simple problem like that will already have thrown red flags that are going to hit my monitor long before that. It's still good to get those reports, of course, but usually they're transient: an ISP bottleneck, a router that isn't handling port 443 UDP packets well, that sort of thing. But if a dozen members say that the Frankfurt cluster is slow, then I guarantee you there's a problem there - it's a question of hunting it down.
Perf-tuning cryptostorm is a really fascinating technical challenge: unlike most areas of network administration, it's mostly new questions that we ask and we can't really just go to Stack Exchange and see what other smart people are already doing. So we do a lot of experimental parameter tweaking at the kernel level, within exitnode clusters, realtime. This kind of work is evolving into a full-time job, to be honest, as it's clear that there are still big gains to be made on overall, real-use-scenario performance at the network level. Probably a few dissertation topics lurking in there, too, as time goes by.
Anyway, when people want to see if cryptostorm meets their needs for performance we ask our support folks to provide them with testing tokens, and let actual network use be the standard. At that level, it is very very rare that someone tests the network and feels that it's slow in actual use (not just a speedtest.net result) - and if we do hear that, we listen closely as it's a chance to learn something important.
I am happy to share as much as people are interested in reading, when it comes to the specifics of how we perf-tune the clusters. I don't worry that some competitor will "steal" what we report, because doing this is a lot of work and there's not some bash script that will just magically make it happen if someone wants to click on it. It requires careful attention to many layers of systems architecture - and anyone able to do that effectively is welcome to borrow what we've learned, at cryptostorm, in their own work. Hopefully, they'll share back their own results and experience - but even if they don't, we're not keeping stuff secret.
But sometimes I can see people's eyes glaze over when I talk about this stuff - to me it is fascinating, but not for everyone.
Finally, for people connecting to cryptostorm from Linux machines, I'm happy to provide some advice on tuning param options locally to ensure good throughput. The current kernel builds are pretty good about most stuff, but some of the packet buffering defaults are... mystifying to me, really. Those guys are super smart, so I am sure they have their reasons, but from my perspective I'd never stay with default kernel settings on one of my own local machines, in terms of network optimization. I assume this might be true for Macs as well, since they're just a BSD-derived Unix hiding under a layer of high-margin walled-garden obfuscation... but I don't know firsthand as it's not my world.
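For Linux users curious what their packet buffering defaults currently are, here is a read-only sketch. The post does not say which parameters are worth changing or to what values, so the list below is an assumption: these are real, commonly discussed buffer-related sysctls, not a recommendation from cryptostorm. The script only reads values and changes nothing.

```python
from pathlib import Path

# Assumed list of buffer-related sysctls to inspect; not taken from the post.
PARAMS = [
    "net.core.rmem_default",
    "net.core.rmem_max",
    "net.core.wmem_default",
    "net.core.wmem_max",
    "net.core.netdev_max_backlog",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
]

def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))

def read_sysctl(name, root="/proc/sys"):
    """Return the parameter's values as a list of strings, or None if unreadable."""
    try:
        return sysctl_path(name, root).read_text().split()
    except OSError:
        return None

if __name__ == "__main__":
    for param in PARAMS:
        values = read_sysctl(param)
        shown = " ".join(values) if values is not None else "(not available)"
        print(f"{param} = {shown}")
```

Note that tcp_rmem and tcp_wmem each hold three numbers (minimum, default and maximum, in bytes), while the others are single values. Any changes are best made one at a time and judged by actual use, not by a speed test.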
~ c_o