cryptostorm's Hostname Assignment Framework | Technical Architecture ~ revision 1.1
The purpose of this whitepaper is to provide a technical overview of cryptostorm's hostname assignment framework ("HAF"), which is used to mediate member sessions with the network. Our approach to this process is substantively divergent from that commonly found in old-style "VPN networks," and as such it requires some degree of expository effort to ensure the community is able to critique & review our architectural decisions with sufficient data & insight to provide the most effective leverage in doing so.
Note that study of this document, and/or understanding of the principles outlined herein, is not necessary for the regular use of our network by members. Rather, these details are provided as supplementary information for those interested. Network members with precise, specific use-case scenarios requiring particular node/cluster mappings may benefit from the information in this essay, but for most it will be superfluous from a functionality standpoint. Read on if you're curious and want to learn more, in other words: doing so is not required.
There are several parallel goals assumed to be part of any viable HAF in the context of our security model and member use-case scenarios. These are:
1. Resilience against denial-of-service attacks seeking to block members from connecting to the network;
2. Resilience against naive TLD-based DNS lookup attacks, again seeking to prevent members from initiating network sessions;
3. Flexibility in allowing members to choose their preferred exitnode clusters on a per-session basis;
4. Provisioning of alternative cluster-selection methodologies which enable selection stochasticity & thereby provide armouring against certain attack vectors targeting specific node hardware.
The Four Tiers of the cryptostorm HAF
Our HAF model is composed of four nested tiers of DNS hostname entries across a striped range of TLDs: instance, node, cluster, balancer. It is worth noting that these mappings are not formally congruent with FQDNs and thus do not rDNS resolve uniquely (or, at least, not uniformly) - that is not their goal (we do have an internal methodology for managing rDNS/FQDN A Record mappings across TLDs, but it is of merely administrative relevance and thus unlikely to be of interest to the community). Indeed, the core approach we take to our four-tier model hinges on the labile nature of redundant hostname:IP mappings within the global DNS system as instantiated in the wild.
(a parallel, but topologically orthogonal, mapping exists to connect physical machines to specific FQDN, rDNS-capable hostnames; in this whitepaper, as needed, we denote such mappings with the nomenclature of "machine" - that is to say that a {machine} is a specific physical server in a specific location; this, per above, is not directly relevant to the HAF itself except, per below, insofar as there is a small overlap between these two mapping domains in the naming of instances)
As is discussed at the end of this whitepaper, we make use of multiple, redundant, registrar-independent TLDs within the HAF to ensure that the overall architecture remains resilient against attack vectors targeting specific canonical domain names within this consideration space. For example, if an attacker simply DNS blocks all lookups to cstorm.pw, the aforementioned TLD redundancy will seamlessly 'fall back' to additional TLD-divergent canonical domains. This process is transparent to network members. However, because there are multiple TLDs in this rotation - and that TLD count grows over time as we add new redundancy moving forward - we designate below the form of TLD with a {canonical TLD} pseudocode placeholder. An example of a specific {canonical TLD} is cstorm.pw. In this sense, then, the designation {canonical TLD} is a superset of all currently-extant production TLDs in our rotation.
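To sketch what that fallback looks like from the member side - purely as an illustration, with {hostname} and {alternate canonical TLD} as placeholders rather than production values - a configuration's "--remote" list can simply repeat the same logical hostname across TLD-divergent canonical domains:
Code: Select all
remote {hostname}.cstorm.pw
remote {hostname}.{alternate canonical TLD}
If lookups against the first canonical domain fail or are blocked, session initiation simply proceeds to the next entry in the list; from the member's perspective the process remains transparent.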
Each of the nested tiers below is a subset of the tier which follows, in formal (axiomatic) set-theoretic terms. Each set is a closed, bounded set. The subsets comprising a given superset together encompass all items within that superset; there are no 'orphan subsets' (or, at least, there shouldn't be if we've done our job properly!).
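Put schematically - and only as a loose illustration of that nesting, not formal notation - the containment runs:
Code: Select all
instance ⊂ node ⊂ cluster ⊂ balancer (all production clusters)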
One: Instances
An instance is a specific server-side daemon (of the underlying openvpn application) running on a specific hardware-based server in a specific location, which in turn maps into requisite mongoDB shards to enable distributed authentication of network member sessions via SHA512'd token values. Instances are client-OS specific, as of December 2013; for example, an instance may be assigned a hostname mapping of the form:
Code: Select all
windows-{node}-{iteration number}.{canonical TLD}
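To make that concrete: a hypothetical Windows instance on a node named "cantus" (a node name which also appears in examples later in this document), carrying iteration number 2, would map as:
Code: Select all
windows-cantus-2.cstorm.pw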
Each individual {node} in our overall network infrastructure hosts multiple instances. These instances allow for customisation of configuration options for specific OS/ecosystem flavours, as well as increased security via "micro-clusters" of given instances on a given {node} for a given OS flavour. By keeping instances small, with respect to the number of simultaneously connected network sessions, we retain the ability to more closely monitor aberrant instance behaviour, spin down instances for maintenance (after having load-balanced off all active member sessions; see below), and in general manage network capability more effectively in the face of ever-growing network traffic and member session counts.
Although we hesitate to point this out, each instance does in fact have a uniquely-assigned public IP address. We hesitate because we do not want to suggest that members connect "directly to IPs" and thus bypass the HAF entirely. The downsides of doing so are: decreased member security, decreased session resilience, decreased administrative flexibility, and vastly increased fragility of session integrity over time. In short, IPs change - not quickly, but there is attrition & transience within the physical/public IP pool of our infrastructure. This is both inevitable and acceptable - our infrastructure is not "locked in" to any host, colo, facility, infrastructure, or organization. Hard-coding IPs breaks this model entirely, and inevitably results in member frustration - or worse. It is strongly discouraged, wherever possible, in favour of adherence to the HAF as described herein.
A demi-step upwards in the hierarchical HAF model brings us to the concept of "pooled instances." The form of a pooled instance is as follows:
Code: Select all
windows-{node}.{canonical TLD}
For example, the pooled instance for the raw OS flavour on the node "cantus" maps as:
Code: Select all
raw-cantus.cstorm.pw
A specific instance on that same node, by comparison, carries its iteration number and resolves only to that single daemon:
Code: Select all
raw-cantus-1.cstorm.pw
In contrast, pooled instances (without the numerical identifier) will always resolve to a pool of the then-active instances on a given node. As such, it is acceptable to hard-code connections to specific pooled instances, as there will always be an underlying specific instance - and likely more than one - to handle inbound connection requests. Of course, members could simply default to the first cardinal instance - "-1" - and assume there will always be a first of these... but no benefit is gained in doing so compared to simply using the pooled mappings: in the corner case where only one instance exists on a given node, the two mappings devolve to identical functionality, while when more than one instance exists on a given machine, hard-coding to the first cardinal risks having that specific instance be "down" occasionally for maintenance or other administrative needs.
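To restate that trade-off in configuration terms - a purely illustrative contrast, with either line being a syntactically valid "--remote" entry - the two choices look like this:
Code: Select all
# pooled: resolves to whichever instances are currently active on the node
remote raw-cantus.cstorm.pw
# first cardinal: fails whenever that one specific instance is down for maintenance
remote raw-cantus-1.cstorm.pw
The pooled form costs nothing relative to the cardinal form, and is the one we recommend members use if they wish to pin sessions to a particular node.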
In summary, instances represent the fundamental building block of the cryptostorm network. On a given physical machine - a "node" - multiple instances will exist, each supporting specific OS flavours. Additionally, aggregations of same-OS instances on a given node are defined as "pooled instances," and are the lowest level of recommended connectivity for network members to consider using in their own configuration deployments.
Two: Nodes
Nodes serve as the next layer of our HAF, above instances. Nodes are the logical equivalent of the machine layer in the parallel rDNS model described above. Nodes are uniquely named and are never retired; however, they do "float" across physical hardware over time. For example, a given {node} may be named "betty" - betty.{canonical TLD} - but the underlying physical hardware (and thus, of course, public IP assignments) of "betty" will likely evolve, change, and otherwise vary over time. "Betty" is a logical - not physical - construct.
(we do not name physical machines, apart from node assignments; physical machines are fungible, and fundamentally ephemeral, within our model)
Node designations are something of a "shadow layer" within the HAF; members do not "connect directly" to nodes, and they exist in a logical sense as an organizational tool within the HAF to ensure it retains internal logical consistency. A node, in that sense, is merely a collection of instances - once all instances on a given physical machine have been fully enumerated, the resulting aggregation is, definitionally, a "node." Node mappings simply take the form of:
Code: Select all
{node}.{canonical TLD}
or, in OS-specific form:
Code: Select all
{OS flavour}.{node}.{canonical TLD}
In summary, members do not connect directly to nodes. Nodes exist as an intermediate layer between instances and clusters. Nodes are composed of pooled instances, which themselves are aggregations of specific OS instances on a specific node.
Three: Clusters
It is at the organizational level of clusters that the HAF becomes directly relevant to those components visible within the cryptostorm configuration files. Clusters are the core unit of aggregation to support the most commonly-deployed network configurations, within our model.
A cluster is an aggregation of nodes in a given geographical location. When a cluster is first opened in a given geographical location, it is often the case that it is composed of only one physical machine; this allows us to test out member usage levels, ensure our colocation providers deliver reliable and competent service levels, and scale physical hardware smoothly as needed. Nevertheless, we always refer to new clusters as "clusters," rather than as their underlying nodes. Careful readers of earlier sections of this essay will now surely understand why: nodes, for us, are more of an internal administrative designation and do not have direct relevance to member session connection parameters themselves, in the public sense.
The form of nomenclature for cluster mappings, available to network members for connections, is as follows:
Code: Select all
{OS flavour}-{geographic locale of cluster}.{canonical TLD}
For example:
Code: Select all
android-paris.cstorm.pw
Cluster hostnames are robust; that is to say, they will always resolve to a live, active instance for that specific OS within that specific geographic location. Note that cluster hostnames do not specify the underlying node - this is, as we hope is clear from earlier sections of this essay, both unnecessary and a source of needlessly brittle characteristics with no concomitant increase in security or functionality for network members. Recall that, for members who want to connect to a specific node, this can be accomplished via pooled instance mappings - and does not require inclusion of the concept of cluster at all.
Naturally, there is no such thing as a cluster-mapped hostname without a specific OS flavour being defined; since December 2013's "forking" of our server-side instantiations, all instances are OS specific.
In summary, clusters are the core of our HAF and are the layer of the model most directly relevant to members seeking network sessions that terminate in a specific geographic location, for a specific OS flavour. They are robust, scale smoothly without any need for members to adjust their configuration parameters, and allow for failover/loadbalancing invisibly to members by way of standard administrative tools and practices on the part of the cryptostorm network admin team.
Four: Balancers
The final tier of the HAF - and the one most directly relevant to most network members, as it mediates the majority of network sessions - is the balancers. The cryptostorm balancers dynamically assign network sessions across geographic clusters, and are the optimal security selection for network members seeking maximal session obfuscation against the broadest class of threat vectors. Balancers deploy various forms of algorithmic logic to determine the cluster, node, and instance to which a newly-created network session will connect.
The content of the balancing algorithm itself is not hard-coded into the concept of balancers; rather, various forms of round-robin, load balancing, or formally stochastic session initiation may be implemented at the balancer layer, and new forms can be (and in fact are) added over time. Our team has been working on several additions to the balancer algorithm suite, and we look forward to rolling those out in upcoming months.
Currently, there are two balancer algorithm options: locked, and dynamic. Both are rudimentary round-robin techniques for mapping a given network session initiation request to a given geographic cluster. They vary in the method employed to provide round-robin functionality, and therefore in the "velocity" of change of mapped node selection for iterative network session re-initiation efforts.
First, we will consider "locked" balancer sessions. Locked sessions utilize the inbuilt round-robin A Record lookup functionality of the global DNS system itself. When a network session is initiated, a DNS query is generated against a table of A Records, which contains multiple possible public IP mappings. Once that lookup completes, the mapping of balancer hostname to a specific IP will remain durable as long as that lookup remains cached within the network member's local computing environment. Our default TTL settings, within the HAF, are set universally to 1337 (seconds), or just a bit over 20 minutes. However, of course, there are so many layers of caching found in most real-world DNS lookup scenarios that the functional durability of these DNS mappings on a given client machine is likely to be (in our real-world testing) closer to an hour or two.
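Schematically - with placeholder values standing in for real production addresses - a locked balancer hostname sits in front of a table of A Records along the following lines:
Code: Select all
{OS flavour}-balancer-locked.{canonical TLD}
    A    {public IP of an instance in cluster A}    TTL 1337
    A    {public IP of an instance in cluster B}    TTL 1337
    A    {public IP of an instance in cluster C}    TTL 1337
Which of those records a member's resolver hands back, and how long it remains cached locally, is what determines the cluster that member lands on for the lifetime of that cached lookup.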
So, for example, if a network member initiates a connection to the HAF address of:
Code: Select all
windows-balancer-locked.cstorm.pw
then the specific IP - and thus cluster - to which that hostname resolves will remain constant for as long as that lookup stays cached within the member's local environment; reconnections during that window will land in the same place, which is why we describe these sessions as "locked."
In contrast, "dynamic" balancer sessions are mediated by the round-robin logic built into the OpenVPN framework's "--remote-random" directive in current compiles of the underlying source. This directive causes the network session to choose, from a sequential list of alternative remote parameters, "randomly" for each and every newly-initiated network session (we say "randomly," in scare quotes, because the selection is not formally random but can be better thought of as quasi-stochastic). Thus, if a network session drops or is cancelled, the newly-instantiated session will go through a new "random" lookup and will, with reasonably high probability (greater than 70% currently, and rising), connect to an entirely different cluster. This is why we call this "dynamic" sessioning: each session, when instantiated by a member, is likely to result in a different cluster being selected "randomly" from all in-production clusters. This will, on average, result in a higher velocity of change of session cluster mappings than the locked balancer will.
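As a minimal, purely illustrative sketch of the dynamic approach - the additional cluster locales below are placeholders, not an exhaustive production list - a member-side configuration might read:
Code: Select all
# choose from the listed clusters in a randomised order on each new session
remote windows-paris.cstorm.pw
remote windows-{second cluster locale}.cstorm.pw
remote windows-{third cluster locale}.cstorm.pw
remote-random
Each re-initiated session works through that list in a freshly shuffled order, which is what produces the higher velocity of cluster change described above.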
Future balancer logics will take these baseline quasi-stochastic methods and extend them into more formally "random" (pseudo-random) frameworks, as well as into "best performance" and "closest pingtime" approaches to session instantiation. We have already mapped out these three additional balancer algorithms internally, although they are not yet ready for in-production testing. We are, however, quite optimistic about the open-ended nature of the balancer logic itself: in the future, we expect that member- and community-created balancer algorithms will be added to the network, with logics outside of our currently-assumed consideration set of possible options. Creativity, in that sense, is the only constraint on the HAF balancer framework itself.
In summary, the balancer layer of the HAF is most relevant to the majority of network members and sessions, and embodies an extensible, open-ended ability to add new logics & new algorithms in the future. Currently, we support locked & dynamic balancer methodologies, across all production clusters.
Summary: The Evolution of HAF
We do not hesitate to acknowledge that the cryptostorm Hostname Assignment Framework itself is a work in progress. It has evolved, and in some senses exhibited emergent properties in its real-world application, as network members & network activity overall have continued to increase in a steady progression. If we were to claim that, prior to the network's full deployment in 2013, we had planned all this out in advance, it would simply not be true.
That said, the direction in which we have guided the HAF is towards a flexible, extensible model that minimises the need for members to fuss with the HAF, understand the workings of the HAF, or be inconvenienced in any way as the HAF itself continues to develop and mature. In a purely metaphorical sense, we have sought to channel some of the ideas of object-oriented systems design as guiding principles for the HAF: decoupling subsumed layers & the details thereof from higher-order "objects"/layers, in a nested hierarchy or - equally validly in ontological terms - a holarchy.
Our future systems architecture roadmap envisions the HAF as a core element of a mesh-based network topology that both fully embraces stochastic routing methodologies and leverages per-stream/per-protocol independence in routing-path selection within the network. In other words, rather than having a defined "exitnode"/instance for all packets sent & received by each network session, members will have the flexibility to route selected packet streams - say, a video available only with a US-geolocated IP address - via one route, whereas other packet streams for the same session can be directed to other exitnodes/clusters/instances as preferred. Or, for maximal security, streams and packets can be stochastically routed through the overall topological mesh of cryptostorm's entire network, egressing in many geographic locations and via many public IPs. There is no need for one network session to send & receive all packets through one exitnode, in other words; the HAF is our foundation for enabling that future functionality for the network overall.
For now, the HAF serves the purpose of providing robust connectivity, high security, fine-grained administrative capabilities, and minimal hassle to network members in order to achieve maximal security & capability whilst on-network. We should note that, with the rapid growth of our network through the winter months, our tech admin team is still catching up on some of the new hostname mappings within the framework defined above. Too, we're in the process of migrating our entire DNS resolution/lookup capability, network-wide, to a more robust & scalable backend infrastructure. Together, these two steps are not something we've chosen to rush - they must, essentially, be done properly & tested fully prior to cut-over, and our priority is to ensure that process is fully transparent to network members. Planning and implementing towards this goal takes time, and a bit of patience. So, if network members notice that some hostname assignments which appear to be implied by the HAF don't currently resolve... we know. We're working on it.

Finally, a note on the selection of TLDs for use in the HAF. As is well-known within security research circles, some TLDs - we're talking about you, .com - are controlled entirely by specific governmental bodies that have very selfish reasons to exert undue influence over those registries.
We consider any domains located in such TLDs to be at best ephemeral, and at worst subject to arbitrary takeover by hostile governmental entities with little or no due process or notice. Of course, we generally avoid such TLDs as they serve little purpose in our security model. However, rather than searching for the "perfect TLD" that is entirely free of efforts to censor and subvert free speech (which is pointless, in any case, as specific nations can poison TLD lookups within their own localised DNS sub-frameworks, at will), we choose instead to stripe our needs across a broad range of TLDs, continuously adding new ones and, over time, pruning those which prove less than useful.
These are the TLDs that are seen by those who take a look inside our "--remote" directives: they are there to ensure systems continuity & network resilience in the face of denial-of-service or outright censorship-based attacks on cryptostorm. They are not actually part of any balancer capability, nor do they serve a purpose beyond simple fail-over protection against loss of any one domain within a given TLD. These TLDs are, in a sense, disposable - which is not to say they are not security relevant, not at all! An attacker subverting a specific TLD's DNS lookups can redirect new cryptostorm network sessions towards hostile server resources, and we have systems in place to notify us of any such hostile actions at the TLD level (further, of course, our PKI and cryptographic framework protects directly against efforts to "false flag" exitnodes via public key-based certificate validation of server-side resources).
It is our hope that this short essay introducing the cryptostorm HAF has been useful for network & wider community members alike. More than that, we are eager to receive feedback and suggestions as to how it can be improved, expanded, or otherwise modified to best meet the needs of our members worldwide. It is not perfect - far from it - and we expect the HAF to continue to evolve as time goes by.
Finally, we will continue to bring our versioned configuration files into line with the full logical implications of the HAF framework, a process we hope to complete in upcoming revisions of the configs. That will not result in breakage of backwards-compatibility for older versions, but it will mean that the full suite of capabilities implied by the current (1.1 rev) version of the HAF is only fully available to those using fully-current config files (or widget installs) for their network session management. That will be indicated, once again, during 'pushes' of new config files, and we trust that network members will help their peers within the community to continue to upgrade their connection profiles so as to gain maximal functionality & security hardening from our ongoing improvements in the HAF.
Our thanks, as a team, for your support & assistance. We look forward to many more enhancements to the HAF - and the network itself - as time continues to pass us by.
Respectfully,
~ cryptostorm_team