hardware replacement (hard drive) in process: turing.cryptostorm.net

Post by df » Tue Mar 24, 2015 8:15 pm

Just a heads up to everyone: turing is having physical hard drive issues at the moment:

ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 frozen
ata1.01: failed command READ DMA EXT
ata1.01: cmd 25/00:20:40:11:94/00:00:30:00:00/f0 tag 0 dma 16384 in
res 40/00:01:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
ata1.01: status { DRDY }
blk_update_request: I/O error, dev sda, sector 815010112

We're waiting for oneprovider.com (turing's ISP) to replace the hard drive.
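
For anyone wanting to watch for the same symptom on their own box, here is a minimal sketch that tallies block-layer I/O errors per device from kernel log text. The function name count_io_errors is made up for illustration, and the match assumes log lines shaped like the blk_update_request line quoted above; other kernels may word the message differently.

```shell
#!/bin/sh
# Sketch only: count "I/O error, dev <name>" lines per device.
# count_io_errors is a hypothetical helper, not part of any tool.
# Reads kernel log text on stdin, prints "<device> <count>" per device.
count_io_errors() {
    awk '/I\/O error, dev / {
        for (i = 1; i <= NF; i++)
            if ($i == "dev") {
                dev = $(i + 1)
                sub(/,$/, "", dev)   # strip the trailing comma after the device name
                n[dev]++
            }
    }
    END { for (d in n) print d, n[d] }'
}
```

Typical use would be something like `dmesg | count_io_errors`; a steadily growing count for one device points at the disk rather than the OS.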


Turing physical forensics

Post by cryptostorm_admin » Tue Mar 24, 2015 11:05 pm

We've been seeing transient instability in Turing since late last week: process drops and heavy congestion at softIRQ.

Our first concern is to ensure there's not an OS compromise despite our grsec-hardened node kernels. To support that, all nodes redundantly archive encrypted copies of kernel error logs to another node, so an attacker cannot simply edit local logs as root after a successful exploit.

Although nothing we saw in those error logs suggested an exploit, we decided to do a from-the-metal OS reinstall on Sunday morning, to be sure we'd wiped anything from the BIOS up. That completed, and the box ran well until it didn't.

Odd memory management issues were not showing patterns that suggested OS-level fixes were possible, like these:

Further analysis by df (above) confirms hardware issues. We're currently working with the DC to either replace the HD in question, or the box entire.

Of course, this emphasises the need for redundant nodes in clusters (to be a cluster, really). When a hardware issue appeared in our US-West cluster recently, balancing with other nodes retained member access with little if any performance impact and no loss of security levels.

Ensuring redundant nodes in all clusters now moves high up our priority queue.

Thank you,

cryptostorm admin

Re: hardware replacement (hard drive) in process: turing.cryptostorm.net

Post by df » Fri Mar 27, 2015 8:57 am

After an epic battle getting this thing back online, it finally is.
This was one of those situations where everything that could go wrong, did.
It started with the initial physical hard drive problems; then, once the drive was replaced, the network cable somehow got unplugged.
After that got plugged back in, somewhere along the line the cable that provides video to our KVM came loose.
When the server was finally back online, there were all kinds of problems compiling our custom kernel (something to do with the kernel version the box came with; a few missing modules were to blame).

All is well now. I just did a test Windows connect with the widget and another test connect on my Ubuntu laptop; both seem functional (and .onion/.i2p access works on them). Accessing the .bit TLD provided by DNSChain won't be available for another hour or two, though. Starting that server requires running the namecoind server, which has to catch up on the blockchain before it'll work (it's gotta reach the block count listed in http://explorer.dot-bit.org/stats/block_count.txt ). As of this post, the status on that can be seen with this tiny one-liner I hacked up:

[namecoin@turing ~]$ t=`wget -qO- http://explorer.dot-bit.org/stats/block ... `namecoind getinfo|grep blocks|awk '{print $NF}'|awk -F, '{print $1}'`;echo "$y blocks so far, $(expr $t - $y) to go til you reach $t"
99637 blocks so far, 124287 to go til you reach 223924
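
The same check can be spelled out a bit more readably. This is just a sketch, not the exact command df ran: the function names parse_blocks and report_progress are made up, the URL is the one quoted earlier in the post, and the parsing assumes `namecoind getinfo` prints a line of the form `"blocks" : N,` (which is what the grep/awk pipeline above implies).

```shell
#!/bin/sh
# Sketch only: how far is the local namecoind from the network block count?
# Assumes wget and namecoind are on PATH. parse_blocks and report_progress
# are hypothetical helper names.

STATS_URL="http://explorer.dot-bit.org/stats/block_count.txt"

# Pull the numeric "blocks" value out of getinfo-style output on stdin.
parse_blocks() {
    awk -F: '/"blocks"/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Print progress in the same wording as the one-liner.
report_progress() {
    target=$1
    current=$2
    echo "$current blocks so far, $((target - current)) to go til you reach $target"
}

# Only touch the network and the daemon when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    target=$(wget -qO- "$STATS_URL")
    current=$(namecoind getinfo | parse_blocks)
    report_progress "$target" "$current"
fi
```

Saved as a script and invoked with the argument `run`, it fetches both numbers and prints the summary line; keeping the parsing and arithmetic in separate functions makes each piece easy to check by hand.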

Also, we went ahead and converted all the daemons/servers running on Turing from the old/default services management through init scripts to the much better daemontools programs ( http://cr.yp.to/daemontools.html ). Eventually we'll convert all the other nodes from init to daemontools.
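
For readers who haven't used daemontools: each service is a directory containing an executable `run` script, which supervise starts and restarts as needed. Below is a minimal sketch of generating one. The helper name make_service, the service directory, and the account name are all hypothetical, and this is not necessarily how the services on Turing are laid out; the only daemontools program it relies on is setuidgid.

```shell
#!/bin/sh
# Sketch only: write a daemontools-style run script.
# make_service is a hypothetical helper.
# Usage: make_service DIR ACCOUNT COMMAND [ARGS...]
make_service() {
    dir=$1
    user=$2
    shift 2
    mkdir -p "$dir"
    {
        echo '#!/bin/sh'
        echo 'exec 2>&1'                  # send stderr to stdout so a logger sees both
        echo "exec setuidgid $user $*"    # drop privileges, then run in the foreground
    } > "$dir/run"
    chmod +x "$dir/run"
}
```

Something like `make_service /service/namecoind namecoin namecoind` would then give svscan a directory to pick up, assuming the daemon stays in the foreground when started that way.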