Saturday, January 14, 2012

I'm so tired... of disks

I'm pretty sure when Lennon and McCartney wrote "I'm So Tired" they were talking about disks, those things we store data on in computers. I agree with them: I am so tired of disks.

  • My first issue with them is that I don't have any disks larger than a terabyte, so a lot of my personal and work-related archives are spread all over. I don't want to buy disks; that's what normal people do. I wait for second-hand disks when people get rid of their computers and then make use of them. I have 3 personal USB drives and one machine with two drives in it of about 100GB each. Between managing the files across all of them and rsyncing, it's getting too hard to remember what is where. 
  • My second issue is that my work machine's disk is only around 300GB. For work, there are a lot of large files: data feeds, databases, virtual machines, huge source code repositories, etc. I can't keep it all on one drive, so I have to use many external drives, usually around 500GB each, to move data on and off. 
  • My third issue is dealing with disks inside virtual machines. Not only am I limited on disk space on my local machine, I also need to manage disk space on the virtual machine. I have to clean off data and resize partitions (which increases the size of the image on the host machine). 
  • My fourth issue is waiting on file transfers. I don't even know what a USB transfer rate is, but it sucks when you are moving 70GB of data. Oh, and it's awesome transferring from USB drive to USB drive. I hate network transfers too; waiting to upload something to S3 or even via torrent is slow (unless a lot of people are involved, which usually isn't my case; the sharing feature is cool though). 
  • My fifth issue is compression computation time. It takes a long time to tar+gzip a 70GB file. Then the compressed file and the original file both take up space. I am usually tar+gzipping to archive something large that I can roll back to, so I need to keep both for at least a little while. 
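
For a rough sense of what moving 70GB over USB costs, here is a back-of-the-envelope sketch in Python. It assumes USB 2.0, whose nominal signaling rate is 480 Mbit/s (60 MB/s). The 30 MB/s "realistic" figure is only an assumed sustained rate, not a measurement, and the function names are made up for illustration.

```python
# Back-of-the-envelope transfer time estimate (a sketch, not a benchmark).

USB2_NOMINAL_MBIT_PER_S = 480   # USB 2.0 "Hi-Speed" nominal signaling rate
ASSUMED_REAL_MB_PER_S = 30.0    # assumption: sustained rate on a cheap drive


def transfer_seconds(size_gb: float, rate_mb_per_s: float) -> float:
    """Seconds needed to move size_gb (decimal GB, 1 GB = 1000 MB)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000 / rate_mb_per_s


def format_duration(seconds: float) -> str:
    """Render a number of seconds as 'Hh MMm SSs'."""
    minutes, secs = divmod(round(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


nominal_mb_per_s = USB2_NOMINAL_MBIT_PER_S / 8  # 8 bits per byte

best_case = format_duration(transfer_seconds(70, nominal_mb_per_s))
assumed_case = format_duration(transfer_seconds(70, ASSUMED_REAL_MB_PER_S))

print("70GB at nominal USB 2.0 rate:", best_case)
print("70GB at assumed real rate:   ", assumed_case)
```

Even the best case is a coffee break, and drive-to-drive copies share the same bus, so they are unlikely to do better than the assumed rate.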

I know there are answers to all these problems; the problem is I am cheap and lazy. I don't want to pay for faster disks like SSDs. I don't want a faster internet connection; remember, I am cheap. Hadoop has a distributed copy, and that's sweet, but I don't have a distributed file system with my virtual machine image on it to distcp to another distributed file system (at least that is how I think it works).

My last big idea was to use the empty air space around us everywhere to store data. For example, you could store data in the space around you in a room. But I was laughed out of the bar when I threw that out, and IBM is shrinking the bit size, so I can wait for that.

