Sunday, December 16, 2007

LastData – Data Storage for the LastComputer

The second major technical aspect of the LastComputer, after applications (which have been the major focus of this blog so far), is data storage. When using the LastComputer your data must just be there, available, all the time. It must be there when you turn on your LastComputer, it must be there after your LastComputer crashes and is replaced, and it must be there whether you’re on your big LastComputer desktop or laptop or using your LastComputer mobile device. And most of your data must be there when you’re using a public terminal or your friend’s computer at their home (or ALL of your data, if it’s your friend’s LastComputer).

Satisfying these never-lost/always-accessible needs, while still running as quickly as a traditional personal computer, requires a combination of automatic data caching, replication, and backup. Most likely this means that the master copy of your data will live off in the cloud (i.e., in server data farms).

Is it too much to expect everyone’s data to be stored on a server farm? Isn’t that too much data and too much bandwidth? I don’t think so. Through deduplication techniques it turns out that most of your data really isn’t very big in terms of extra storage required: you run the same programs as millions of others, own the same songs as millions of others, share business documents with many others, and so are left with only your personal files. Personal files do not require much storage space, except for personal photographs and movies. (I joke that the world only needs one picture of a baby with a messy face full of food, with all variations handled by diff, but most parents and grandparents demand the real thing.) So storage will be fairly cheap per-user, and probably can fit within the LastComputer’s pricing model, with a little extra payment required for the photo/video-happy users (e.g. new parents of the world).
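To make the deduplication argument concrete, here is a minimal sketch of a content-addressed store, where a file is keyed by a hash of its bytes so that identical content is kept only once no matter how many people own it. The `DedupStore` class, the user names, and the file names are all made up for illustration; this is not how any particular backup service works.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}   # sha256 hex digest -> file bytes (stored once)
        self.users = {}   # user name -> {file name: digest}

    def put(self, user, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Only the first uploader of a given content costs new storage.
        self.blobs.setdefault(digest, data)
        self.users.setdefault(user, {})[name] = digest
        return digest

    def get(self, user, name):
        return self.blobs[self.users[user][name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
song = b"same popular song" * 1000
store.put("alice", "song.mp3", song)
store.put("bob", "track01.mp3", song)              # duplicate content, no new storage
store.put("bob", "baby.jpg", b"bob's own photo")   # personal file, stored in full
```

After these three uploads the store holds two blobs, not three: the song is paid for once, and only Bob's photo adds per-user storage.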

As for the bandwidth problems, in most cases a person’s LastComputer will have cached what they need, and changes are relatively few, so the bandwidth costs remain reasonable. Typical LastComputer users are simply using local copies of their documents, reading their email, and listening to their songs, not thinking about the fact that much of the data that appears to be on their computer is really a cached copy of the real file off in some server farm.
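The bandwidth claim can be illustrated with a small read-through cache. This is only a sketch under simple assumptions: the `Server` and `CachedClient` classes are hypothetical, the "network" is a byte counter, and a cheap version check stands in for whatever change-notification a real system would use.

```python
class Server:
    """Stand-in for the server farm; counts file bytes sent over the network."""

    def __init__(self, files):
        self.files = dict(files)                      # name -> bytes
        self.versions = {name: 1 for name in files}   # name -> version number
        self.bytes_sent = 0

    def version(self, name):
        return self.versions[name]   # small metadata check, not counted here

    def fetch(self, name):
        self.bytes_sent += len(self.files[name])
        return self.files[name], self.versions[name]

    def update(self, name, data):
        self.files[name] = data
        self.versions[name] += 1


class CachedClient:
    """A LastComputer-style client that keeps local copies of what it reads."""

    def __init__(self, server):
        self.server = server
        self.cache = {}   # name -> (data, version)

    def read(self, name):
        cached = self.cache.get(name)
        if cached is not None and cached[1] == self.server.version(name):
            return cached[0]                     # cache hit: no file transfer
        data, version = self.server.fetch(name)  # miss or stale: transfer once
        self.cache[name] = (data, version)
        return data


server = Server({"letter.txt": b"x" * 5000})
client = CachedClient(server)
for _ in range(10):
    client.read("letter.txt")        # only the first read transfers the file
bytes_after_ten_reads = server.bytes_sent

server.update("letter.txt", b"y" * 100)   # the master copy changes
latest = client.read("letter.txt")        # the stale copy is refetched once
```

Ten reads cost one transfer, and a change to the master copy costs one more; everything else is served from the local copy.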

No one has achieved this LastData model yet, at least not for the public, but here is a representative sampling of some products and categories that are partway there:

Personal access-your-data-from-anywhere home servers:

This class of product (e.g. HP MediaSmart Server and other Windows Home Server boxes, and other server boxes) helps with backup, some amount of remote access, the ability to see files from many computers, and recovery of files and systems. Good goals, but it’s so wrong for these reasons:

  • Someone in every home becomes the IT department.
  • Must be addressable from anywhere in the world, and so Internet service providers must be cooperative.
  • Must be always on. Future solutions must save energy, not consume more energy.
  • This one machine now becomes the point of failure for all your computers – so who backs up the backup machine?

Online backup

There are many great online backup services to choose from (e.g., ibackup, .mac, backup.com, xDrive, Mozy, Arsenal, Carbonite, Amerivault) and the space is hot now (e.g., EMC recently acquired Mozy for $76 million, while IBM acquired Arsenal Digital just days ago, although Omnidrive is looking troubled). Many of these services offer a few GB of backup for free, but keeping a backup of your entire system will likely require more space than that. For $50 to $100 per year these services can back up your entire system (lower per-system costs are available for corporate plans).

I personally use Mozy and am so far mostly happy with the way it works, but I have not had a catastrophic failure since I started using it, so I don’t know how well it will pull through when I really need it.

These online backup systems use a subscription model of approximately $6/month (more or less). This fits in well as a component of the $30/month subscription/leasing price I previously proposed for the LastComputer, which would contain built-in and intrinsic automatic backup (or caching, if you prefer) as part of that $30/month cost. (With deduplication, the actual storage costs per-computer will come way down.)

Where the current online backup solutions fall short is in their integration with the entire computer-owning experience:
  • Online backup is currently a separate experience from computer purchasing and maintenance.
  • Too-frequent user intervention is required.
  • Data recovery is not trivial.
  • If I have a failure in all or part of my system, it’s not clear that all programs will be configured correctly when I restore.
  • If I need to restore everything to a new computer, I’m not confident that all of the programs I own (have paid for and configured) can be restored automatically to their previous state from backup, without my needing to go through some painful re-installations.
  • If I am on a different system (e.g. my mobile device or a public terminal) most of them don’t provide a clean way to access my data.
  • The big problem: We should not even need to think about “data backup”. What’s on my computer should just be “my stuff” and should always exist without my thinking about it.

Synchronization / Replication

One may view the LastComputer data issue as largely a file-synchronization problem. Whichever LastComputer you’re using (primary, mobile, friend’s, hotel, or post-crash replacement) you want your data to be replicated on that computer as you need it. Folder-synchronization tools (e.g., FolderShare, PowerFolder, Unison, and good ol’ rsync) show one way this may work. These tools allow you to have the same set of files on multiple computers; if a file is changed on one computer it automatically gets updated on the other computers.
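The core idea behind these tools can be sketched in a few lines. The example below assumes a deliberately simple "newest copy wins" rule and models each computer as a dictionary of file name to (modified time, contents); the `sync` function and the sample files are hypothetical. Real tools such as Unison or rsync are far more careful: they deal with deletions, conflicting edits, and transferring only the changed parts of a file, none of which is attempted here.

```python
def sync(a, b):
    """Two-way sync of two replicas; for each file the newest copy wins.

    Each replica maps file name -> (modified_time, data).
    """
    for name in set(a) | set(b):
        copies = [replica[name] for replica in (a, b) if name in replica]
        newest = max(copies, key=lambda entry: entry[0])
        a[name] = newest
        b[name] = newest


home = {
    "photos/baby.jpg": (100, b"jpeg bytes"),
    "todo.txt": (120, b"buy milk"),
}
office = {
    "todo.txt": (150, b"buy milk, eggs"),   # edited later at the office
}

sync(home, office)
```

After the call both machines hold the photo and the later version of `todo.txt`, which is the behavior a LastComputer would need to provide without the user ever running anything.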

I personally use FolderShare to keep certain files consistent across many machines (e.g. photos, files with information for the whole family, or common command files I need on each of my machines).

These file synchronization programs are not, in their current form, exactly what LastComputer users need, but they do validate the utility and performance of cache-like, update-as-needed data management. And they can be used to personally recreate some parts of the LastComputer on your own. For example, two friends may want to use file synchronization to constantly keep each other’s computers backed up. Or you may want to keep certain parts of your home and office computers always in sync. Or your team across the globe may want to have local copies of commonly used documents without the performance loss of around-the-world communications each time they refer to a file.

I maintain some fading hope that Microsoft will surprise us all with a really nice worldwide data replication system, if Ray Ozzie has one more big replication trick up his sleeve.

Web OS & Online Apps:

Many expect that thin clients (browser, Flash, Silverlight, Java) will replace desktop applications entirely. I’ve written previously (1, 2) that I expect this to happen for some common and simple-UI applications, but not all. If I’m wrong, then LastData becomes a non-issue because all of your data will always be on the server. Here are some of the many companies and products betting that I’m wrong: Google Docs, Zoho, Zimbra, ThinkFree, Gnome Online Desktop, eyeOS, Ulteo (and so on, see Read/Write office roundup).

Offline/Online WebApps

A simple extension of the Web OS/SOA idea is applications that run in the browser, but continue to run when disconnected from the Web. Zoho has most recently shown how well this works with their “Go Offline” option, which uses Google Gears to cache your data locally. I wrote earlier about Google Gears, and Zoho is the first to show it working. It’s not a great system, because it requires manual intervention, but it works well enough to prove the point.
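What an offline mode without manual intervention might look like can be sketched as a local-first store that queues edits while disconnected and replays them on reconnect. This is an illustration only: it is not the Google Gears API or Zoho’s implementation, and the `OfflineDocStore` class, its methods, and the document names are all hypothetical.

```python
class OfflineDocStore:
    """Local-first store: edits always succeed locally and sync when online."""

    def __init__(self, server):
        self.server = server          # dict standing in for the web app's server
        self.local = dict(server)     # local cache of the user's documents
        self.pending = []             # edits made while disconnected
        self.online = True

    def save(self, name, text):
        self.local[name] = text       # the user never waits on the network
        if self.online:
            self.server[name] = text
        else:
            self.pending.append((name, text))

    def set_online(self, online):
        self.online = online
        if online:
            # Replay queued edits in order; no user action is needed.
            for name, text in self.pending:
                self.server[name] = text
            self.pending.clear()


server = {"notes": "draft 1"}
docs = OfflineDocStore(server)

docs.set_online(False)                            # e.g. on a plane
docs.save("notes", "draft 2 (written offline)")
server_copy_while_offline = server["notes"]       # server still has the old draft

docs.set_online(True)                             # reconnect: queue is flushed
```

The sketch ignores the hard part, which is what to do when the server copy also changed while you were away.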

I keep expecting Adobe to move ahead in this area with AIR, which is intended to let developers use their web (HTML, PDF, Flash) skills to develop desktop applications. It’s only a minor shift from there to on-demand applications and data anywhere (see Adobe discussion).

The disk drive in the cloud

Imagine a drive or folder on your computer—oh, let's just call it “G:” or “/G” for example—that also appears to be on every other computer you use. All of the G data (which will be all of your data) is really off in some server-farm somewhere (you don’t need to know where), but to you as a user it just appears to be in your local G, on any computer you use. The data in G is accessible by both your online apps and your offline apps. You then store and retrieve all of your data (documents, photos, movies, music, high-score game settings, etc…) on G and only G. No extra steps or copying or backup will be necessary. This will be data nirvana.

To achieve adequate performance on this G folder, tweaks will be needed to the current OSes to predictively cache and search and replicate, but those are just tweaks and not far off from what is built into OS X or Vista already.

I hope that’s what Google has in mind for their GDrive.