Saturday, October 4, 2008

Review of SpiderOak online backup and storage and open source

SpiderOak has (and will have) some unique features that other solutions I have used, like Mozy and Carbonite, do not have.

I think SpiderOak works pretty well and is easy to use. I’ve had a few issues, and some of them will most probably be fixed within a few months. SpiderOak calls the combined functionality a ‘SuperCloud’.

Here are some details about SpiderOak, with comments on my questions from SpiderOak CEO Ethan R. Oberman:

  • File names and folder names are encrypted as well as the actual content of the files. Some systems transfer the file and folder names as clear text or that information is saved in clear text on their systems and you might not want that. SpiderOak employees don’t know anything about what you have backed up on their servers.
  • Windows, Mac and Linux computers are supported, with cross-platform management and access.
  • External and network drives are supported.
  • Versioning – multiple versions of files are stored and easily accessible. Great when you make changes to files a lot.
  • Older versions or deleted documents are never discarded.
  • Create password-protected 'ShareRooms' for your friends, family members or co-workers. RSS feeds available to keep up-to-date with changes.
  • No limit on the number of computers you can install it on. Only how much you back up counts.
  • Data de-duplication speeds up subsequent backups so that only changes are uploaded. In their competitive chart they list Mozy and Carbonite as not having this, but I know both of them have some kind of functionality so that not everything is backed up each time. I'm not sure how accurate the chart is or how they define de-duplication. Mozy, for example, has “Block-level incremental backup”.
    Here is the comment on this by SpiderOak:
    • Yes - they both do some basic de-duplication. For example, when one file is changed they can detect which parts changed from this version to the previous version. That has been a standard feature of back up software for probably 15 years so it would be really quite amazing if they omitted that functionality. When we talk about major de-duplication, we mean across a user's entire SpiderOak network - including all files from all computers. For example, if you have a document on one machine that is 50% the same as a document you had already backed up with SpiderOak on another machine, SpiderOak will only back up and store the 50% difference between the files, which means the back up is faster and the total amount of storage is less - by 25% in this case.
  • The competitive chart also mentions Information on Backup Progress, but both of the other products have some kind of progress indication.
    Here is the comment on this by SpiderOak:
    • Yes - after further review we agree that they do show some information on the back up process - although we don't feel it is as informative as SpiderOak's 'Status' tab. We will give them a partial bubble for this feature.
  • For geeks there is a command line interface.
  • How is the data stored on SpiderOak? Does SpiderOak keep data in multiple locations? Say one location goes down totally, can I still access my data?
    Here is the comment on this by SpiderOak:
    • The data in SpiderOak data centers is stored in an internally redundant format with several points of replication. Further, the data is also mirrored at different locations such that even if one data center were to go offline for any reason, users would still have access to their data.
      We do offer real-time geographic redundancy but the cost is significantly increased due to the additional overhead involved.
  • How is SpiderOak’s Remote Data Access unique compared to others?
    Here is the comment on this by SpiderOak:
    • As it relates to Mozy and Carbonite, they don't offer the ability to access the stored data remotely. Through our website at the 'My Login' prompt, a user can easily access and download any file from across their SpiderOak network anytime. Some other back up services do offer this service, but it is not offered as publicly.
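
To make the cross-computer de-duplication comment above more concrete, here is a minimal sketch in Python. It is not SpiderOak's actual implementation: the `BlockStore` class, the tiny fixed chunk size and the use of SHA-256 are all my own assumptions for illustration, and real products typically use smarter chunking and encrypt the blocks as well. It only shows why a second document that is 50% identical to one already backed up costs half the upload and brings the total storage down by 25%.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


class BlockStore:
    """Hypothetical account-wide store shared by all of a user's computers."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> chunk bytes

    def backup(self, data):
        """Store data and return (recipe, number of bytes newly uploaded)."""
        recipe = []
        uploaded = 0
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blocks:
                # Only chunks the store has never seen are uploaded.
                self.blocks[digest] = chunk
                uploaded += len(chunk)
            recipe.append(digest)
        return recipe, uploaded

    def restore(self, recipe):
        """Rebuild a file from its list of chunk digests."""
        return b"".join(self.blocks[digest] for digest in recipe)

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.blocks.values())


store = BlockStore()
doc_a = b"AAAABBBBCCCCDDDD"  # 16 bytes, backed up from one machine
doc_b = b"AAAABBBBEEEEFFFF"  # 16 bytes on another machine, first half identical

recipe_a, uploaded_a = store.backup(doc_a)
recipe_b, uploaded_b = store.backup(doc_b)

naive_total = len(doc_a) + len(doc_b)
saving = 1 - store.stored_bytes() / naive_total

print(uploaded_a, uploaded_b, store.stored_bytes(), saving)
```

The second backup uploads only the two chunks that differ, and both documents can still be restored in full from their recipes.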

You might not think that the program is Windows-like in look and functionality. According to SpiderOak, the reason is this: “given that one of our top priorities from the beginning was to be completely cross platform, we wanted to provide a consistent look and feel within the application regardless of platform. Therefore, if you are on a Mac, a Windows machine, or working in Linux, the SpiderOak application will look and feel the exact same”. I can live with that.

Things I miss:

  • It’s not possible to cancel a download.
    Here is the comment on this by SpiderOak:
    • We agree that this should be possible and have it on our list for our SpiderOak 2.2 release. Additionally we plan on improving the information SpiderOak provides to the user through more intuitive progress bars. Another area we will focus on is greatly improving the performance of downloads in general. Right now if you download a folder that has more than roughly 10,000 files contained within it, the process can be slow. For folders with a smaller number of files there is no slow-down noted. In general, 2.0 changed our process architecture (multiprocess) which has about 1/4th the overhead of the previous approach of integrating the GUI event loop with the worker event loop into a single process. All this results in a major speedup and provides grounds for a lot of important future changes.
    • In 2.1 and 2.2, which will be released much more quickly than the 2.0 release, we will add a number of new capabilities and speed improvements such as the ones mentioned above.
  • Ability to set how much of the bandwidth to use when backing up.
    Here is the comment on this by SpiderOak:
    • We will be launching the ability to curb the bandwidth SpiderOak uses in our 2.1 release which will include the Preferences section. Preferences will also allow you to define how long to retain historical versions, deleted files, how SpiderOak should behave at Startup, and several other useful functions.

Some things that are not so good:

  • Support for backing up open Outlook PST files is not that good, so I can't use it for this purpose. It sometimes hangs when building the backup and uses too much CPU.
    Here is the comment on this by SpiderOak:
    • This is a known problem and we are working on addressing it. To give some background, I need to explain how SpiderOak de-duplicates data. To make our pricing structure more competitive, SpiderOak does intense compression and de-duplication within a backup set (even across multiple computers). Whenever new data is stored, SpiderOak tries hard to store it in terms of references to existing data blocks. Therefore, SpiderOak is able to retain all previous versions of files using very little extra space. Thus, any duplicated data between any number of computers you back up with SpiderOak will never take up any extra space. We have customers who have stored hundreds of gigabytes of user data within a 100 GB SpiderOak account.
    • There are a few specific types of files (such as those that have very sparse changes) for which SpiderOak takes a long time to calculate the differences between the old version and the new version. Occasionally Outlook produces such files. We'll be greatly improving the performance of this in one of the next SpiderOak versions.
    • In the 2.0 version, SpiderOak will notice if a file is likely to take a long time to build the additional (3rd) version of that file and use a different storage strategy. This is the best approach we have for the moment. We know this can take a very long time in some rare circumstances and fixing it is one of our top priorities.
    • Another possibility is that Outlook is changing the file underneath SpiderOak (i.e. as we build, the file changes before we finish). In this situation, SpiderOak will try up to 3 times to build the file consistently and then quarantine that file for some time before trying later. Usually this strategy will work for eventually getting a consistent build. It will also work much better once the builds happen faster - as described above.
  • I find that the application uses a lot of memory and CPU at times. There should be a way to set the CPU priority lower. After the initial backup the memory usage is more normal.
    Comment by SpiderOak support:
    • After more investigation, we've found that this is pretty typical for shortly after startup, but soon it reduces to a more reasonable level. It seems to be something involving the Qt libraries. We'll work on improving that in 2.0 and 2.1.
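
The "try up to 3 times, then quarantine" strategy SpiderOak describes for files that change during a build can be sketched roughly like this. This is my own illustration in Python, not SpiderOak's code: `build_consistent`, `FakeFile` and the idea of comparing a change stamp (for example modification time plus size) before and after reading are all assumptions.

```python
def build_consistent(read_file, get_stamp, max_attempts=3):
    """Return contents read while the file stayed unchanged, or None.

    read_file and get_stamp are hypothetical callables. get_stamp must
    return a value that changes whenever the file changes.
    """
    for _ in range(max_attempts):
        before = get_stamp()
        data = read_file()
        if get_stamp() == before:
            return data  # nothing changed underneath us during the read
    return None  # caller should quarantine the file and retry later


class FakeFile:
    """Stand-in for a PST file that is modified during the first N reads."""

    def __init__(self, changes_during_reads):
        self.version = 0
        self.remaining = changes_during_reads

    def stamp(self):
        return self.version

    def read(self):
        if self.remaining > 0:
            self.remaining -= 1
            self.version += 1  # simulate Outlook writing mid-read
        return "contents v%d" % self.version


busy = FakeFile(changes_during_reads=2)       # settles on the third attempt
very_busy = FakeFile(changes_during_reads=5)  # never settles within 3 tries

result_ok = build_consistent(busy.read, busy.stamp)
result_quarantined = build_consistent(very_busy.read, very_busy.stamp)

print(result_ok)           # consistent contents from the third attempt
print(result_quarantined)  # None, so the file would be quarantined
```

This also shows why faster builds help: the shorter each read takes, the less likely the file is to change in the middle of it.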

The 2.0 release of SpiderOak is coming soon and will hopefully improve this great product even more.

Sync will be included in SpiderOak 2.2, which is scheduled for later this year.

See also SpiderOak’s blog on the problems of finding investors because of open source.

“In seeking funding for SpiderOak, many of the investors we interacted with had a pervasive hostility to open source. Their attitude might be summarized as: "Your acquirers will mostly be interested in you for your software, to combine with their existing sales channels. You will have destroyed the value of your company. You do this at your own peril." As we continued to display OSS enthusiasm, what investment interest they might have had visually dissipated. Even more mild open source scenarios, such as an open source commercial license, were unpalatable (and sometimes incomprehensible.)”

Everyone is offered a free 2 GB account. The price is $10 for the first 100 gigabytes and $10 for each additional 100 gigabyte increment. One open question here is whether it is the actual space used on SpiderOak’s servers that counts. (The answer is that, in the worst case, 100 versions of a file will each take a bit from your account, depending on how much each version differs from the previous one.)
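
As a quick way to work out what a given amount of data would cost, here is a small Python sketch of the pricing above. The function name `price_for` is mine, and I am assuming that paid storage is billed in whole 100 GB increments and that anything up to 2 GB falls under the free account.

```python
import math

FREE_GB = 2               # size of the free account
INCREMENT_GB = 100        # paid storage is sold in 100 GB steps
PRICE_PER_INCREMENT = 10  # US dollars per 100 GB increment


def price_for(stored_gb):
    """Price in dollars for the given stored gigabytes (assumed rounding up)."""
    if stored_gb <= FREE_GB:
        return 0
    return math.ceil(stored_gb / INCREMENT_GB) * PRICE_PER_INCREMENT


for gb in (2, 50, 100, 101, 250):
    print(gb, "GB ->", price_for(gb), "dollars")
```

Keep in mind that what counts is the de-duplicated space, so the gigabytes you pass in can be less than the total size of the files on your computers.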

PS and disclaimer. I did get a free 20 GB account to be able to test SpiderOak.

