With the background information on hypervisors (virtual machines, etc.) provided in part one (PSM October), it should be fairly clear that the cloud computing available to you today is an implementation of virtualization with public access. For example, Amazon Web Services is implemented using Xen (an open-source hypervisor), as is Rackspace; Microsoft Azure cloud services are implemented using a highly customized version of Hyper-V; and Dell Cloud uses VMware. Thus, except for some purpose-built, limited-function solutions, “rentable” cloud services are hypervisors configured with various services to allow self-service access.
Why Image/Lidar Processing in the Cloud Is Not Practical
Let’s look at an example of processing Hexagon Geospatial Solutions DMC II imagery using Amazon Web Services. For this model, we will assume we are processing 4-band, 16-bit imagery with no compression; the pertinent per-image figures are:
- Input data = 0.8 GB
- Intermediate data = 1.2 GB
- Output data = 1.7 GB
- Processing time on a large EC2 (elastic compute cloud) instance = 3 minutes/image
Amazon provides 850 GB of local storage for its large EC2 instance. However, this storage is available only while the instance is running. For this simple model, we will assume that we start an EC2 instance, upload source data, process, and download results. Note that we are skipping a lot of detail here, such as how to partition a real-world project among multiple EC2 instances (you would have to write custom code).
If we assume static storage use (that is, we are not deleting images as we process), we will need 3.7 GB of storage per image (0.8 + 1.2 + 1.7 GB). Thus, we could process about 229 source images in a single, static, large EC2 session before running out of local storage. At three minutes per image, we can process 20 images per hour in a single EC2 instance, so within the instance, processing speed, not storage, is the limitation.
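For readers who like to tinker with these numbers, here is a minimal sketch of the storage-versus-throughput arithmetic; the per-image figures are the ones listed above, and the variable names are purely illustrative:

```python
# Back-of-the-envelope model for one large EC2 instance,
# using the per-image figures from the list above.
INPUT_GB = 0.8          # raw DMC II image uploaded
INTERMEDIATE_GB = 1.2   # scratch data created during processing
OUTPUT_GB = 1.7         # finished product image
MINUTES_PER_IMAGE = 3   # processing time on a large instance
LOCAL_STORAGE_GB = 850  # local storage on a large instance

storage_per_image = INPUT_GB + INTERMEDIATE_GB + OUTPUT_GB      # 3.7 GB
images_per_session = int(LOCAL_STORAGE_GB / storage_per_image)  # ~229 images
images_per_hour = 60 // MINUTES_PER_IMAGE                       # 20 images/hour

print(f"{storage_per_image:.1f} GB/image, {images_per_session} images fit, "
      f"{images_per_hour} images/hour processed")
```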
Assume that uploading imagery is free (although I am sure Amazon never contemplated DMC II processing when they made this rule!) but downloading costs 12 cents per GB. Twenty product images will be 20 x 1.7 GB, or 34 GB, so the download of 20 images will cost 34 GB x $0.12/GB = $4.08 (you see right away where the big expense is going to be).
We will assume that our processing code is available on Linux because this provides the more economical model for EC2 rental. Now, don’t go out demanding that Hexagon port their DMC post-processing code to Linux before you hear the end of this story. A large Linux EC2 instance rents for 32 cents per hour.
This yields, per 20 images:
- $0.00 for data upload
- $0.32 for data processing (one hour of EC2 time)
- $4.08 for data download
That comes to $4.40 per 20 images, or $0.22 per image, or $220.00 per 1,000 images.
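The same tally as a short sketch you can rerun against your own rates; the $0.32/hour and $0.12/GB figures are the ones quoted above, and everything else is illustrative:

```python
# Cost of one 20-image batch on a single large Linux EC2 instance.
EC2_HOURLY = 0.32        # large Linux instance, dollars per hour
DOWNLOAD_PER_GB = 0.12   # data transfer out of AWS, dollars per GB
OUTPUT_GB = 1.7          # product image size in GB
IMAGES_PER_HOUR = 20

upload_cost = 0.00                                             # inbound transfer is free
processing_cost = EC2_HOURLY * 1                               # one hour of compute
download_cost = IMAGES_PER_HOUR * OUTPUT_GB * DOWNLOAD_PER_GB  # $4.08

batch_cost = upload_cost + processing_cost + download_cost     # $4.40
per_image = batch_cost / IMAGES_PER_HOUR                       # $0.22
print(f"${batch_cost:.2f} per 20 images, ${per_image:.2f} per image, "
      f"${per_image * 1000:.2f} per 1,000 images")
```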
The big issue is getting the raw image data up to AWS and the processed images back. Let’s assume that we could sustain a transfer rate of 10 megabytes per second in both upload and download (which would be quite a respectable rate for most corporate infrastructure). With 0.8 GB going up and 1.7 GB coming down, each image moves 2.5 GB, so the transfer time for our 20-image-per-hour model is:
2.5 GB/image x 100 sec/GB x 20 images = 5,000 secs ≈ 1.39 hours
This is the crippling factor: we cannot move data in and out of the EC2 instance as fast as we can process it. What puts the final ax to the entire concept is that while I/O bandwidth is aggregated on the Amazon side, there is a hard limit on your corporate side. Thus, while you may be able to improve on this transfer rate by a factor of five, going much beyond that would be prohibitively expensive.
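A quick sketch of that bottleneck, assuming the sustained 10 MB/s link quoted above (treated as 0.01 GB per second, as in the arithmetic):

```python
# Compare per-image transfer time to per-image processing time.
UPLOAD_GB = 0.8          # raw image up to AWS
DOWNLOAD_GB = 1.7        # product image back down
LINK_GB_PER_SEC = 0.01   # 10 MB/s sustained, both directions
PROCESS_SEC = 3 * 60     # three minutes of compute per image

transfer_sec = (UPLOAD_GB + DOWNLOAD_GB) / LINK_GB_PER_SEC  # 250 seconds
print(f"transfer: {transfer_sec:.0f} s/image, compute: {PROCESS_SEC} s/image")
# 250 seconds of I/O against 180 seconds of compute: the pipe, not the
# processor, sets the pace, and adding EC2 instances does not widen
# your corporate uplink.
```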
Why Use a Public Cloud?
To answer this question, you have to look at the three general implementations (or service models) individually:
- Software as a Service (SaaS)
- Platform as a Service (PaaS)
- Infrastructure as a Service (IaaS)
As I discussed earlier, IaaS/PaaS for base-map production involving lidar and/or imagery is not cost-effective. Even if you could cost-justify a public cloud service, it simply would not be practical due to the bandwidth limitations of getting data on and off the service.
Possible reasons to use a public cloud service include:
- Improved security: If you have untrusted users who need to access information that you maintain (for example, your website), then hosting that data on a public cloud can make a lot of sense. While you will still need to maintain security for the data you host, you will not need to be as concerned about this portal becoming a place where malicious visitors can breach your general IT infrastructure.
- Improved reliability: Reputable cloud service providers will have considerably better reliability than you can achieve with your own internal systems.
- Reliable backups: It is very difficult for your IT department (if you are large enough to have one) to reliably back up your stationary systems, much less portable systems and systems of remote workers.
- For sharing information with remote and transient workers: Again, a big consideration here is improving the security of your main systems by using a public cloud service rather than providing remote access to internal systems.
- For when you need software on a very occasional basis.
- For simple file sharing of small files with entities outside of your own company.
- For when you need bursts of high-performance computing but have relatively low data traffic to/from the processing cluster (I have a tough time coming up with these scenarios for geospatial companies doing base data processing).
Why Not Use a Public Cloud?
The most prevalent use of cloud services today is renting software in an internet-hosted fashion. I find nearly all software implemented as a web service to be seriously behind desktop-hosted software in both the quality of the user interface and the responsiveness of interactions.
An example is salesforce.com. The user interface (this is my opinion here; you may not agree) is very primitive in terms of intelligent form fill-in features, the types of widgets available for accessing features, and so forth. The time from filling in a form (for example, a new contact) to moving to a new form is nothing short of painful.
For the above reasons, it is very difficult for me to imagine using a web-hosted application in a production scenario. Thus, the dream of having web-hosted editing software for applications such as lidar editing, image QC, etc. still seems far off (but not completely beyond imagining).
A second consideration is cost. SaaS for higher-end applications such as sales-force automation is not inexpensive. You have to be very careful to compare the total cost of deployment rather than simply the software cost. Consider a ten-user deployment of salesforce.com’s lowest configuration (Group Edition). At $65 per user per month, this is $7,800 per year. A ten-user bundle of Goldmine would cost $6,000 in initial software purchase and $1,400 per year in software maintenance. However, Salesforce is a turnkey, hosted solution; deploying Goldmine will require provisioning a server, remote access, and all of the myriad details of self-hosting a remote application.
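Here is a rough license-only comparison over a few years, using just the prices quoted above; it deliberately ignores the server, remote-access, and administration costs that self-hosting adds, which is exactly the caveat to keep in mind:

```python
# Multi-year license cost: Salesforce Group Edition vs. a ten-user
# Goldmine bundle, using the prices quoted in the text.
USERS = 10
SF_PER_USER_PER_MONTH = 65      # Salesforce Group Edition
GM_INITIAL = 6000               # Goldmine ten-user bundle, one-time
GM_MAINTENANCE_PER_YEAR = 1400  # Goldmine annual maintenance

for years in (1, 3, 5):
    salesforce = SF_PER_USER_PER_MONTH * USERS * 12 * years
    goldmine = GM_INITIAL + GM_MAINTENANCE_PER_YEAR * years
    print(f"{years} yr: Salesforce ${salesforce:,} vs. Goldmine ${goldmine:,} "
          f"(license only)")
```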
Remotely hosted cloud computing for high-throughput image/lidar processing is not practical, for both cost and bandwidth reasons. Software as a Service offers an opportunity to move some of your aggravating applications off your servers and into a “zero admin workstation” environment. However, be mindful of the cost. Those $59-per-workstation-per-year credit card charges add up to big numbers in a hurry!
As I mentioned in part one, keep an eye on what I call System as a Service (my own term). That’s where software, platform, and/or infrastructure are offered all together. When System as a Service works well, it forms a seamless integration of a purpose-built hardware device with a back-end service that provides more than storage and computing.
An example from the computing industry is how Apple offers “i” devices that work with iTunes and iCloud. For us, an example is Trimble’s Gatewing unmanned aerial system and cloud-based image-processing solution. This sort of hardware/software deployment will become increasingly important in our industry.