The rapid growth of edge computing has led to a surge in data generation and collection at unprecedented levels. Temporary installations, such as scientific research stations, surveillance systems, and industrial facilities, often require rapid data collection and transfer for smooth operations. However, the high cost of hardware, coupled with the need for reliable and efficient data storage, can pose significant challenges for these projects. Amazon AWS Storage Optimized Snowball, combined with custom virtual machines, presents a game-changing solution to this problem.
AWS Snowball Edge is available in two core device types. Snowball Edge Compute Optimized offers more compute (vCPUs, DRAM) and GPU capabilities and is suited for higher-performance workloads, while Snowball Edge Storage Optimized offers more storage and is suited for large-scale data migrations and capacity-oriented workloads. Our initial requirements when ordering Snowballs were a perfect fit for the Snowball Edge Storage Optimized solution.
While exploring the use of Snowball to move our 100 trillion digit Pi computation to the cloud, we ended up slightly over-ordering. We ordered twin 80TB Snowballs tuned for data migration and only needed one. So with the second, we wanted to see if we could get an EC2 instance operational in a remote setting. This is easy to set up when configuring the Snowball ahead of shipment, so the customer receives an appliance with EC2 ready to roll. It’s a little trickier, though not impossible, to reconfigure in the field after the fact.
Heads up, this article will get into the nitty gritty of configuring a VM and sideloading it onto the Snowball. If you want to skip straight to that part, jump ahead to the section on sideloading custom virtual machines.
Background and Overview of Amazon AWS Storage Optimized Snowball
Amazon AWS Storage Optimized Snowball is a rugged, portable, and secure data transfer solution designed to simplify and accelerate the process of moving large volumes of data to and from the AWS Cloud. This purpose-built device is specifically designed for use cases requiring high-speed data transfers and short-term edge storage, making it ideal for temporary installations or locations with limited or no network connectivity.
Equipped with advanced storage capabilities, encryption, and tamper-resistant features, the Storage Optimized Snowball ensures secure and efficient data migration while significantly reducing data transfer costs compared to traditional methods. By leveraging this innovative appliance, organizations can overcome the challenges of data collection and storage in edge environments, paving the way for seamless data integration and analysis in the cloud.
Storage Optimized Snowball boasts several key features that make it a powerful solution for data transfer and storage:
- High-capacity storage: With storage capacities of up to 80 TB, Storage Optimized Snowball can easily handle large-scale data migration tasks, catering to various use cases and data-intensive applications.
- Fast data transfer: Equipped with high-speed 40 Gbps network connections, Snowball enables rapid and efficient data transfers, reducing the time needed for data migration.
- Data security: Snowball uses industry-standard encryption protocols (such as 256-bit AES) to protect data both in transit and at rest, ensuring the confidentiality and integrity of your data throughout the migration process.
- Rugged design: Built to withstand harsh environments, the Storage Optimized Snowball features a rugged and weather-resistant design, making it suitable for use in a wide range of conditions and temporary installations.
- Edge computing capabilities: Snowball’s built-in compute capabilities allow users to run edge computing workloads and process data directly on the device, reducing latency and enabling real-time analysis.
- AWS Greengrass integration: Snowball comes pre-installed with AWS Greengrass, allowing seamless integration with AWS Lambda and other AWS services, enabling edge processing and analytics.
- Easy deployment and management: With its intuitive and user-friendly interface, Storage Optimized Snowball simplifies the process of device setup, data transfer, and tracking, streamlining data migration tasks for organizations of all sizes.
Amazon AWS Storage Optimized Snowball offers significant cost-saving and efficiency benefits compared to traditional data transfer methods. By utilizing Snowball’s high-capacity storage and fast data transfer capabilities, organizations can dramatically reduce the time and bandwidth required for data migration, resulting in substantial savings in both time and resources.
Furthermore, Snowball’s rugged design and edge computing features eliminate the need for additional hardware investments and on-site infrastructure, further reducing costs for temporary installations or edge projects. Additionally, the seamless integration with AWS services enables streamlined data management and analysis, enhancing overall productivity and operational efficiency.
As previously mentioned, we ordered two of the AWS Snowball Edge Storage Optimized devices. Amazon also offers Snowballs that are designed to be more compute-heavy and wouldn’t require the sideloading process we’re about to discuss. We simply had an “extra” device and wanted to see just how far we could push it outside of its designed comfort zone.
Sideloading Custom Virtual Machines to Storage Optimized Snowball
We highly suggest that you read through the official Amazon Blog on this process; our steps here are based on our specific configuration and how we were able to execute it.
When AWS Snowball Edge was first introduced in 2016, users who wanted to run Amazon Elastic Compute Cloud (Amazon EC2) instances on the device had to specify an Amazon Machine Image (AMI) during the ordering process. The device would then support launching Amazon EC2 instances based on the selected AMI. However, updating an AMI or switching to a different one for new workloads, issue resolution, or enabling new features required returning the device to AWS for the AMI update and then waiting for it to be shipped back.
This process has since been streamlined. Some of the steps here are for reference only and can be used directly from the Amazon piece, so we won’t be specifying the details but providing more of a checklist.
- Create a VM on your workstation that you want to be loaded to Snowball.
- Install your hypervisor. We elected to use Oracle VirtualBox as specified by Amazon. However, we used a Windows-based host, which has some minor differences in the process.
- Install your guest OS. We chose Ubuntu 22.04 because it was easy to get and work with. Once it is installed, we suggest applying updates, making sure DHCP is enabled, and testing SSH/RDP access before moving on.
- Keep the disk size in mind. In a later step the disk is converted to a RAW disk file, so whatever size you select, the full allocated capacity has to be loaded onto the Snow device.
- Locate the virtual disk .vdi file on your hard disk, and copy the location with the file name.
- Navigate to the installation folder of VirtualBox; for us, it was “C:\Program Files\Oracle\VirtualBox”. Right-click and choose “Open PowerShell window here”. (This is Windows-specific; commands for other platforms are available in the Amazon article.)
- Using the path of the .vdi file you located earlier, adapt this command to your own setup (Windows version shown here):
.\VBoxManage.exe clonehd "C:\Users\Jordan\VirtualBox VMs\SnowballUbuntu\SnowballUbuntu.vdi" "C:\Users\Jordan\VirtualBox VMs\SnowballUbuntu\SnowballUbuntu.raw" --format raw
- Load the .raw image to the Snow Device.
- Create IAM permissions for image import by setting up an IAM role and associated policy for the VM Import/Export process.
- Create an IAM policy granting the necessary permissions for the local VM Import/Export service to download the snapshot from Amazon S3 on the device.
- After creating the policy, create an IAM role with a trust policy, allowing Snowball VM Import/Export to assume the role.
- Attach the policy created earlier to the IAM role, enabling VM Import/Export to access the image stored in the S3 bucket on the device.
- Import image as snapshot
- Navigate back to the Snowball dashboard page and select “Get started” on the “Start computing” panel.
- Choose “Snapshots” and then “Import snapshot” to begin importing the raw image as a snapshot.
- On the “Import snapshot” page, provide the required descriptions and specify the IAM role created earlier.
- Browse S3 to locate and select the raw image file, then submit the import request.
- The snapshot import will take a few minutes to complete, depending on the image size.
- Upon completion, the state will display “Completed.”
- Register an AMI from the snapshot
- To register an AMI from the snapshot, select the snapshot ID you just created and click “Register image.”
- Enter a name and description for the AMI, keeping the root volume device as /dev/sda1, and submit.
- The snapshot will now be registered as an AMI, allowing you to launch EC2 instances from it.
- Launch your EC2 Instance on the Snow device
- To launch an EC2 instance from your AMI, navigate back to the Snowball dashboard page and select “Instances.”
- Click “Launch instance” and enter your AMI name and the desired instance type.
- For public IP address assignment, choose to create a new one (VNI), use an existing one, or not assign one at all.
- Regarding the key pair, opt not to attach a key pair if you’ve already added required public keys to the image or choose to create/use an existing key pair.
- Click “Launch” to initialize your EC2 instance.
- Once the EC2 instance is up and running, access it in the same way as any other EC2 instance in AWS.
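The IAM step in the checklist above is the one most likely to trip people up, so here is a minimal sketch of the two JSON documents involved: a trust policy for the role and a permissions policy for the bucket that holds the raw image. The service principal `vmie.amazonaws.com`, the S3 action list, and the bucket name `import-snapshot-bucket-name` are assumptions based on our reading of the AWS guidance, not something taken from our device, so check them against the official Amazon article before use.

```python
import json

# Hypothetical bucket name on the Snow device; replace with your own.
IMPORT_BUCKET = "import-snapshot-bucket-name"


def build_trust_policy():
    """Trust policy that lets the VM Import/Export service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed service principal; confirm against the AWS guide.
                "Principal": {"Service": "vmie.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_import_policy(bucket):
    """Permissions policy granting read access to the bucket with the raw image."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed action list; the AWS guide may require more or fewer.
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Print both documents so they can be pasted into the policy editor.
    print(json.dumps(build_trust_policy(), indent=2))
    print(json.dumps(build_import_policy(IMPORT_BUCKET), indent=2))
```

Generating the documents from a function keeps the bucket name in one place, which helps if you end up rebuilding the role for a second image.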
While the process of sideloading custom virtual machines into devices like AWS Snowball Edge may seem complex and challenging, the effort is well worth it due to the numerous benefits it offers. It’s important to note that while it is possible to side-load an AMI after ordering the device, opting for the device with the AMI already loaded will provide you with a pre-configured appliance that is ready to use.
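For anyone who would rather script the conversion and import steps from the checklist than click through them, the sketch below only assembles the command lines as argument lists; it does not run anything. The VirtualBox call mirrors the command we used. The `aws ec2 import-snapshot` call is our assumption of a CLI equivalent of the “Import snapshot” page, and the endpoint address, profile, role, bucket, and key values are all hypothetical placeholders.

```python
from pathlib import PureWindowsPath


def vbox_convert_cmd(vdi_path):
    """Build the VBoxManage call that clones a .vdi into a .raw beside it."""
    # Same folder and file name as the source disk, with a .raw extension.
    raw_path = str(PureWindowsPath(vdi_path).with_suffix(".raw"))
    return [r".\VBoxManage.exe", "clonehd", vdi_path, raw_path, "--format", "raw"]


def import_snapshot_cmd(bucket, key, role_name, endpoint, profile):
    """Build an AWS CLI call that imports the raw image as a snapshot.

    Assumes the device exposes an EC2-compatible endpoint and that the
    role was created as described in the IAM step of the checklist.
    """
    disk = f"Format=RAW,UserBucket={{S3Bucket={bucket},S3Key={key}}}"
    return [
        "aws", "ec2", "import-snapshot",
        "--disk-container", disk,
        "--role-name", role_name,
        "--endpoint-url", endpoint,
        "--profile", profile,
    ]


if __name__ == "__main__":
    vdi = r"C:\Users\Jordan\VirtualBox VMs\SnowballUbuntu\SnowballUbuntu.vdi"
    print(vbox_convert_cmd(vdi))
    # All values below are placeholders for illustration only.
    print(import_snapshot_cmd(
        bucket="import-snapshot-bucket-name",
        key="SnowballUbuntu.raw",
        role_name="snow-vmie-role",
        endpoint="http://192.0.2.0:8008",
        profile="snowball",
    ))
```

Returning argument lists rather than one long string sidesteps quoting problems with the space in “VirtualBox VMs” if the lists are later handed to `subprocess.run`.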
Utilizing custom virtual machines for edge data collection provides several significant advantages. Customization allows organizations to tailor their virtual machines to specific use cases, optimizing performance and efficiency. By integrating specialized applications, organizations can streamline data processing and analysis directly at the edge, reducing latency and enhancing real-time decision-making.
The increased flexibility and adaptability that these sideloaded custom virtual machines offer can enable organizations to quickly respond to evolving needs or unexpected changes in their data collection requirements. By sideloading custom virtual machines into edge devices like AWS Snowball Edge Storage Optimized, organizations can leverage the full potential of edge computing and efficiently manage their data collection and processing needs in diverse environments.
Implementing Rapid Data Collection at the Edge
Setting up a Storage Optimized Snowball for data collection involves configuring the device to handle specific data collection tasks and requirements. By leveraging the robust capabilities of the Snowball Edge device, organizations can collect and process large volumes of data in environments with intermittent connectivity or remote locations.
The device’s block storage and Amazon S3-compatible object storage enable users to securely store, manage, and transfer massive amounts of data efficiently. By customizing the Snowball Edge according to project requirements, organizations can optimize data collection processes to meet their unique needs and goals.
Integration of custom virtual machines with data collection tools further streamlines the data collection process at the edge. By incorporating specialized applications or frameworks, organizations can process and analyze data directly on the Snowball Edge device, reducing latency and enhancing real-time decision-making.
This integration allows for seamless collaboration between various data collection tools and custom virtual machines, ensuring efficient data processing and management. Furthermore, optimizing data transfer and synchronization with Amazon S3 enables organizations to benefit from the scalable and secure storage provided by Amazon’s cloud infrastructure.
This process facilitates the seamless transfer of collected data from the Snowball Edge device to Amazon S3, ensuring that data is readily available for further analysis or long-term storage. In turn, this fosters a reliable and efficient data management ecosystem that supports rapid data collection and processing at the edge.
Sneaker-net Advantage
In many scenarios, Sneaker-net, or physically transferring data using devices like the Storage Optimized Snowball, can be faster than transferring data over the internet. This is especially true for remote or temporary installations with limited bandwidth, high latency, or unreliable connectivity.
Examples include research stations in remote locations, temporary event venues, or even disaster recovery sites. By using AWS Snowball to transport large volumes of data, organizations can bypass the constraints of slow or unreliable internet connections and ensure that data is transferred quickly and securely to Amazon S3 for further processing and analysis.
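To put a rough number on the sneaker-net argument, the sketch below estimates how long it would take to push a full 80TB device's worth of data over a network link. The link speeds and the 80% efficiency factor are illustrative assumptions, not measurements from our testing, and the calculation ignores shipping time for the device itself.

```python
def transfer_days(data_tb, link_mbps, efficiency=0.8):
    """Estimate days needed to move data_tb terabytes over a link.

    data_tb    -- data volume in decimal terabytes (1 TB = 1e12 bytes)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of the nominal speed actually achieved
    """
    total_bits = data_tb * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    seconds = total_bits / usable_bits_per_second
    return seconds / 86400  # seconds per day


if __name__ == "__main__":
    for mbps in (100, 1000):
        days = transfer_days(80, mbps)
        print(f"80 TB over {mbps} Mbps: about {days:.1f} days")
```

Under these assumptions, 80TB takes roughly 92.6 days on a 100 Mbps link and about 9.3 days at 1 Gbps, which is why shipping a device can win at a remote site even after transit time is added.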
Data stored in S3 benefits from the inherent scalability and flexibility offered by the AWS ecosystem. As data volumes grow, organizations can easily adjust their storage capacity to accommodate changing requirements without the need for costly infrastructure investments.
Additionally, S3 integrates seamlessly with a wide range of AWS services, such as Amazon Athena, Amazon Redshift, and Amazon SageMaker, enabling organizations to analyze, process, and derive insights from their data using powerful analytics and machine learning tools. This integration ultimately empowers organizations to make data-driven decisions and unlock new opportunities for growth and innovation.
Closing Thoughts
Amazon AWS Storage Optimized Snowball, when combined with custom virtual machines, offers a powerful and cost-effective solution for rapid data collection at the edge. Temporary installations can now efficiently gather and store large volumes of data while benefiting from the security, scalability, and ease of integration offered by S3. By embracing this innovative approach, organizations can significantly reduce hardware costs, streamline their data management, and unlock new insights from their data.
Our approach to this process was a little backward; ideally, you’d configure the EC2 instances at the time of order to make life easy. Still, it is nice to know that AWS allows for “creative flexibility” with their Snowball appliances. Really though, if the workload is compute-intensive, AWS offers the Snowball Edge Compute Optimized with up to 104 vCPUs, 416GB of DRAM, and 28TB of flash. And if you have an analytics need, they even offer Snowballs with GPUs. For edge data collection, AWS offers a ton of options and part of the fun is discovering which Snow device might be right for you.