Amazon Web Services (AWS) has announced the general availability of AWS EC2 Trn1 instances. Powered by AWS-designed Trainium chips, Trn1 instances are built for high-performance training of machine learning models in the cloud, with Amazon claiming up to 50% savings in "cost-to-train" compared with similar GPU-based instances.
According to Amazon, AWS EC2 Trn1 instances provide the fastest time to train popular machine learning models on AWS. This lets customers shorten training times, iterate on models quickly to increase accuracy, and improve overall productivity for workloads such as natural language processing, speech and image recognition, semantic search, recommendation engines, fraud detection, and forecasting.
Pricing is flexible as well: there are no minimum commitments or upfront fees, and customers pay only for the compute they use.
Sizes and Specifications of AWS EC2 Trn1 Instances
Instance Name | vCPUs | AWS Trainium Chips | Accelerator Memory | NeuronLink | Instance Memory | Instance Networking | Local Instance Storage |
trn1.2xlarge | 8 | 1 | 32 GB | N/A | 32 GB | Up to 12.5 Gbps | 1x 500 GB NVMe |
trn1.32xlarge | 128 | 16 | 512 GB | Supported | 512 GB | 800 Gbps | 4x 2 TB NVMe |
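The table can also be read per accelerator. As a rough sketch, the Python snippet below copies the figures from the table into a small record type and derives per-chip values. The class and function names are illustrative only and are not part of any AWS SDK; the networking figure for trn1.2xlarge is the "up to" peak from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trn1Size:
    """One row of the spec table above (hypothetical record type)."""
    name: str
    vcpus: int
    trainium_chips: int
    accelerator_memory_gb: int
    instance_memory_gb: int
    peak_network_gbps: float
    local_nvme_gb: int


# Values copied from the table; 4x 2 TB NVMe is written as 4 * 2000 GB.
SIZES = [
    Trn1Size("trn1.2xlarge", 8, 1, 32, 32, 12.5, 500),
    Trn1Size("trn1.32xlarge", 128, 16, 512, 512, 800.0, 4 * 2000),
]


def accelerator_memory_per_chip_gb(size: Trn1Size) -> float:
    """Accelerator memory divided evenly across the Trainium chips."""
    return size.accelerator_memory_gb / size.trainium_chips


def vcpus_per_chip(size: Trn1Size) -> float:
    """Host vCPUs available per Trainium chip."""
    return size.vcpus / size.trainium_chips


if __name__ == "__main__":
    for size in SIZES:
        print(size.name,
              accelerator_memory_per_chip_gb(size), "GB per chip,",
              vcpus_per_chip(size), "vCPUs per chip")
```

Both sizes work out to 32 GB of accelerator memory and 8 vCPUs per Trainium chip, so the larger size scales the chip count rather than changing the per-chip resources.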
Previously, even on the fastest accelerated instances available, training more complex machine learning models was both excessively expensive and time-consuming. Amazon says the new AWS EC2 Trn1 instances offer the best price performance and the fastest machine learning model training on AWS.
Other notable features include the following:
- Those looking to get started without significantly changing code can use AWS Neuron, the software development kit (SDK) for Trn1 instances, which is integrated into popular machine learning frameworks such as PyTorch and TensorFlow.
- Trn1 instances feature up to 16 AWS Trainium accelerators that are specifically designed for training deep learning models.
- To improve efficiency, Trn1 is the first Amazon EC2 instance to offer up to 800 Gbps of networking bandwidth via the second-generation AWS Elastic Fabric Adapter (EFA) network interface.
- To speed up training, Trn1 instances also use NeuronLink, a high-speed, intra-instance interconnect.
Amazon EC2 UltraClusters
Customers can deploy Trn1 instances in Amazon EC2 UltraClusters (made up of tens of thousands of Trainium accelerators) to quickly train the most complex deep learning models, even those with trillions of parameters. With EC2 UltraClusters, organizations can scale machine learning training to as many as 30,000 Trainium accelerators interconnected with EFA petabit-scale networking. Amazon says this gives organizations on-demand access to supercomputing-class performance, which can cut training that usually takes months down to just days.
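To put the 30,000-accelerator figure in terms of instances, here is a back-of-the-envelope sketch in Python. It assumes a cluster built entirely from trn1.32xlarge instances (16 accelerators and 512 GB of accelerator memory each, per the table above); the function names are hypothetical, and real UltraCluster capacity is allocated by AWS rather than computed this way.

```python
import math

# Per-instance figures for trn1.32xlarge, taken from the spec table.
ACCELERATORS_PER_INSTANCE = 16
ACCELERATOR_MEMORY_PER_INSTANCE_GB = 512


def instances_needed(accelerators: int) -> int:
    """Number of trn1.32xlarge instances required to supply the accelerators."""
    return math.ceil(accelerators / ACCELERATORS_PER_INSTANCE)


def aggregate_accelerator_memory_tb(accelerators: int) -> float:
    """Total accelerator memory across the cluster, in decimal terabytes."""
    per_chip_gb = ACCELERATOR_MEMORY_PER_INSTANCE_GB / ACCELERATORS_PER_INSTANCE
    return accelerators * per_chip_gb / 1000


if __name__ == "__main__":
    n = 30_000
    print(instances_needed(n), "instances")
    print(aggregate_accelerator_memory_tb(n), "TB of accelerator memory")
```

Under those assumptions, 30,000 accelerators correspond to 1,875 instances and roughly 960 TB of combined accelerator memory.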
Each AWS EC2 Trn1 instance supports up to 8 TB of fast local NVMe SSD storage, while AWS Trainium supports a wide range of data types (FP32, TF32, BF16, FP16, and configurable FP8). Trainium also supports stochastic rounding, a probabilistic rounding method that Amazon says enables high performance with higher accuracy than conventional rounding modes. In addition, AWS Trainium supports dynamic tensor shapes and custom operators, giving customers a flexible infrastructure that can adapt to their training needs.
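For readers unfamiliar with stochastic rounding, the short Python sketch below shows the general idea on ordinary numbers: a value is rounded up with probability equal to its fractional part, so the rounding error averages out over many operations. This is only an illustration of the concept; Trainium applies it in hardware to low-precision floating-point formats, and the function names here are made up for the example.

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x to a neighboring integer, picking the upper one with
    probability equal to the fractional part of x."""
    lower = math.floor(x)
    fraction = x - lower
    return lower + (1 if rng.random() < fraction else 0)


def mean_of_stochastic_rounds(x: float, samples: int, seed: int = 0) -> float:
    """Average of many independent stochastic roundings of x."""
    rng = random.Random(seed)
    return sum(stochastic_round(x, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    value = 2.3
    # Round-to-nearest always yields 2, so the 0.3 is lost every time.
    print("nearest:", round(value))
    # Stochastic rounding yields 2 or 3; the average stays close to 2.3.
    print("stochastic mean:", mean_of_stochastic_rounds(value, 100_000))
```

The point of the comparison is that round-to-nearest discards the same small remainder on every update, whereas stochastic rounding preserves it on average, which matters when many small updates are accumulated in a low-precision format.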
AWS Nitro System
Trn1 instances are built on the AWS Nitro System, a collection of AWS-designed hardware and software innovations that streamline the delivery of isolated multi-tenancy, private networking, and fast local storage. To deliver the necessary performance, the Nitro System offloads CPU virtualization, storage, and networking functions to dedicated hardware and software.
AWS EC2 Trn1 Instances Availability
AWS Trn1 instances can be purchased now as On-Demand Instances (with Savings Plans), Reserved Instances, or Spot Instances. They are currently available in US East (N. Virginia) and US West (Oregon), with availability in other AWS Regions expected soon.
They will also be available through the following other AWS services:
- Amazon SageMaker
- Amazon Elastic Kubernetes Service (Amazon EKS)
- Amazon Elastic Container Service (Amazon ECS)
- AWS Batch