NVIDIA HGX H200 GPU Rental
Flagship compute for AI training and inference
More memory, higher bandwidth, and stronger inference performance
H200 8-GPU cluster
Equipped with 141 GB HBM3e memory and 4.8 TB/s bandwidth
Enhanced Tensor Core architecture and faster memory bandwidth accelerate large-scale AI deployments
Compared with H100, Transformer model inference is 2x faster and energy efficiency improves by 35%. Taking the full DeepSeek model as an example, a single 8-GPU H200 server is expected to deliver about 30% higher inference throughput than 16 H100 GPUs.
Bare-metal or cloud server delivery
Data centers: North America and Europe

Why choose H200?
Next-generation inference performance for enterprise AI applications
- HBM3e memory: 76% more than H100
- Transformer inference speed: a significant improvement over H100
- Energy-efficiency improvement: better price-performance
Performance advantages
- 4.8 TB/s memory bandwidth for very large context windows
- One 8-GPU server delivers about 30% higher inference throughput than 16 H100 GPUs
- Suitable for mainstream models including DeepSeek, LLaMA3, and Mistral
- 2x faster Transformer model inference
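The comparison figures above can be sanity-checked with simple arithmetic. The sketch below assumes the commonly published 80 GB capacity of the H100 SXM (a value not stated on this page) and treats the "about 30%" figure as exact. It derives the memory uplift and the per-GPU throughput ratio that the 8-versus-16 comparison implies.

```python
# Back-of-the-envelope check of the comparison figures quoted on this page.
H200_MEMORY_GB = 141  # from this page
H100_MEMORY_GB = 80   # assumption: published H100 SXM capacity, not stated here


def percent_more(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, in percent."""
    return (new - old) / old * 100.0


def implied_per_gpu_ratio(gain: float, h200_gpus: int, h100_gpus: int) -> float:
    """Per-GPU throughput ratio implied by a cluster-level throughput gain."""
    return (1.0 + gain) * h100_gpus / h200_gpus


memory_uplift = percent_more(H200_MEMORY_GB, H100_MEMORY_GB)
per_gpu_ratio = implied_per_gpu_ratio(0.30, h200_gpus=8, h100_gpus=16)

print(f"Memory uplift: {memory_uplift:.0f}%")
print(f"Implied per-GPU throughput ratio: {per_gpu_ratio:.1f}x")
```

The implied ratio applies only to that specific DeepSeek comparison and should not be read as a general per-GPU speedup.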
Deployment flexibility
- Bare-metal servers with no virtualization overhead
- Cloud server instances with on-demand elastic scaling
- North American and European data centers
- Long-term compute reservations
Use cases
- Inference for models with hundreds of billions of parameters
- Large-context LLM services
- High-throughput AI API platforms
- Large-scale inference optimization
H200 GPU Server Specifications
Next-generation HBM3e memory with greater capacity and bandwidth
| Specification | H200 8-GPU bare metal | H200 8-GPU cloud server | H200 single-GPU cloud server |
|---|---|---|---|
| GPU | NVIDIA HGX H200 141GB 700W SXM GPUs × 8, fully interconnected with NVIDIA NVLink technology | NVIDIA H200 SXM GPUs × 8, 141GB × 8 = 1128GB HBM3e memory | NVIDIA H200 SXM GPU, 141GB HBM3e memory |
| GPU memory | 141GB HBM3e per GPU, 1128GB total | 141GB HBM3e per GPU, 1128GB total | 141GB HBM3e |
| Memory bandwidth | 4.8TB/s per GPU | 4.8TB/s per GPU | 4.8TB/s |
| CPU | Intel Xeon Platinum 8468 × 2 (4th Gen Intel Xeon Scalable), 96 cores / 192 threads total | 192 vCPUs | 24 vCPUs |
| Memory | 2048GB (64GB × 32) DDR5 | 1920GB | 240GB |
| Local storage | 7TB 2.5-inch NVMe SSD × 8 | 2TB boot disk + 40TB NVMe SSD local disk | 720GB boot disk + 5TB NVMe SSD local disk |
| GPU interconnect | NVLink Switch System, 900GB/s per GPU; RoCE v2 RDMA network support | NVLink supported; RoCE v2 RDMA network, 3.6Tbps | - |
| Ethernet | Mellanox Technologies MT2892 Family [ConnectX-6 Dx], 100Gbps link speed × 4 | - | - |
| Private network | Up to 400Gbps | 25Gbps | 25Gbps |
| Public network | Up to 40Gbps | 10Gbps | 10Gbps |
| Included outbound transfer | Unlimited transfer | 60TB | 15TB |
| Billing model | Annual or monthly | Annual, monthly, or on-demand | Annual, monthly, or on-demand |
* Specifications are subject to the delivered configuration.
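When choosing between the single-GPU and 8-GPU configurations, a rough first step is to check whether the model weights fit in aggregate GPU memory. The sketch below is an estimate under stated assumptions, not a sizing tool. The 20% headroom reserved for KV cache and activations is an arbitrary placeholder, and the 70B and 671B model sizes are illustrative (671B is the publicly reported total parameter count of DeepSeek-V3/R1, which this page does not state).

```python
# Rough check of whether model weights fit in aggregate GPU memory.
# Model sizes and the headroom fraction are illustrative assumptions.
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weights_gb(params_billion: float, dtype: str) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes) at the given precision."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits(params_billion: float, dtype: str, gpus: int,
         gb_per_gpu: float = 141.0, headroom: float = 0.2) -> bool:
    """True if weights fit after reserving `headroom` for KV cache and activations."""
    usable_gb = gpus * gb_per_gpu * (1.0 - headroom)
    return weights_gb(params_billion, dtype) <= usable_gb


for name, size_b, dtype, gpus in [
    ("70B model, FP16, 1 GPU", 70, "fp16", 1),
    ("671B model, FP8, 8 GPUs", 671, "fp8", 8),
    ("671B model, FP16, 8 GPUs", 671, "fp16", 8),
]:
    ok = fits(size_b, dtype, gpus)
    print(f"{name}: {weights_gb(size_b, dtype):.0f} GB weights, fits={ok}")
```

Real deployments also depend on context length, batch size, and the serving framework, so a result from this check should be confirmed with a test run.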
NVIDIA HGX H200 Technical Specifications
| H200 SXM¹ | Specification |
|---|---|
| FP64 | 34 TFLOPS |
| FP64 Tensor Core | 67 TFLOPS |
| FP32 | 67 TFLOPS |
| TF32 Tensor Core² | 989 TFLOPS |
| BFLOAT16 Tensor Core² | 1,979 TFLOPS |
| FP16 Tensor Core² | 1,979 TFLOPS |
| FP8 Tensor Core² | 3,958 TFLOPS |
| INT8 Tensor Core² | 3,958 TFLOPS |
| GPU Memory | 141GB |
| GPU Memory Bandwidth | 4.8TB/s |
| Decoders | 7 NVDEC, 7 JPEG |
| Confidential Computing | Supported |
| Max Thermal Design Power (TDP) | Up to 700W (configurable) |
| Multi-Instance GPUs | Up to 7 MIGs @18GB each |
| Form Factor | SXM |
| Interconnect | NVIDIA NVLink™: 900GB/s; PCIe Gen5: 128GB/s |
| Server Options | NVIDIA HGX™ H200 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs |
| NVIDIA AI Enterprise | Add-on |

¹ Preliminary specifications; may be subject to change.
² With sparsity.
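The 4.8TB/s memory bandwidth figure matters most for token generation, which is typically limited by how fast weights can be read from memory. The sketch below gives a first-order upper bound on single-stream decode speed. It assumes a dense model whose weights are read once per generated token and sharded evenly across GPUs, and it ignores KV-cache reads, interconnect latency, and compute time. The 140 GB, 2-GPU example is hypothetical.

```python
# First-order upper bound on single-stream decode speed when generation is
# limited by memory bandwidth: each new token requires reading all weights once.
def max_decode_tokens_per_s(weight_gb: float, bandwidth_tb_s: float = 4.8,
                            gpus: int = 1, efficiency: float = 1.0) -> float:
    """Upper bound in tokens/s for one sequence of a dense model.

    Assumes weights are sharded evenly across `gpus`; ignores KV-cache reads,
    interconnect latency, and compute time. `efficiency` scales peak bandwidth.
    """
    bytes_per_token = weight_gb * 1e9
    bytes_per_s = bandwidth_tb_s * 1e12 * gpus * efficiency
    return bytes_per_s / bytes_per_token


# Hypothetical example: 140 GB of weights split across 2 GPUs.
bound = max_decode_tokens_per_s(weight_gb=140, gpus=2)
print(f"Upper bound: {bound:.1f} tokens/s per sequence")
```

Measured throughput will be lower than this bound. Batching raises aggregate tokens per second because one pass over the weights serves many sequences.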
H200 GPU Rental Guide
H200 Rental, Server Configurations, and Use Cases
This guide addresses H200 rental, H200 servers, and H200 cloud GPU procurement, covering pricing factors, configuration priorities, and selection differences versus H100 and B300.
Pricing and configuration considerations
H200 rental pricing is affected by GPU count, memory requirements, bare-metal or cloud delivery, rental term, region, bandwidth, and storage. Evaluate cost per unit of throughput, launch time, and long-term resource stability rather than only the per-GPU price.
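Cost per unit of throughput can be compared with a small calculation like the one below. This is a minimal sketch: the `Offer` class, the option names, and every price and throughput number are hypothetical placeholders, to be replaced with real quotes and with throughput measured on your own workload.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    name: str
    hourly_price_usd: float   # hypothetical quote for the whole server
    tokens_per_second: float  # throughput measured on your own workload

    def cost_per_million_tokens(self) -> float:
        """USD spent to generate one million tokens at full utilization."""
        tokens_per_hour = self.tokens_per_second * 3600
        return self.hourly_price_usd / tokens_per_hour * 1_000_000


# Placeholder numbers only; substitute real quotes and benchmark results.
offers = [
    Offer("option A", hourly_price_usd=20.0, tokens_per_second=10_000),
    Offer("option B", hourly_price_usd=30.0, tokens_per_second=20_000),
]

ranked = sorted(offers, key=Offer.cost_per_million_tokens)
for offer in ranked:
    print(f"{offer.name}: ${offer.cost_per_million_tokens():.3f} per 1M tokens")
```

In this made-up example the option with the higher hourly price is cheaper per token, which is why the per-GPU price alone can mislead.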
Use cases
- Large-model inference, fine-tuning, RAG, and long-context applications.
- Teams that exceed H100 memory or throughput but do not yet need to move to B300.
- AI products seeking a balance among cost, ecosystem maturity, and performance.
- Production traffic that requires stable overseas GPU capacity.
GPU and cloud server comparison
H200
High memory and throughput
Fits memory-sensitive inference and fine-tuning with longer contexts and higher concurrency.
H100
Mature and stable
Fits conventional AI workloads that are more budget-sensitive and rely on mature frameworks and tutorials.
B300
Flagship upgrade
Fits higher throughput, longer-term compute planning, and next-generation cluster deployments.
Frequently asked questions
When should H200 rental replace H100?
H200 is a natural upgrade from H100 when memory, context length, or throughput limits the workload. H100 may still be more economical for smaller workloads.
Is an H200 server better for training or inference?
Both are supported, but H200’s larger memory is especially valuable for large-model inference, long contexts, and fine-tuning.
How should I choose between H200 and B300?
H200 is a mature, stable high-memory option, while B300 fits flagship compute planning for higher performance and a longer lifecycle.
Related reading and procurement guides
H200 vs H100 Comparison
Compare H200 and H100 by memory, throughput, cost, and workload fit.
Choosing H200, H100, MI300X, or MI325X
Compare the positioning and procurement considerations of high-end GPUs.
How to Choose B300 or H200 for DeepSeek
Assess H200 and B300 costs and benefits for DeepSeek workloads.
Need to secure capacity?
H200 availability is limited. We support:
