OpenAI has launched gpt-oss-120b and gpt-oss-20b—powerful open-weight language models optimized for reasoning, tool use, and efficient deployment on consumer hardware.
These models are released under the Apache 2.0 license, one of the most flexible open-source licenses available, so you can integrate and scale your projects freely.
Now you can access OpenAI GPT-OSS 120B & 20B instantly on HPC-AI.COM!
Here’s how to get started in just a few steps:
- Environment Setup
We provide preconfigured, high-performance machine learning environments, so you can hit the ground running. Just choose an image—such as CUDA 12.8—and your GPU instance will be ready in minutes.
Next, install the inference framework of your choice. For example, follow the vLLM setup guide to get started quickly. For detailed documentation, refer to the GPT-OSS vLLM Usage Guide.
pip install uv                # install the uv package manager
uv venv --python 3.12 --seed  # create a Python 3.12 virtual environment
source .venv/bin/activate
# Install the gpt-oss preview build of vLLM with CUDA 12.8 nightly wheels
uv pip install --pre vllm==0.10.1+gptoss \
--extra-index-url https://wheels.vllm.ai/gpt-oss/ \
--extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
--index-strategy unsafe-best-match
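With the environment ready, a minimal sketch of starting an OpenAI-compatible inference server with the vLLM CLI (the port is an assumption; adjust it and the model name to your setup):

```shell
# Launch an OpenAI-compatible HTTP server for gpt-oss-120b on port 8000.
# vLLM pulls the weights from Hugging Face unless a local path is supplied.
vllm serve openai/gpt-oss-120b --port 8000
```

Once the server is up, any OpenAI-compatible client can talk to it at `http://localhost:8000/v1`.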
- Access GPT-OSS in 5 Minutes
We provide cluster-level caching that enables incredibly fast model loading: the entire 183GB model file can be downloaded within 5 minutes at speeds up to 1.1GB/s. With dedicated data disks and high-speed shared storage configured for you, running GPT-OSS in a private environment is effortless.
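As a quick sanity check on those figures (183 GB and 1.1 GB/s are the numbers quoted above):

```python
# Rough time to fetch the full 183 GB checkpoint at a sustained 1.1 GB/s.
size_gb = 183
speed_gb_per_s = 1.1
seconds = size_gb / speed_gb_per_s
print(f"{seconds:.0f} s ≈ {seconds / 60:.1f} min")  # comfortably under 5 minutes
```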
Here is how you can use GPT-OSS-120B models on HPC-AI.COM:
#!/bin/bash
# Install the JuiceFS client
curl -sSL https://d.juicefs.com/install | sh -
cd ${YourModelPath}
export model="openai/gpt-oss-120b"
# Sync the cached weights from the cluster's MinIO mirror to local disk
juicefs sync minio://minio:minio123@minio:9000/hf-model/${model}/ ./${model}/
- Deploy Your Inference Service
We provide a public network forwarding service that lets you expose your inference service to the internet. Set up HTTP port forwarding through the launch or configuration interface, and your service becomes reachable from a public environment.
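Once the port is forwarded, any OpenAI-compatible client can reach the service. A minimal sketch using only Python's standard library (the hostname and port below are placeholders for whatever public address the forwarding service assigns you):

```python
import json
import urllib.request

# Placeholder URL: substitute the public address assigned by the forwarding service.
ENDPOINT = "http://your-forwarded-host:8000/v1/chat/completions"

# Standard OpenAI-style chat-completions request body.
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once your service is live:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```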
For more details, including a comprehensive guide and interesting examples of GPT-OSS, please refer to the docs: https://hpc-ai.com/doc/docs/tutorial/gpt-oss
Additionally, we have increased the supply of H200 GPUs in our US cluster and are now offering them at the lowest price of just $2.19/hour — so you can start using GPT-OSS right away!
Subscribe to stay updated — we’ll soon release performance benchmarks for vLLM and SGLang on GPT-OSS to help you better evaluate and compare.
Reference:
https://openai.com/index/introducing-gpt-oss/