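This Ansible role installs and configures the NVIDIA driver, the CUDA environment, and the container runtime on target hosts for AI training and inference workloads.
Place the NVIDIA driver and CUDA installers in a local directory:
# Create a local directory and place the installers in it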
mkdir -p /root/nvidia
cp NVIDIA-Linux-x86_64-570.133.07.run /root/nvidia/
cp cuda_12.8.1_570.124.06_linux.run /root/nvidia/
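Before running the setup, it can help to confirm that both installers actually landed in the directory. The following is a minimal sketch, not part of ocboot: `check_installers` is a hypothetical helper name, and it only checks that each named file exists and is non-empty.

```shell
# Hypothetical helper (not provided by ocboot): check that every named
# installer exists in the given directory and is not an empty file.
check_installers() {
    local dir="$1"
    shift
    local missing=0
    local name
    for name in "$@"; do
        if [ -s "$dir/$name" ]; then
            echo "ok: $dir/$name"
        else
            echo "missing or empty: $dir/$name" >&2
            missing=1
        fi
    done
    # Returns 0 only when every file was found and non-empty.
    return "$missing"
}

# Example call with the file names used above:
# check_installers /root/nvidia \
#     NVIDIA-Linux-x86_64-570.133.07.run \
#     cuda_12.8.1_570.124.06_linux.run
```

The function reports every problem file rather than stopping at the first one, so a single run shows everything that still needs to be copied.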
Run ocboot.py and pass the configuration as command-line arguments:
ocboot.py setup-ai-env <target_host1> <target_host2> ... \
--nvidia-driver-installer-path <full_path_to_driver> \
--cuda-installer-path <full_path_to_cuda> \
[--gpu-device-virtual-number 2]
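- --nvidia-driver-installer-path (required): Full path to the NVIDIA driver installer, for example /root/nvidia/NVIDIA-Linux-x86_64-570.133.07.run
- --cuda-installer-path (required): Full path to the CUDA installer, for example /root/nvidia/cuda_12.8.1_570.124.06_linux.run
- --gpu-device-virtual-number (optional): Virtual number of the NVIDIA GPU shared device; defaults to 2
- --user, -u (optional): SSH username; defaults to root
- --key-file, -k (optional): Path to the SSH private key file
- --port, -p (optional): SSH port; defaults to 22
# Basic usage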
ocboot.py setup-ai-env 10.127.222.247 \
--nvidia-driver-installer-path /root/nvidia/NVIDIA-Linux-x86_64-570.133.07.run \
--cuda-installer-path /root/nvidia/cuda_12.8.1_570.124.06_linux.run
# Use custom paths
ocboot.py setup-ai-env 10.127.222.247 \
--nvidia-driver-installer-path /opt/nvidia/NVIDIA-Linux-x86_64-570.172.08.run \
--cuda-installer-path /opt/nvidia/cuda_12.8.1_570.172.08_linux.run \
--gpu-device-virtual-number 2
# Specify the SSH user and port
ocboot.py setup-ai-env 10.127.222.247 \
--nvidia-driver-installer-path /root/nvidia/NVIDIA-Linux-x86_64-570.133.07.run \
--cuda-installer-path /root/nvidia/cuda_12.8.1_570.124.06_linux.run \
--user admin \
--port 2222
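Both installer options expect a full path. As a rough sketch, a small wrapper can reject relative paths before anything is run. `print_setup_cmd` is a hypothetical name used only for illustration. It prints the command instead of executing it, and it uses only the flags documented above.

```shell
# Hypothetical helper (not part of ocboot): verify that both installer
# paths are absolute, then print the matching ocboot.py command line.
print_setup_cmd() {
    local host="$1"
    local driver="$2"
    local cuda="$3"
    local p
    for p in "$driver" "$cuda"; do
        case "$p" in
            /*) ;;  # absolute path, accepted
            *)
                echo "not a full path: $p" >&2
                return 1
                ;;
        esac
    done
    printf 'ocboot.py setup-ai-env %s --nvidia-driver-installer-path %s --cuda-installer-path %s\n' \
        "$host" "$driver" "$cuda"
}

# Example call:
# print_setup_cmd 10.127.222.247 \
#     /root/nvidia/NVIDIA-Linux-x86_64-570.133.07.run \
#     /root/nvidia/cuda_12.8.1_570.124.06_linux.run
```

Printing rather than executing keeps the sketch side-effect free; the output can be reviewed and then pasted into a shell.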
To run ansible-playbook directly, pass the variables with the -e option:
ansible-playbook -i inventory setup-ai-env-services.yml \
-e nvidia_driver_installer_path=/root/nvidia/NVIDIA-Linux-x86_64-570.133.07.run \
-e cuda_installer_path=/root/nvidia/cuda_12.8.1_570.124.06_linux.run \
-e gpu_device_virtual_number=2
# Transfer the NVIDIA driver
rsync -avP /path/to/nvidia/NVIDIA-Linux-x86_64-570.133.07.run target_host:/root/nvidia/
# Transfer the CUDA installer
rsync -avP /path/to/cuda/cuda_12.8.1_570.124.06_linux.run target_host:/root/nvidia/
# Transfer the NVIDIA driver
scp /path/to/nvidia/NVIDIA-Linux-x86_64-570.133.07.run target_host:/root/nvidia/
# Transfer the CUDA installer
scp /path/to/cuda/cuda_12.8.1_570.124.06_linux.run target_host:/root/nvidia/
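The installers are large, so it may be worth confirming that a transfer completed intact. The sketch below compares a local file's SHA-256 digest against an expected value, which could be obtained by running `sha256sum` on the target host. `verify_sha256` is a hypothetical helper, and it assumes `sha256sum` from GNU coreutils is available.

```shell
# Hypothetical helper: succeed only if the file's SHA-256 digest equals
# the expected hex string.
verify_sha256() {
    local file="$1"
    local expected="$2"
    local actual
    [ -f "$file" ] || return 1
    [ -n "$expected" ] || return 1
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Example (target_host and paths as in the transfer commands above):
# remote_sum=$(ssh target_host sha256sum /root/nvidia/NVIDIA-Linux-x86_64-570.133.07.run | awk '{print $1}')
# verify_sha256 /path/to/nvidia/NVIDIA-Linux-x86_64-570.133.07.run "$remote_sum" \
#     && echo "driver transfer verified"
```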
utils/containerd completes before this role)