Jakub Czapliński
14/5/2020

How To Connect Jetson Nano To Kubernetes Using K3s And K3sup

In this article, I will show how to connect a Jetson Nano developer board to a Kubernetes cluster as a GPU-enabled node. I will cover the NVIDIA docker setup needed to run containers with GPU access, and then connect the Jetson to the Kubernetes cluster. After successfully connecting the node, I will also show how to run a simple TensorFlow 2 training session using the GPU on the Jetson Nano. If you are interested in setting up a K3s cluster, you can follow my other tutorial explaining how to build a K3s cluster on Raspberry Pi using Ubuntu Server 18.04. Most of the information provided here is not specific to the Raspberry Pi.

K3s or Kubernetes?

K3s is a lightweight Kubernetes distribution optimized for smaller installations, which in my opinion makes it ideal for single-board computers, as it needs significantly fewer resources. You can read more about it here. K3sup, on the other hand, is a great open-source tool built by Alex Ellis that simplifies the installation of K3s clusters. You can find more information about it at https://github.com/alexellis/k3sup

What do we need?

  • A Jetson Nano flashed with NVIDIA's JetPack image (it comes with Docker and the NVIDIA container runtime preinstalled)
  • An existing K3s cluster to join, with SSH access to its master node
  • k3sup installed on your workstation

Plan

  • Setup NVIDIA docker
  • Add Jetson Nano to the K3s cluster
  • Run a simple MNIST example to showcase the usage of GPU inside the Kubernetes pod

Setting up NVIDIA docker

Before we configure Docker to use nvidia-docker as the default runtime, I would like to spend a moment explaining why this is needed. By default, containers on the Jetson Nano run exactly as they would on any other hardware, and you cannot access the GPU from inside the container, at least not without some hacking. If you want to test it yourself, run the following command; you should see similar results:

root@jetson:~# echo "python3 -c 'import tensorflow'" | docker run -i icetekio/jetson-nano-tensorflow /bin/bash
2020-05-14 00:10:23.370761: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.2'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:23.370859: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-05-14 00:10:25.946896: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:25.947219: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:25.947273: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

If you now run the same command but add the --runtime=nvidia parameter to the docker command, you should see something like this:

root@jetson:~# echo "python3 -c 'import tensorflow'" | docker run --runtime=nvidia -i icetekio/jetson-nano-tensorflow /bin/bash
2020-05-14 00:12:16.767624: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-14 00:12:19.386354: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-05-14 00:12:19.388700: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

As you can see, the nvidia-docker runtime is installed, but it is not enabled by default. To make Docker use the nvidia-docker runtime by default, add "default-runtime": "nvidia" to the /etc/docker/daemon.json config file so it looks like this:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

Now you can skip the --runtime=nvidia argument in docker run commands and the GPU will be initialized by default. This matters because K3s will use Docker as its container runtime, and with nvidia-docker set as the default, pods can use the GPU without any hassle or special configuration.
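
For the change to take effect, the Docker daemon has to be restarted. A quick sketch of the restart and a sanity check (the exact wording of the docker info output may vary between Docker versions):

```shell
# Validate daemon.json first; a syntax error here will stop the Docker
# daemon from coming back up after the restart.
sudo python3 -m json.tool /etc/docker/daemon.json

# Restart Docker so the new default runtime takes effect.
sudo systemctl restart docker

# The "Default Runtime" line should now say nvidia.
docker info 2>/dev/null | grep -i 'default runtime'
```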

Connecting Jetson as a Kubernetes node

Connecting the Jetson as a Kubernetes node with K3sup takes just one command. However, for it to work we need passwordless SSH access to both the Jetson and the master node, and either passwordless sudo or a root login.

If you need to generate SSH keys and copy them over, you can run something like this:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/rpi -P ""
ssh-copy-id -i ~/.ssh/rpi.pub user@host

By default, Ubuntu requires a password for the sudo command. Because of that, the easier option is to let K3sup connect as root. To make this work, copy your ~/.ssh/authorized_keys file to the /root/.ssh/ directory on each machine.
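
Assuming your public key already sits in your regular user's ~/.ssh/authorized_keys, one way to enable the root login looks like this:

```shell
# On the Jetson (repeat on the master if needed): reuse the keys already
# authorized for your regular user for the root account.
sudo mkdir -p /root/.ssh
sudo cp ~/.ssh/authorized_keys /root/.ssh/authorized_keys

# SSH is picky about permissions on these files.
sudo chmod 700 /root/.ssh
sudo chmod 600 /root/.ssh/authorized_keys
```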

Before connecting Jetson, let's look at the cluster we want to connect it to

upgrade@ZeroOne:~$ kubectl get node -o wide
NAME      STATUS   ROLES    AGE   VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
nexus     Ready    master   32d   v1.17.2+k3s1   192.168.0.12   <none>        Ubuntu 18.04.4 LTS   4.15.0-96-generic   containerd://1.3.3-k3s1
rpi3-32   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.30   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
rpi3-64   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.32   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1

As you may notice, the master node is the nexus host at 192.168.0.12 and it is running containerd. K3s uses containerd by default, but that can be changed. containerd is a problem for us because we set up nvidia-docker to work with Docker, and Docker is what we need for GPU access. Fortunately, switching from containerd to Docker only takes one additional parameter to the k3sup command. So, finally, to connect our Jetson to the cluster we can run:

k3sup join --ssh-key ~/.ssh/rpi --server-ip 192.168.0.12 --ip 192.168.0.40 --k3s-extra-args '--docker'

The IP 192.168.0.40 belongs to my Jetson Nano. As you can see, we passed --k3s-extra-args '--docker', which forwards the --docker flag to the k3s agent during installation. Thanks to that, the agent uses Docker with the nvidia-docker setup rather than containerd.

To check if the node connected correctly we can run kubectl get node -o wide

upgrade@ZeroOne:~$ kubectl get node -o wide
NAME      STATUS   ROLES    AGE   VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
nexus     Ready    master   32d   v1.17.2+k3s1   192.168.0.12   <none>        Ubuntu 18.04.4 LTS   4.15.0-96-generic   containerd://1.3.3-k3s1
rpi3-32   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.30   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
rpi3-64   Ready    <none>   32d   v1.17.2+k3s1   192.168.0.32   <none>        Ubuntu 18.04.4 LTS   5.3.0-1022-raspi2   containerd://1.3.3-k3s1
jetson    Ready    <none>   11s   v1.17.2+k3s1   192.168.0.40   <none>        Ubuntu 18.04.4 LTS   4.9.140-tegra       docker://19.3.6

Simple validation

We can now run a pod using the same Docker image and command, to check whether we get the same results as when running Docker directly on the Jetson Nano at the beginning of this article.

To do this, we can apply this pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  nodeSelector:
    kubernetes.io/hostname: jetson
  containers:
  - image: icetekio/jetson-nano-tensorflow
    name: gpu-test
    command:
    - "/bin/bash"
    - "-c"
    - "echo 'import tensorflow' | python3"
  restartPolicy: Never
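
Assuming you saved the manifest above as gpu-test.yaml, create the pod from the master node:

```shell
# Create the pod; the nodeSelector pins it to the Jetson node.
kubectl apply -f gpu-test.yaml

# Watch until the pod reaches the Completed status.
kubectl get pod gpu-test -w
```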

Wait for the Docker image to be pulled, then view the logs by running:

upgrade@ZeroOne:~$ kubectl logs gpu-test
2020-05-14 10:01:51.341661: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-14 10:01:53.996300: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-05-14 10:01:53.998563: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

As you can see, we get the same log messages as when running Docker directly on the Jetson!

Running MNIST training

We have a running node with GPU support, so now we can try the "Hello world" of machine learning: training a TensorFlow 2 example model on the MNIST dataset.

To run a simple training session that demonstrates GPU usage, apply the manifest below:

apiVersion: v1
kind: Pod
metadata:
  name: mnist-training
spec:
  nodeSelector:
    kubernetes.io/hostname: jetson
  initContainers:
    - name: git-clone
      image: iceci/utils
      command:
        - "git"
        - "clone"
        - "https://github.com/IceCI/example-mnist-training.git"
        - "/workspace"
      volumeMounts:
        - mountPath: /workspace
          name: workspace
  containers:
    - image: icetekio/jetson-nano-tensorflow
      name: mnist
      command:
        - "python3"
        - "/workspace/mnist.py"
      volumeMounts:
        - mountPath: /workspace
          name: workspace
  restartPolicy: Never
  volumes:
    - name: workspace
      emptyDir: {}
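
Assuming the manifest is saved as mnist-training.yaml, start the training and stream its output:

```shell
# Start the training pod; the init container clones the example repo into
# the shared emptyDir volume before the main container runs.
kubectl apply -f mnist-training.yaml

# Follow the training logs of the main container.
kubectl logs -f mnist-training -c mnist
```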

As you can see in the log below, TensorFlow detects and uses the GPU:

...
2020-05-14 11:30:02.846289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-14 11:30:02.846434: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
....

If you are on the node itself, you can check CPU and GPU usage by running the tegrastats command; the GR3D_FREQ field shows the GPU utilization:

upgrade@jetson:~$ tegrastats --interval 5000
RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [52%@1479,41%@1479,43%@1479,34%@1479] EMC_FREQ 0% GR3D_FREQ 9% PLL@23.5C CPU@26C PMIC@100C GPU@24C AO@28.5C thermal@25C POM_5V_IN 3410/3410 POM_5V_GPU 451/451 POM_5V_CPU 1355/1355
RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [53%@1479,42%@1479,45%@1479,35%@1479] EMC_FREQ 0% GR3D_FREQ 9% PLL@23.5C CPU@26C PMIC@100C GPU@24C AO@28.5C thermal@24.75C POM_5V_IN 3410/3410 POM_5V_GPU 451/451 POM_5V_CPU 1353/1354
RAM 2461/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [52%@1479,38%@1479,43%@1479,33%@1479] EMC_FREQ 0% GR3D_FREQ 10% PLL@24C CPU@26C PMIC@100C GPU@24C AO@29C thermal@25.25C POM_5V_IN 3410/3410 POM_5V_GPU 493/465 POM_5V_CPU 1314/1340

Summary

As you can see, hooking up a Jetson Nano to a Kubernetes cluster is a pretty simple and straightforward process. In just a couple of minutes, you’ll be able to leverage Kubernetes to run machine learning workloads as well as use the power of NVIDIA’s pocket-sized GPU. You’ll be able to run any GPU containers designed for Jetson Nano on Kubernetes, which can simplify your development and testing.
