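Setting up a GPU-enabled TensorFlow 1.7.0 and Facenet environment with Docker on Ubuntu
At the time of writing, Docker could only access the GPU on Linux, so Linux is the natural choice.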
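Why use this approach
- TensorFlow is strict about its dependencies, but a conda environment can hold only one version of each package. Spreading the work across several conda environments makes it hard to run everything as one system (it would probably mean writing shell scripts, which I am not familiar with).
- I may need other packages later, and the environments they require could conflict with TensorFlow's.
- TensorFlow 1.7.0 is a very old release and the software that goes with it is outdated too, so forcing an install may cause compatibility problems.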
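Docker avoids all three problems:
- Any number of containers can be installed, so multiple environments can coexist.
- Each container's environment is isolated, so environments cannot conflict.
- Compatibility is not an issue, because the Docker image is the one preconfigured by the TensorFlow team.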
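Installation on the host
Steps in brief:
1. Install Docker (the --runtime=nvidia option used in step 3 also needs the NVIDIA container runtime, nvidia-docker2, on the host)
2. Pull the image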
docker pull tensorflow/tensorflow:1.7.0-gpu-py3
3. Run the container
docker container run -it --runtime=nvidia \
    -v /home/vision/undergraduate/:/mnt \
    -p 10022:22 \
    -p 18888:8888 \
    --dns 8.8.8.8 \
    --privileged \
    tensorflow/tensorflow:1.7.0-gpu-py3 bash
Some common Docker operations are described at https://tf.wiki/zh_hans/appendix/docker.html
To run a command inside the running container:
docker exec -it f62279378b31a21c77240851a62db5d5d4c7125cb41faa43e7dac0f576e7192a bash
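Typing the full 64-character ID is not required: Docker accepts any unique prefix of a container ID, and the 12-character form is the one that shows up in the container's shell prompt (root@f62279378b31). A minimal sketch, assuming the container started above is still running; short_id is a helper name made up for this example, not a Docker command:

```shell
# Full container ID, copied from the docker exec command above.
full_id="f62279378b31a21c77240851a62db5d5d4c7125cb41faa43e7dac0f576e7192a"

# short_id is a hypothetical helper: keep the first 12 characters of an ID,
# which is the abbreviated form that `docker ps` prints.
short_id() {
    printf '%s\n' "$1" | cut -c1-12
}

# Only query the daemon when the docker CLI is available.
if command -v docker >/dev/null 2>&1; then
    # List running containers whose ID matches the abbreviated form.
    docker ps --filter "id=$(short_id "$full_id")" || true
fi

# The equivalent, shorter exec command.
echo "docker exec -it $(short_id "$full_id") bash"
```

A prefix only works while it is unique among the containers on the host, so a longer prefix may be needed if many containers exist.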
Letting VS Code manage Docker
VS Code needs to be able to manage Docker without sudo: https://docs.docker.com/engine/install/linux-postinstall/
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
sudo reboot    # a reboot is enough for the change to take effect
docker run hello-world
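Before opening VS Code it can help to confirm that the group change is actually active in the current session, since newgrp only affects the shell it is run in. A small sketch; in_group is a helper written for this example and is not part of Docker:

```shell
# in_group NAME "LIST": succeed when NAME appears as a whole word in the
# whitespace-separated LIST of group names (hypothetical helper).
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# `id -nG` prints the names of all groups of the current user.
if in_group docker "$(id -nG)"; then
    echo "docker group membership is active in this session"
else
    echo "not in the docker group yet; log out and back in, or reboot"
fi
```

If the check passes but docker run hello-world still asks for sudo, the Docker daemon itself may not be running.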
Running the code
The image is an already-configured environment, so code can be run in it directly.
nvidia-smi
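shows the GPU status.
I use a bind mount here, so the container can access a folder on the host directly and files are shared between the two.
Before running the Facenet code, a few small changes are needed inside the container:
apt update                   # if this hangs, try a system-wide proxy
apt install vim              # vim is not installed by default :(
pip install opencv-python
Running the code after that failed with: ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Fix: install the package that provides libGL.so.1: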
apt install libgl1-mesa-glx
After that it runs normally:
root@f62279378b31:/mnt/asep/2-vipiden/facenet/src# python compare.py 20180408-102900 /mnt/asep/photo-op/facecompare/face1.png /mnt/asep/photo-op/facecompare/face2.png
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Creating networks and loading parameters
2021-01-23 12:22:37.069440: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-01-23 12:22:37.446821: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-23 12:22:37.447106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725
pciBusID: 0000:03:00.0
totalMemory: 7.79GiB freeMemory: 7.70GiB
2021-01-23 12:22:37.447125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2021-01-23 12:22:37.849823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-23 12:22:37.849883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2021-01-23 12:22:37.849897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2021-01-23 12:22:37.850003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7982 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:03:00.0, compute capability: 7.5)
2021-01-23 12:22:37.850882: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 7.79G (8370061312 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2021-01-23 12:22:39.624162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2021-01-23 12:22:39.624230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-23 12:22:39.624272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2021-01-23 12:22:39.624305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2021-01-23 12:22:39.624400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7982 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:03:00.0, compute capability: 7.5)
Model directory: 20180408-102900
Metagraph file: model-20180408-102900.meta
Checkpoint file: model-20180408-102900.ckpt-90
Images:
0: /mnt/asep/photo-op/facecompare/face1.png
1: /mnt/asep/photo-op/facecompare/face2.png

Distance matrix
        0         1
0    0.0000    0.6990
1    0.6990    0.0000
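Fixing permission problems
Reference: https://blog.csdn.net/easylife206/article/details/103750309
The simple approach is to create, inside the container, a user and a group whose UID and GID match the ones used on TrueNAS. The Dockerfile fragment looks like this: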
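# Build-time variables
ARG USER_ID=1004
ARG GROUP_ID=1003
# Ordinary Linux commands, executed while the image is built
RUN groupadd -g ${GROUP_ID} asep
RUN useradd -u ${USER_ID} asep -g asep
RUN usermod -G sudo asep
# Everything below runs as the asep account
USER asep
# Working directory for what follows (also where the container's bash opens by default)
WORKDIR /home/asep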
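Because USER_ID and GROUP_ID are declared with ARG, the defaults 1004 and 1003 can be overridden at build time with --build-arg instead of editing the Dockerfile. A sketch, assuming the mounted files should belong to whoever runs the build; id_build_args is a name made up for this example:

```shell
# id_build_args is a hypothetical helper: it prints --build-arg flags
# carrying the UID and GID of the user running this script.
id_build_args() {
    printf '%s %s\n' "--build-arg USER_ID=$(id -u)" "--build-arg GROUP_ID=$(id -g)"
}

# Print the resulting command first; remove "echo" to actually build.
# The unquoted substitution is intentional so the flags split into words.
echo docker build $(id_build_args) .
```

If the build is run as root, id -u returns 0 and groupadd would fail because GID 0 already exists in the image, so pass the IDs explicitly in that case.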
Complete Dockerfile
FROM tensorflow/tensorflow:1.7.0-gpu-py3

RUN apt update
RUN apt install vim -y
RUN apt install libgl1-mesa-glx -y
RUN apt install net-tools -y
RUN apt install wget -y
RUN pip install --upgrade pip
RUN pip install opencv-python
RUN pip install django==2.2
EXPOSE 8000

# Fix garbled Chinese text
ENV LANG C.UTF-8

# Fix permission problems
ARG USER_ID=1004
ARG GROUP_ID=1003
RUN groupadd -g ${GROUP_ID} asep
RUN useradd -u ${USER_ID} asep -g asep
RUN usermod -G sudo asep
USER asep
WORKDIR /home/asep

# Copy the cuDNN files
USER root
COPY ./cuda/include/cudnn.h /usr/local/cuda/include/
COPY ./cuda/lib64/libcudnn* /usr/local/cuda/lib64/
RUN chmod a+r /usr/local/cuda/include/cudnn.h
RUN chmod a+r /usr/local/cuda/lib64/libcudnn*
The cuda folder extracted from the cuDNN archive must be placed in the same directory as the Dockerfile.
Run docker build . and then docker container run:
docker container run -it --runtime=nvidia \
    -v /home/vision/undergraduate/:/mnt \
    -p 10022:22 \
    -p 18888:8888 \
    --dns 8.8.8.8 \
    --privileged \
    tensorflow/tensorflow:1.7.0-gpu-py3 bash
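docker build . on its own produces an image without a name, which can then only be referenced by its ID. One option is to tag the image at build time and start the container from that tag. A sketch, where facenet-gpu:local is a tag chosen for this example and run_or_print is a helper that only prints each command unless DRY_RUN=0 is set:

```shell
# Hypothetical local tag for the image built from the Dockerfile above.
IMAGE_TAG="facenet-gpu:local"

# run_or_print (hypothetical helper): print the command by default,
# execute it only when DRY_RUN=0.
run_or_print() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and tag the image from the current directory.
run_or_print docker build -t "$IMAGE_TAG" .

# Start a container from the tagged image with the same options as before.
run_or_print docker container run -it --runtime=nvidia \
    -v /home/vision/undergraduate/:/mnt \
    -p 10022:22 \
    -p 18888:8888 \
    --dns 8.8.8.8 \
    --privileged \
    "$IMAGE_TAG" bash
```

Saved to a script file, this only prints the two commands; setting DRY_RUN=0 in the environment makes it execute them.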
Retraining the model:
docker container run -it --runtime=nvidia \
    -v /home/vision/undergraduate/:/mnt \
    -p 10022:22 \
    -p 18888:8888 \
    --dns 8.8.8.8 \
    --privileged \
    tensorflow/tensorflow:1.7.0-gpu-py3 bash
This Dockerfile has also been uploaded to GitHub: https://github.com/Vision0220/FaceNet-Docker
The Docker image has been uploaded to Docker Hub as well: https://hub.docker.com/r/vision20/facenet-gpu