--- title: "[筆記] 在ubuntu 18.04 下安裝nvidia 顯示卡驅動程式以及 pgstrom / Install Nvidia Driver Cuda Pgstrom in Ubuntu 1804" date: 2019-08-20T14:51:54+08:00 noSummary: false featuredImage: "https://h.cowbay.org/images/post-default-7.jpg" categories: ['筆記'] tags: ['nvidia'] author: "Eric Chang" keywords: - nvidia - cuda - pgstrom --- 因為老闆說要試試看用GPU 來跑postgresql 威力 手邊剛好有一張 geforce gt 720 一開始沒想太多,看到有這張卡的驅動程式,然後CUDA也有支援 就直接從桌機拔下來,接去LAB Server ,然後就開始一連串的難關了... 整個過程大致上分為四個步驟 ### 安裝 nvidia driver ### 安裝 CUDA ### 安裝 postgresql ### 安裝 pgstrom ************ #### 安裝 nvidia driver 試過幾種方法,最後還是覺得用apt來安裝比較妥當 先新增repository、update、裝driver ``` sudo add-apt-repository ppa:graphics-drivers/ppa sudo apt update sudo apt install ubuntu-drivers-common ``` 然後用這個指令 ``` ubuntu-drivers devices ``` 看一下現在的驅動程式狀態 ``` administrator@hqdc032:~/pg-strom$ ubuntu-drivers devices == /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 == modalias : pci:v000010DEd00001288sv0000174Bsd0000326Bbc03sc00i00 vendor : NVIDIA Corporation model : GK208B [GeForce GT 720] driver : nvidia-driver-410 - third-party free driver : nvidia-driver-418 - third-party free driver : nvidia-340 - distro non-free driver : nvidia-driver-430 - third-party free recommended driver : nvidia-driver-390 - third-party free driver : nvidia-driver-415 - third-party free driver : xserver-xorg-video-nouveau - distro free builtin ``` 我這張卡,可以裝到 430 的版本, 接下來就繼續安裝驅動程式、裝完之後重開機 ``` sudo apt install nvidia-driver-430 sudo reboot ``` 這時候,應該可以看到一些基本資訊 ``` lsmod|grep nvidia ``` 大概長這樣 ``` nvidia_uvm 798720 0 nvidia_drm 45056 3 nvidia_modeset 1093632 7 nvidia_drm nvidia 18194432 258 nvidia_uvm,nvidia_modeset drm_kms_helper 172032 1 nvidia_drm drm 401408 6 drm_kms_helper,nvidia_drm ipmi_msghandler 53248 2 ipmi_devintf,nvidia ``` 到這邊 nvidia 驅動程式安裝完成,接下來安裝 cuda #### 安裝 CUDA 同樣採用官網下載deb 回來安裝的方法 到這邊 https://developer.nvidia.com/cuda-downloads 依照自己的系統選擇 我選擇 Linux -- x86_64 -- Ubuntu -- 18.04 -- deb(local) 畫面上就會有安裝步驟,照著做就沒問題了 ``` wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600 wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub sudo apt-get update sudo apt-get -y install cuda ``` 一樣,安裝完成後重新開機,然後來編譯一個 deviceQuery 的小程式來看看資訊 ``` cd /usr/local/cuda-10.1/samples/1_Utilities/deviceQuery sudo make ``` 會產生一個叫 deviceQuery 的執行檔,執行後,會有相關資訊 ``` administrator@hqdc032:/usr/local/cuda-10.1/samples/1_Utilities/deviceQuery$ ./deviceQuery ./deviceQuery Starting... CUDA Device Query (Runtime API) version (CUDART static linking) Detected 1 CUDA Capable device(s) Device 0: "GeForce GT 720" CUDA Driver Version / Runtime Version 10.1 / 10.1 CUDA Capability Major/Minor version number: 3.5 Total amount of global memory: 1996 MBytes (2093416448 bytes) ( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores GPU Max Clock rate: 797 MHz (0.80 GHz) Memory Clock rate: 900 Mhz Memory Bus Width: 64-bit L2 Cache Size: 524288 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 1 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Device supports Compute Preemption: No Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1 Result = PASS ``` 到這邊, CUDA 也安裝完成,再來是簡單的 postgresql 11 #### 安裝 postgresql 11 步驟差不多,就是新增repository,然後選擇要安裝的套件,不過套件的選擇和平常安裝postgresql 不太一樣 ``` sudo apt install wget ca-certificates wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list' sudo apt update sudo apt install postgresql-client-11 postgresql-11 postgresql-server-dev-11 postgresql-common libicu-dev ``` 跑完之後,就直接啟動 postgresql service 就可以了 再來是最麻煩的 pgstorm #### pgstorm 話說,這軟體到底叫啥名字? pgstrom , pg-strom ? 看起來就像是拼錯字啊! =.= https://github.com/heterodb/pg-strom 先 git clone 回來,然後make、make install 講是很簡單,但是一開始碰到很多問題,有去github 跟開發團隊回報,幸好有回覆我.. 總之,目前在ubuntu 18.04 + postgresql-11 的環境下編譯是沒有問題了 ## UPDATE 今天拿到一張 GTX 1050 ti ,想說終於可以來測試看看 pg_strom 了 不過發現在ubuntu 底下,照著這篇操作還是會有問題 在做完git clone 要 make 之前,要先執行底下兩行指令 其中的 11 是 postgresql 版本,要依照自己安裝的版本做調整 ``` sudo ln -snf /usr/lib/postgresql/11/lib/libpgcommon.a /usr/lib/x86_64-linux-gnu/libpgcommon.a sudo ln -snf /usr/lib/postgresql/11/lib/libpgport.a /usr/lib/x86_64-linux-gnu/libpgport.a ``` 接著再去 make 就沒問題了 ``` git clone https://github.com/heterodb/pg-strom.git cd pg-strom make && sudo make install ``` 再來設定一下 postgresql #### postgresql 相關設定 需要修改一下postgresql.conf,來指定載入 pgstrom 的 library 官方是這麼說的 ``` PG-Strom module must be loaded on startup of the postmaster process by the shared_preload_libraries. Unable to load it on demand. Therefore, you must add the configuration below. ``` 修改 /etc/postgresql/11/main/postgresql.conf 加入一行 ``` shared_preload_libraries = '$libdir/pg_strom' ``` 然後還有其他三個要修改,不過這個不改不會影響pgstrom 的啟動 看狀況要不要修改吧 ``` max_worker_processes = 100 shared_buffers = 10GB work_mem = 1GB ``` 修改完後,重新啟動 postgresql service 有沒有很感動!?我終於可以享受用GPU跑SQL Query 的快感啦!!! 咦??等等,為什麼postgresql service 沒起來!? 看一下 /var/log/syslog ``` Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: Error: /usr/lib/postgresql/11/bin/pg_ctl /usr/lib/postgresql/11/bin/pg_ctl start -D /var/lib/postgresql/11/main -l /var/log/postgresql/postgresql-11-main.log -s -o -c config_file="/etc/postgresql/11/main/postgresql.conf" exited with status 1: Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: 2019-08-20 14:23:43.538 CST [11806] LOG: PG-Strom version 2.2 built for PostgreSQL 11 Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: 2019-08-20 14:23:43.565 CST [11806] LOG: PG-Strom: GPU0 GeForce GT 720 - CC 3.5 is not supported Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: 2019-08-20 14:23:43.565 CST [11806] FATAL: PG-Strom: no supported GPU devices found Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: 2019-08-20 14:23:43.565 CST [11806] LOG: database system is shut down Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: pg_ctl: could not start server Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: Examine the log output. Aug 20 14:23:43 hqdc032 systemd[1]: postgresql@11-main.service: Can't open PID file /run/postgresql/11-main.pid (yet?) after start: No such file or directory Aug 20 14:23:43 hqdc032 systemd[1]: postgresql@11-main.service: Failed with result 'protocol'. Aug 20 14:23:43 hqdc032 systemd[1]: Failed to start PostgreSQL Cluster 11-main. ``` 啊幹!pg-strom 不支援這張GT 720啦! https://github.com/heterodb/pg-strom/wiki/001:-GPU-Availability-Matrix 簡單說,就是至少從 GTX 1080 起跳,其他都不用想了 幹,花了兩天的時間在弄這東西,結果明明一開始就應該要先檢查的相容列表卻沒有檢查... 好了,現在就看准不准我買一張 GTX 1080 來測試了.... 只是這價格嘛...嗯咳,不是我該煩惱的問題了..