1. System environment
OS Version : Linux / CentOS 6.7
MPI : OpenMPI-3.1.6
2. Error
Running [$ mpirun -np 2 ./model.x] in a terminal printed the following output.
-------------------------------------------------------------------------
Failed to create a completion queue (CQ):
Hostname: head
Requested CQE: 16384
Error: Cannot allocate memory
Check the CQE attribute.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly. This may
indicate a problem on this system.
You job will continue, but Open MPI will ignore the "ud" oob component
in this run.
Hostname: head
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: head
Local device: mlx4_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
After that, the model does still run...
3. Solution
I am not sure of the exact cause, but a web search suggests it is a max locked memory problem. The [ulimit -a] command prints the following.
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256637
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 256637
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
As you can see, max locked memory is limited to 64 kbytes.
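The same limit can also be read from inside a program, which is handy for checking what a process actually inherits. The following is a minimal sketch, assuming Linux and Python's standard resource module; the helper names format_limit and memlock_limits are hypothetical and only for illustration.

```python
import resource


def format_limit(value):
    """Render an rlimit value like `ulimit -l`: kbytes or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports RLIMIT_MEMLOCK in bytes; ulimit -l prints kbytes.
    return str(value // 1024)


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"max locked memory (kbytes): soft={soft} hard={hard}")
```

Running this in the same shell that launches mpirun shows the limit the MPI processes would start with.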
Many sources say that adding one of the following lines to /etc/security/limits.conf (e.g. with [sudo vi /etc/security/limits.conf]) solves it, but in my case they were all already declared.
user1 hard memlock unlimited
user1 soft memlock unlimited
* hard memlock unlimited
* soft memlock unlimited
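To confirm which memlock entries are declared without reading the file by eye, a small parser can help. This is only a sketch: it assumes the usual four-field layout (domain, type, item, value) with # comments, the function name memlock_entries is hypothetical, and SAMPLE is illustrative text rather than the contents of my actual file. Note that these settings are applied by pam_limits when a session is opened, so a shell that is already running does not pick them up.

```python
def memlock_entries(text):
    """Return (domain, type, value) for every memlock line in limits.conf-style text."""
    entries = []
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) == 4 and fields[2] == "memlock":
            domain, kind, _item, value = fields
            entries.append((domain, kind, value))
    return entries


# Illustrative sample only; the last line is an unrelated entry that is skipped.
SAMPLE = """\
# limits.conf style entries
user1 hard memlock unlimited
user1 soft memlock unlimited
* hard memlock unlimited
* soft memlock unlimited
* soft nofile 4096
"""

if __name__ == "__main__":
    for domain, kind, value in memlock_entries(SAMPLE):
        print(domain, kind, value)
```

To check the real file, the text could be read with open("/etc/security/limits.conf").read() and passed to the same function.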
Then something odd happened: after I switched to the root account and returned to my own account, max locked memory had changed to unlimited.
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256637
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I do not know why memlock is not lifted for my account directly, but until I find the exact cause, I will use the workaround "switch to the root account, then return to my own account".