RuntimeError: received 0 items of ancdata is raised while a DataLoader is loading data. The cause is that PyTorch shares tensors between processes (the DataLoader worker processes and the main process) by passing open file descriptors, and the number of files a process may have open is limited. You can check that limit with ulimit -a:

    $ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 128088
    max locked memory       (kbytes, -l) 16384
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 128088
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

When the number of tensors that need to be shared exceeds the open files limit (1024 in the output above), this error is raised.
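If you want to check the same limit from inside a training script, here is a minimal sketch using only Python's standard resource module (Unix-only). The helper names open_files_limit and open_fd_count are hypothetical, chosen for illustration, and open_fd_count assumes Linux because it reads /proc/self/fd.

```python
import os
import resource


def open_files_limit() -> tuple:
    """Return (soft, hard) for RLIMIT_NOFILE; the soft value is what `ulimit -n` prints."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count() -> int:
    """Count file descriptors open in this process (Linux-only: lists /proc/self/fd).

    The count includes the descriptor that listdir itself opens while reading the directory.
    """
    return len(os.listdir("/proc/self/fd"))


if __name__ == "__main__":
    soft, hard = open_files_limit()
    print(f"open files limit: soft={soft}, hard={hard}")
    if os.path.isdir("/proc/self/fd"):
        print(f"currently open fds: {open_fd_count()}")
```

Comparing the two numbers while the DataLoader is running gives a rough idea of how close the process is to the limit.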
There are two ways to fix this:
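1. Raise the open files limit. Running sudo ulimit -n does not work, because ulimit is a shell builtin rather than a program that sudo can execute. Instead, run:

    sudo sh -c "ulimit -n 65535 && exec su $LOGNAME"

This raises the limit in a root shell and then uses su to start a new shell as your own user, which inherits the raised limit. It only applies to that shell session and the processes started from it.

2. Change the tensor sharing strategy to file_system (the default on Linux is file_descriptor, which is bounded by the open files limit):

    torch.multiprocessing.set_sharing_strategy('file_system')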
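Applied to a training script, the second fix might look like the following sketch. The toy dataset, the batch size, and the build_loader helper are hypothetical placeholders. The key point is that set_sharing_strategy is called once, at import time, before any DataLoader worker processes are started.

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

# Switch from the default "file_descriptor" strategy (on Linux) to "file_system",
# so shared tensors no longer each hold an open file descriptor.
# Called at module level so it is in effect before any workers are created.
mp.set_sharing_strategy("file_system")


def build_loader(num_workers: int = 2) -> DataLoader:
    # Hypothetical toy data: 64 samples with 8 features and a binary label each.
    data = torch.randn(64, 8)
    labels = torch.randint(0, 2, (64,))
    return DataLoader(TensorDataset(data, labels), batch_size=16, num_workers=num_workers)


if __name__ == "__main__":
    print(mp.get_sharing_strategy())  # file_system
    for x, y in build_loader():
        print(x.shape, y.shape)
```

Because the call sits at module level, it also runs in worker processes started with the spawn method, which re-import the main module.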