I’m working on a remote server and it is the first time I’m using GPUs. I believe someone here would be able to help. I have tried the suggested solution:

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f

But I still get the same error:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Following is the result of running python -m _env:

Collecting environment information.
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
cuDNN version: Probably one of the following:
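When the same cuDNN error survives a reinstall, it can help to confirm which PyTorch build the interpreter is actually picking up. Here is a minimal sketch, assuming PyTorch may or may not be importable in the current environment. `describe_torch_build` is a hypothetical helper name of mine, not something from this thread; the `torch` attributes it reads are standard ones.

```python
def describe_torch_build():
    """Return a dict describing the PyTorch build visible to this interpreter."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment
        return {"torch": None}
    return {
        # Full version string, e.g. with a local tag such as "+cu111"
        "torch": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds)
        "built_with_cuda": torch.version.cuda,
        # Whether a usable CUDA device and driver were found at runtime
        "cuda_available": torch.cuda.is_available(),
        # cuDNN version seen by PyTorch (None if cuDNN is unavailable)
        "cudnn_version": torch.backends.cudnn.version(),
    }


report = describe_torch_build()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `built_with_cuda` is not the version you meant to install, the pip command probably landed in a different environment than the one running the training script.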
I am running training and have put the dataset inside the data folder. Many developers said that this is a label problem, but I am not sure, because the labels are in the right place.

Args:Namespace(arch='resnet50graphpoolparthyper', concat=False, dataset='mars', dropout=0.1, eval_step=100, evaluate=False, gamma=0.1, gpu_devices='0', height=256, htri_only=False, lr=0.0003, margin=0.3, max_epoch=800, nheads=8, nhid=512, num_instances=4, part1=4, part2=8, part3=2, pool='avg', pretrained_model='/home/jiyang/Workspace/Works/video-person-reid/3dconv-person-reid/pretrained_models/resnet-50-kinetics.pth', print_freq=80, save_dir='log_hypergraphsagepart', seed=1, seq_len=8, start_epoch=0, stepsize=200, test_batch=1, train_batch=32, use_cpu=False, warmup=True, weight_decay=0.0005, width=128, workers=4, xent_only=False)
Number of images per tracklet: 2 ~ 920, average 59.5
Initializing model: resnet50graphpoolparthyper
Traceback (most recent call last):
  File "main_video_person_reid_hypergraphsage_part.py", line 357, in
  File "main_video_person_reid_hypergraphsage_part.py", line 220, in main
    train(model, criterion_xent, criterion_htri, optimizer, trainloader, use_gpu)
  File "main_video_person_reid_hypergraphsage_part.py", line 257, in train
  File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
  File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs, **kwargs)
  File "/media/khawar/HDD_Khawar1/hypergraph_reid/models/ResNet_hypergraphsage_part.py", line 621, in forward
  File "/media/khawar/HDD_Khawar1/hypergraph_reid/models/resnet.py", line 213, in forward
  File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

I was facing the same issue and installing torch with CUDA 11.1 solved it. But unfortunately it gives me a new error related to CUDA:

RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.

When I run nvidia-smi, I get:

Sun Mar 28 15:02:53 2021
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr.
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M.

I suppose the problem comes from the fact that I’m trying to use torch with CUDA 11.1 with an incompatible driver version, but I don’t know how to fix it.
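One way to test the suspicion of an incompatible driver is to compare the CUDA version in the `nvidia-smi` header with the CUDA tag of the installed wheel. The sketch below rests on a few assumptions: the header contains a `CUDA Version: X.Y` field (as far as I know, this is the highest CUDA version the driver supports), the wheel carries a local tag like `+cu111`, and the `SAMPLE_HEADER` string is a made-up example, not output from this thread. The function names are hypothetical.

```python
import re


def wheel_cuda_version(torch_version):
    """Extract (major, minor) from a version string such as '1.8.0+cu111'."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None  # no CUDA tag, e.g. a CPU-only build
    digits = match.group(1)
    # The last digit is the minor version: cu111 -> (11, 1), cu102 -> (10, 2)
    return int(digits[:-1]), int(digits[-1])


def driver_cuda_version(smi_text):
    """Extract (major, minor) from the 'CUDA Version' field of nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None  # field not present in the given text
    return int(match.group(1)), int(match.group(2))


def driver_supports_wheel(smi_text, torch_version):
    """Return True/False, or None when either version cannot be determined."""
    driver = driver_cuda_version(smi_text)
    wheel = wheel_cuda_version(torch_version)
    if driver is None or wheel is None:
        return None
    # Conservative check: the driver should report at least the wheel's CUDA version
    return driver >= wheel


# Hypothetical header line for illustration only
SAMPLE_HEADER = "| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |"

print(driver_supports_wheel(SAMPLE_HEADER, "1.8.0+cu111"))  # False
```

The comparison is deliberately strict. NVIDIA's minor-version compatibility may allow some 11.x combinations to work even when the reported minor version is lower, so treat a `False` result as a hint to check the driver requirements rather than a final verdict. If the driver really is too old and you cannot update it on the remote server, installing a wheel built for an older CUDA version is the usual alternative.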