Arpit,

For object classification we pretrain the convolutional layers on the ImageNet classification task at half the resolution which is 224 × 224 input image. Classification is a simple problem compared to object detection

For object detection we double the resolution which is 448 x 448 . Detection requires fine-grained visual information and that is the reason increasing the input resolution of the network from 224 × 224 to 448 × 448 helps increase the accuracy for object detection

see the network architecture below

Please let me know if the explanation helps

Thanks,

Renu

Loves learning, sharing, and discovering myself. Passionate about Machine Learning and Deep Learning

Loves learning, sharing, and discovering myself. Passionate about Machine Learning and Deep Learning