Arpit,

For classification, we pretrain the convolutional layers on the ImageNet classification task at half the resolution, that is, with a 224 × 224 input image. Classification is a simpler problem than object detection.

For object detection, we double the resolution to 448 × 448. Detection requires fine-grained visual information, which is why increasing the network's input resolution from 224 × 224 to 448 × 448 helps improve detection accuracy.
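To make the effect of doubling the resolution concrete, here is a rough sketch of the arithmetic in Python. The two input sizes come from the explanation above. The total downsampling factor of 32 is only an assumed value for illustration, not something stated here, and the helper names are hypothetical.

```python
# Rough arithmetic sketch: what doubling the input side does to the amount
# of visual information the convolutional layers see.

PRETRAIN_SIDE = 224   # classification pretraining resolution (from the text)
DETECT_SIDE = 448     # detection resolution (from the text)
ASSUMED_STRIDE = 32   # hypothetical total downsampling factor of the conv stack


def pixel_count(side: int) -> int:
    """Number of pixels in a square image with the given side length."""
    return side * side


def feature_map_side(input_side: int, total_stride: int) -> int:
    """Side length of the final feature map for a given total stride."""
    return input_side // total_stride


# Doubling the side length quadruples the number of input pixels.
pixel_ratio = pixel_count(DETECT_SIDE) / pixel_count(PRETRAIN_SIDE)

# With the same convolutional layers, the feature map side also doubles.
pretrain_map = feature_map_side(PRETRAIN_SIDE, ASSUMED_STRIDE)
detect_map = feature_map_side(DETECT_SIDE, ASSUMED_STRIDE)

print(f"pixels: {pixel_count(PRETRAIN_SIDE)} -> {pixel_count(DETECT_SIDE)} "
      f"({pixel_ratio:.0f}x)")
print(f"feature map side: {pretrain_map} -> {detect_map}")
```

Under these assumptions, the same object covers four times as many input pixels at 448 × 448 as at 224 × 224, which is where the extra fine-grained detail for detection comes from.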

See the network architecture below.

[Image: network architecture]

Please let me know if this explanation helps.

Thanks,

Renu

