CLIP: OpenAI's Multi-Modal Model

When you flip through the photos on your phone, you look at an image and say, "This is a picture with my family from last year at Cape Cod, where we watched whales." You use language to describe and classify an image.

Contrastive Language-Image Pre-training (CLIP) is a zero-shot multi-modal model that learns directly from raw text about images, efficiently picking up visual concepts from natural-language supervision rather than from a fixed set of class labels.
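
To make this concrete, here is a minimal sketch of zero-shot classification with a pretrained CLIP checkpoint, using the Hugging Face `transformers` API. The checkpoint name, candidate labels, and image filename are illustrative assumptions, not part of the original article:

```python
# A minimal sketch of zero-shot image classification with CLIP.
# The checkpoint, labels, and image path below are assumptions for illustration.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate classes are just natural-language descriptions -- no retraining needed.
labels = [
    "a photo of a whale",
    "a photo of a family at the beach",
    "a photo of a dog",
]
image = Image.open("cape_cod.jpg")  # hypothetical image file

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into probabilities over the candidate descriptions.
probs = outputs.logits_per_image.softmax(dim=1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

Because the labels are plain text, you can swap in any set of descriptions at inference time, which is exactly what makes CLIP zero-shot.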
