Need to build a Jewellery similarity application

I want to build a jewellery similarity application.
I have around 70k images.

Each image has the following characteristics:

  1. Type - Ring, Bracelet, Earring
  2. Material - Gold, Silver, Titanium
  3. Shape - Round, Heart, etc.
  4. Color

This is how I envision the system:

I will use ResNet50 and remove its last (classification) layer.
The new last layer will predict the 4 characteristics given above.

Am I heading in the right direction?

Another question:
Most of the training images have a white background, but some have a transparent one. I plan to convert the transparent backgrounds to white. Is that right?
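A minimal sketch of that conversion, assuming Pillow is used for image loading; `flatten_to_white` is a hypothetical helper name. It alpha-composites each image over a white canvas, so already-opaque images pass through unchanged and transparent pixels become white.

```python
from PIL import Image


def flatten_to_white(img: Image.Image) -> Image.Image:
    """Composite an image onto a white background and return it as RGB.

    Opaque images pass through unchanged (alpha = 255 everywhere); transparent
    and semi-transparent pixels are blended with white.
    """
    # convert("RGBA") gives common modes (RGB, LA, P with transparency) an alpha channel.
    rgba = img.convert("RGBA")
    white = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    # alpha_composite draws rgba over white using rgba's alpha channel.
    return Image.alpha_composite(white, rgba).convert("RGB")
```

Usage would look like `flatten_to_white(Image.open(src_path)).save(dst_path)`, where the paths are placeholders. Applying the same function to query images at inference time keeps their preprocessing consistent with training.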

Another question:
The client wants to run everything in the cloud. After training, I will run inference on the images and save the embeddings from the second-to-last layer in a vector DB such as Pinecone.
Am I heading in the right direction?
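Conceptually, the vector DB stores one vector per image and answers nearest-neighbour queries, typically by cosine similarity. The sketch below is a brute-force NumPy stand-in for checking retrieval quality locally before moving to the cloud; `InMemoryIndex` and its `add`/`query` methods are hypothetical and are not Pinecone's API. Storing 70k × 2048 float32 vectors takes about 573 MB (70,000 × 2048 × 4 bytes), so brute force is feasible on one machine. The optional `where` filter mirrors the metadata filtering that vector databases such as Pinecone offer, e.g. restricting results to the same Type.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    x = np.atleast_2d(np.asarray(x, dtype=np.float32))
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


class InMemoryIndex:
    """Brute-force cosine-similarity search; a local stand-in for a vector DB index."""

    def __init__(self, dim):
        self.dim = dim
        self.ids = []
        self.metadata = []
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, ids, embeddings, metadata=None):
        emb = l2_normalize(embeddings)
        if emb.shape != (len(ids), self.dim):
            raise ValueError(f"expected shape ({len(ids)}, {self.dim}), got {emb.shape}")
        metadata = metadata if metadata is not None else [{} for _ in ids]
        if len(metadata) != len(ids):
            raise ValueError("metadata must have one entry per id")
        self.ids.extend(ids)
        self.metadata.extend(metadata)
        self.vectors = np.vstack([self.vectors, emb])

    def query(self, embedding, top_k=5, where=None):
        """Return (id, cosine score, metadata) for the top_k most similar items.

        `where` keeps only items whose metadata matches every given key/value,
        e.g. {"type": "Ring"} to search rings only.
        """
        q = l2_normalize(embedding)[0]
        candidates = np.array(
            [i for i, m in enumerate(self.metadata)
             if not where or all(m.get(k) == v for k, v in where.items())],
            dtype=int,
        )
        if candidates.size == 0:
            return []
        scores = self.vectors[candidates] @ q  # cosine similarity of unit vectors
        order = np.argsort(-scores)[:top_k]   # highest similarity first
        return [(self.ids[candidates[j]], float(scores[j]), self.metadata[candidates[j]])
                for j in order]
```

Normalizing at insert time makes the dot product equal to cosine similarity. In a managed vector DB, the equivalent choice is to create the index with a cosine metric and let it normalize or compare vectors accordingly.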