Phone Camera (keypoint detection)

Hello,

I am new to computer vision and I am trying to figure things out. I have a project that requires recognizing various parts of a goat, and I plan to collect enough videos and pictures from goat markets. So far I have been practicing with other objects (trousers, spoons, leaves). I noticed that after annotation and training, the object of interest is recognized, but its parts (the keypoints) are highly distorted.

1. What could be causing this?
2. Is it okay to use a phone camera (I have been using a 50MP itel p55 camera)?
3. Roughly how many pictures do I need to annotate for the model to recognize the keypoints?
4. Do I need a plain background and the same lighting conditions?