Too many false positives on video

The model works quite well on images, but fails to detect boxers correctly on video: the video is covered with "boxer" bounding boxes, even though only two boxers are actually present. The problem persists even when I enlarge the dataset. What can I do to overcome it?