Training Failed on Model and Stalls in Generating Images

Hi Support,

I'm having issues with a model that I update each month. The latest training run failed with this message:

"Training Failed - This training job did not complete successfully. This can happen for a few reasons but often means that the chosen model dimension (which corresponds to image size) was too large to fit into GPU memory."

This is a fairly small model that has never given me a problem.

Also, today I attempted to retrain, and it is now stuck on “The images for your new dataset version are now being created. This may take a few moments as machines spin up to process all of the images.”

V11 is the model that failed; V12 is the one that is stuck on generating images. You have support guest access to my account.

Thanks in advance.

We are also getting these issues. We’ve tried reducing the image size to 512x512 but training is still failing, forcing us to generate a new version and use credits. (Roboflow 2.0 Semantic Segmentation)

Edit: When generating a version using settings identical to the previous one, training still fails.

Support, can you let us know when it's safe to proceed with retraining?

Seems we are all set; I was able to retrain without issue.

We still aren’t able to get through training without it failing. Could we possibly get an update on this?