Model trained via API refers to non-existent class labels

  1. Project type: Object detection
  2. Operating system: Windows 10 Pro v10.0.19044, making API requests in Node v16.13.1

The problem:

I’ve trained a model using Roboflow Train via the API (the /train endpoint), which seemed to work fine until I tested the model out. The model is trying to detect the following class labels:

dime
nickel
penny
quarter

But the dataset version used to train this model contains only 26 images, all of which are labelled ‘holly’. This is the only label used in my dataset; I have no idea where the coin-related labels are coming from. The only reasonable explanation I can think of is that the problem is occurring on the API side, and my project ID has been mixed up with someone else’s?

The project I’m referring to is https://app.roboflow.com/iadt/6439775c84bbfd58df9cb753/7

For context, here’s the code I’ve used to train the model. Please keep in mind that this code works absolutely fine in terms of triggering the training process; it’s the result that’s the issue:

const train = (project, version) => {
  return new Promise(async (resolve, reject) => {

    // first check if a model is already generating. if so, do not let the user generate another model - it would cause havoc in terms of model accuracy, since it trains the same model twice.
    await getTrainingStatus(project, version)
      .then(async (trainingDetails) => {

        if (Object.keys(trainingDetails.version.model).length) {
          // this dataset version already has a model. there can be only 1 per version.
          return reject('Please generate a new version to train a model');
        }

        if (!trainingDetails?.version?.generating) {
          // this call starts the training process
          await axios.post(`https://api.roboflow.com/iadt/${project}/${version}/train?api_key=${process.env.ROBOFLOW_API_KEY}`)
            .then(async (res) => {

              // poll the training status every 5 seconds until it finishes
              const poll = setInterval(async () => {
                // this call checks on the status of the training process
                await getTrainingStatus(project, version)
                  .then((trainingDetails) => {

                    // if version.generating = false, training has stopped (I think)
                    if (!trainingDetails?.version?.generating) {
                      // stop polling and resolve
                      clearInterval(poll);
                      resolve('Training has finished');

                      // maybe use a websocket here? keep updating the user on the training status
                    }

                  })
                  .catch((e) => {
                    clearInterval(poll);
                    reject(`Failed to check training status: ${e}`);
                  });
              }, 5000);

            })
            .catch((e) => {
              reject(`Failed to begin training process: ${e}`);
            });
        }
        else {
          reject('A version is already generating. Please wait for it to finish before generating another version.');
        }
      })
      .catch((e) => {
        reject(`Error while determining training status: ${e}`);
      });
  });
};
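
For completeness, getTrainingStatus isn’t shown above; it’s just a thin wrapper around the version-lookup call. A minimal sketch of it, with the endpoint and response shape assumed from how train() consumes the result rather than taken from the docs:

const axios = require("axios");

// Assumed helper, not shown in the snippet above: fetches a dataset version.
// The endpoint and response shape ({ version: { generating, model, ... } })
// are inferred from how train() uses the result, not from documentation.
const getTrainingStatus = async (project, version) => {
  const res = await axios.get(
    `https://api.roboflow.com/iadt/${project}/${version}`,
    { params: { api_key: process.env.ROBOFLOW_API_KEY } }
  );
  return res.data;
};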

Can you share that project with Roboflow Support? I’m unable to access it to check it out.

Also I’d suggest trying to regenerate the health check as a first step – it recalculates a bunch of values including class counts in case they got out of sync in the backend.

Hi Brad,

Thanks a lot for your reply! I’ve granted access to the project, and I’ll regenerate the health check now and share the results.

Hello again,

I’ve now granted Roboflow Support access to the project. After re-running the health check, all seems well. I assume the support team can see this, but the report returned the same values as it did when I ran it previously, at least with regard to the class names: there’s just one, ‘holly’, as expected.

I should add, the request I’m using to generate the version before training looks like this:

It doesn’t look like the CLI has a method for this, so I just copied it over from the Python SDK

I notice the versions I’m generating this way don’t have any train/test split values; they’re all just going to test. Is it possible to provide those values through the API?

Those values are available via API:

Here’s the code for project.py in the Python SDK for reference:

class Project:
    def __init__(self, api_key, a_project, model_format=None):
        if api_key in DEMO_KEYS:
            self.__api_key = api_key
            self.model_format = model_format
        else:
            self.__api_key = api_key
            self.annotation = a_project["annotation"]
            self.classes = a_project["classes"]
            self.colors = a_project["colors"]
            self.created = datetime.datetime.fromtimestamp(a_project["created"])
            self.id = a_project["id"]
            self.images = a_project["images"]
            self.name = a_project["name"]
            self.public = a_project["public"]
            self.splits = a_project["splits"]
            self.type = a_project["type"]
            self.unannotated = a_project["unannotated"]
            self.updated = datetime.datetime.fromtimestamp(a_project["updated"])
            self.model_format = model_format

            temp = self.id.rsplit("/")
            self.__workspace = temp[0]
            self.__project_name = temp[1]

Thanks Mohamed. I’ve been looking at that project.py file. I’m a bit confused by how the Project.generate_version method would translate to a regular API call, e.g. here on line 157:

[screenshot of Project.generate_version in project.py]

The only data being passed in is that settings object, but the comment at the top of the method explains that settings relates only to augmentation and preprocessing.

So how do we pass in the train/test/split values?

For reference, here’s how I’m doing it in JavaScript:

const generate_version = async (
  project_name,
  settings = {
    augmentation: {},
    preprocessing: {
      "auto-orient": true,
      "filter-null": {
        percent: 100,
      },
      resize: {
        width: 640,
        height: 640,
        format: "Stretch to",
      },
    },
  }
) => {
  return new Promise(async (resolve, reject) => {
    // These settings mirror capabilities available via the Roboflow UI.

    if (
      !Object.keys(settings).includes("augmentation") ||
      !Object.keys(settings).includes("preprocessing")
    ) {
      return reject(
        new Error(
          "augmentation and preprocessing keys are required to generate. If none are desired, specify an empty object for that key."
        )
      );
    }

    await axios
      .post(
        `https://api.roboflow.com/iadt/${project_name}/generate?api_key=${process.env.ROBOFLOW_API_KEY}`,
        {
          ...settings,
        }
      )
      .then((res) => {
        if (res.status === 200) {
          resolve(res.data);
        }
      })
      .catch((e) => {
        reject(
          new Error(
            `Error when requesting to generate a new version for project: ${e}`
          )
        );
      });
  });
};

I got the impression from reading version.py in the SDK you shared that it would be possible to pass something like:

splits: { train: 70, test: 10, valid: 20 },

to that object, but it doesn’t seem like it.

Actually, self.splits isn’t used anywhere in the Project methods; it’s just declared in the constructor.

It isn’t hugely important to me whether I can split the data up anyway; I’m just wondering whether it might have anything to do with the issue in the title of this post :man_shrugging:

I’m wondering whether I’ve done something stupid in trying to convert over some of the methods from the Python SDK. I’ve mixed and matched different parts from there and the Node CLI so that my app can handle the entire flow of creating a project, annotating and uploading images, generating a dataset version, and finally training the model, but of course I could’ve messed that up somewhere along the way.

As far as I can tell, my code all works okay; it’s just this final step of training the model that’s going wrong.

I would share my code here, but there is an awful lot of it.

  • Accessing values from generated versions and projects, to provide some examples to compare your code against:
from roboflow import Roboflow

rf = Roboflow(api_key='private_api_key')
project = rf.workspace('workspace_id').project('model_id')
version_number = 0  # index of the target version's record in project_metadata (below)
version = project.version('version_number')  # e.g. project.version('1')

# storing project metadata
project_metadata = project.get_version_information()

# Accessing the value for project type, e.g `object-detection`:
project_type = project.type

# Accessing your inference endpoint value:
project_endpoint = project_metadata[version_number]['model']['endpoint']

# Accessing your Model ID:
model_id = project_metadata[version_number]['model']['id']

# Accessing your Version Name:
version_name = project_metadata[version_number]['name']

# Accessing Your Resize Values for a Generated Version:
resize_width = project_metadata[version_number]['preprocessing']['resize']['width']
resize_height = project_metadata[version_number]['preprocessing']['resize']['height']

# Checking whether the version was trained from scratch or from a checkpoint:
from_scratch = project_metadata[version_number]['model'].get('fromScratch')
if from_scratch:
    train_checkpoint = 'Scratch'  # version trained from scratch
elif from_scratch is False:
    train_checkpoint = 'Checkpoint'  # version trained from a checkpoint
else:
    train_checkpoint = 'Not Yet Trained'  # version not yet trained
print(f"Version trained from {train_checkpoint}")

# Original Train Set (not-augmented) Image Count:
train_set_orig = project.splits['train']

# Valid Set Image Count:
valid_set_orig = project.splits['valid']

# Test Set Image Count:
test_set_orig = project.splits['test']

# Accessing Your Augmented Train Set Image Count:
train_set_augmented = version.splits['train']

# Accessing Your mean Average Precision (mAP) metric:
map_score = float(project_metadata[version_number]['model']['map'])

# Accessing Your Precision metric:
precision_metric = float(project_metadata[version_number]['model']['precision'])

# Accessing Your Recall metric:
recall_metric = float(project_metadata[version_number]['model']['recall'])

# Preprocessing Steps Used to Generate Version:
preprocessing_steps = version.preprocessing

# Augmentation Steps Used to Generate Version:
augmentation_steps = version.augmentation

# Calculating the Augmentation Multiple:
augmentation_multiple = version.splits['train'] / project.splits['train']
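
If you’re porting these lookups to Node, a rough equivalent against the REST API might look like the sketch below; the endpoint shape and field names are inferred from the SDK code above rather than confirmed documentation, so check them against a live response:

const axios = require("axios");

// Sketch only: fetch version metadata over the REST API. The endpoint and
// response fields are assumptions mirroring the Python SDK examples above.
const getVersionMetadata = async (workspace, project, version) => {
  const { data } = await axios.get(
    `https://api.roboflow.com/${workspace}/${project}/${version}`,
    { params: { api_key: process.env.ROBOFLOW_API_KEY } }
  );
  // Expected fields, mirroring the Python examples:
  //   data.version.splits        -> { train, valid, test } image counts
  //   data.version.preprocessing -> preprocessing steps used to generate
  //   data.version.augmentation  -> augmentation steps used to generate
  //   data.version.model         -> { id, endpoint, map, precision, recall } once trained
  return data.version;
};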

@jakewarrenblack
split is an API parameter:

The script below works as follows:

The upload_images function takes the directory_path, API key, dataset split, percentage of images to upload, and batch_name as arguments, and uploads the specified percentage of images from the directory to the dataset split of your choice. The uploaded images are sampled at random from the directory you point it to.

Replace 'path/to/your/image/directory', 'YOUR_API_KEY', and 'YOUR_BATCH_NAME' with the appropriate values.

The example below uploads a random selection of images from the directory: it takes 50% of them and sends them to the train set with a batch name of YOUR_BATCH_NAME (since the batch name was left at its placeholder value).

import os
import random
import requests
import base64
import io
from PIL import Image

def upload_image(image_path: str, api_key: str, project_id: str, split: str, batch_name: str):
    """
    Upload a single image to the Roboflow Upload API with the given parameters.

    Args:
        image_path (str): Path to the image file.
        api_key (str): Roboflow API key.
        project_id (str): Roboflow project ID.
        split (str): Dataset split, can be 'train', 'valid', or 'test'.
        batch_name (str): Batch name for the uploaded images.
    Returns:
        dict: JSON response from the Roboflow API.
    """
    image = Image.open(image_path).convert("RGB")
    buffered = io.BytesIO()
    image.save(buffered, quality=90, format="JPEG")

    img_str = base64.b64encode(buffered.getvalue())
    img_str = img_str.decode("ascii")

    upload_url = "".join([
        f"https://api.roboflow.com/dataset/{project_id}/upload",
        f"?api_key={api_key}",
        f"&name={os.path.basename(image_path)}",
        f"&split={split}",
        f"&batch={batch_name}"
    ])

    r = requests.post(upload_url, data=img_str, headers={
        "Content-Type": "application/x-www-form-urlencoded"
    })

    return r.json()

def get_image_paths(directory: str):
    """
    Get a list of image file paths from a directory.

    Args:
        directory (str): Path to the directory containing images.
    Returns:
        list: A list of image file paths.
    """
    image_extensions = {'.jpeg', '.jpg', '.png'}
    image_paths = []

    for file in os.listdir(directory):
        file_extension = os.path.splitext(file)[1].lower()
        if file_extension in image_extensions:
            image_paths.append(os.path.join(directory, file))

    return image_paths

def upload_images(directory: str, api_key: str, project_id: str, split: str, percentage: int, batch_name: str):
    """
    Upload a specified percentage of images from a directory to a given dataset split.

    Args:
        directory (str): Path to the directory containing images.
        api_key (str): Roboflow API key.
        project_id (str): Roboflow project ID.
        split (str): Dataset split, can be 'train', 'valid', or 'test'.
        percentage (int): The percentage of images to upload (1-100).
        batch_name (str): Batch name for the uploaded images.
    """
    image_paths = get_image_paths(directory)
    num_images_to_upload = int(len(image_paths) * percentage / 100)
    print(f"Uploading {num_images_to_upload} images to the {split} split...")
    sampled_image_paths = random.sample(image_paths, num_images_to_upload)

    for image_path in sampled_image_paths:
        result = upload_image(image_path, api_key, project_id, split, batch_name)
        print(result)


if __name__ == '__main__':
    # Example usage:
    image_directory = 'path/to/your/image/directory'
    api_key = 'YOUR_API_KEY'
    project_id = 'YOUR_PROJECT_ID'
    split = 'train'  # can be 'train', 'valid', or 'test'
    percentage = 50  # value between 1 and 100
    batch_name = 'YOUR_BATCH_NAME'

    print("Uploading images to Roboflow...This may take a few moments.\n")
    print(f"Uploading from directory: {image_directory} | Project ID: {project_id} | Dataset Split for Upload: {split}")
    print(f"Percent of images in the directory to be uploaded to the {split} split: {percentage} | Upload Batch Name: {batch_name}")
    upload_images(image_directory, api_key, project_id, split, percentage, batch_name)
    ## to run the file in your terminal, enter: python3 roboflow_uploadapi_bysplit.py
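
Since you’re working in Node, a hypothetical port of the single-image upload might look like this (same endpoint and query parameters as the Python script above; axios and form-data assumed installed):

const fs = require("fs");
const path = require("path");
const axios = require("axios");
const FormData = require("form-data");

// Hypothetical Node equivalent of upload_image above: posts one image as
// multipart form data, with split and batch passed as query parameters.
const uploadImage = async (imagePath, apiKey, projectId, split, batchName) => {
  const formData = new FormData();
  formData.append("file", fs.createReadStream(imagePath));

  const res = await axios.post(
    `https://api.roboflow.com/dataset/${projectId}/upload`,
    formData,
    {
      params: {
        api_key: apiKey,
        name: path.basename(imagePath),
        split, // 'train', 'valid', or 'test'
        batch: batchName,
      },
      headers: formData.getHeaders(),
    }
  );
  return res.data;
};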

Thank you Mohamed for your comprehensive answer. I’ll try splitting the images and annotations up into train/test/validate tomorrow, and hopefully that solves the issue of the non-existent class labels. I’ve just tried it again on a completely new project and dataset, and the same issue is occurring. The dataset has only 12 images, all of which are labelled ‘lola’.


Can you make any suggestion as to what could be causing the model to ignore my classnames and seemingly make up its own?

Class balance from health check:

If you have a look at the images in the dataset, they all appear to be annotated just fine. Even if I download them and have a look at the XML annotations, they look good.

Maybe I’m still misunderstanding how this is meant to work?

Just to try changing the split value, I passed split: valid directly into the params, like in the JS example here.

While my own code is like this:

    const response = await axios({
      method: "POST",
      url: `https://api.roboflow.com/dataset/${projectUrl}/upload`,
      params: {
        api_key: process.env.ROBOFLOW_API_KEY,
        split: "valid",
      },
      data: formData,
      headers: formData.getHeaders(),
    });

All 12 images still just go into train.

It’s also the same result if I just append it to the formData:

  formData.append("split", "valid");

It worked for me:

I believe there is a bug in the project itself, or in the workspace.

Can you try the script with a project that has a title that is not full of numeric values, such as “test upload”?

I’m interested to see if you get the incorrect results with that test, too. This will help me to confirm if it is a project-level bug, a workspace-level bug, or something else entirely.


Sorry, it worked after I cleared out all the existing images and uploaded again. Now I’m just working on getting something similar to your Python example :slight_smile:


Awesome! Glad it’s working for upload now!

Went with this:

      // assumes files alternates image, annotation, image, annotation, ...
      // trainSplit and validSplit are fractions, e.g. 0.7 and 0.2
      for (let i = 0; i < files.length; i += 2) {
        percentUploaded = (numUploaded / (files.length / 2)) * 100;

        const imageFilePath = path.join(directoryPath, files[i]);
        const annotationFilePath = path.join(directoryPath, files[i + 1]);

        if (percentUploaded < trainSplit * 100) {
          split = "train";
        } else if (percentUploaded < (trainSplit + validSplit) * 100) {
          split = "valid";
        } else {
          split = "test";
        }

        await uploadWithAnnotation(
          imageFilePath,
          annotationFilePath,
          req.user._id, // user's unique _id is also used as their project name (users only need one)
          process.env.ROBOFLOW_API_KEY,
          {
            split,
          }
        );

        numUploaded++; // advances percentUploaded for the next iteration

Which looks to have worked. Now to find out if that was the reason for the random class names :crossed_fingers: