Python SDK Image Upload stuck using multiprocessing

I wrote a simple script that uploads photos (batched and tagged properly) using the Python SDK.

import os

from roboflow import Roboflow

rf = Roboflow(api_key=API_KEY)

workspaceId = WORKSPACE_ID
projectId = PROJECT_ID
project = rf.workspace(workspaceId).project(projectId)

def upload_image(image_path, batch_name, tag):
    try:
        print(image_path, batch_name, tag)
        project.upload(
            image_path=image_path,
            batch_name=batch_name,
            num_retry_uploads=3,
            tag=tag
        )
        filename = os.path.basename(image_path)
        print(filename)
        # Move the file to the uploaded images folder
        os.rename(image_path, os.path.join(uploaded_images_path, filename))
    except Exception as e:
        print("Something went wrong...")
        print(e)

    print("Done")

It works as expected when I call it sequentially:

for file, batch, tag in args:
    upload_image(file, batch, tag)

However, I would like to run multiple uploads simultaneously using multiprocessing to save time on larger datasets.

# DOES NOT WORK
with Pool(processes=multiprocessing.cpu_count() - 1) as pool:
    L = pool.starmap(upload_image, args)
    pool.close()
    pool.join()

This does not work: the script hangs indefinitely.

Hi Kevin.
Your multiprocessing code looks correct. I can’t think of a reason for it not to work, especially when the sequential for loop does.

A couple of questions:

  • Where does Pool come from? Is it from multiprocessing import Pool?
  • Can you try a fixed number of workers: with Pool(processes=10) as pool?
  • Can you show a little bit of what’s in args?
  • Can you add some print()s before/after pool.starmap and provide the output you get in the console?

Thanks @tonylampada

  • Where does Pool come from? Is it from multiprocessing import Pool?
    Yes
  • Can you try a fixed number of workers: with Pool(processes=10) as pool?
    Yes - I’ve tried hardcoding it.
  • Can you show a little bit of what’s in args?
    This is a list of tuples to match the function signature upload_image(image_path, batch_name, tag)
[('C:\\Users\\kevin\\Documents\\IBI Training\\Unsorted\\pID-25968-msid-101460--3cf5b920-ae51-4173-a4c8-985ea9766df1.jpg', '25968', '25968'), ('C:\\Users\\kevin\\Documents\\IBI Training\\Unsorted\\pID-25968-msid-103233--c4f5df3a-28e8-40ee-89a4-c7e8b815ca41.jpg', '25968', '25968')]
Running...
[('C:\\Users\\kevin\\Documents\\IBI Training\\Unsorted\\pID-25968-msid-103233--c4f5df3a-28e8-40ee-89a4-c7e8b815ca41.jpg', '25968', '25968'), ('C:\\Users\\kevin\\Documents\\IBI Training\\Unsorted\\pID-25968-msid-101460--3cf5b920-ae51-4173-a4c8-985ea9766df1.jpg', '25968', '25968')]

The arguments themselves are fine, since the same list works when I run the uploads synchronously.

  • Can you add some print()s before/after pool.starmap and provide the output you get in the console?

Yep - it’s being invoked.

I realised that this is most likely not related to the Roboflow API and is more a limitation of multiprocessing.

I’ll try to play around with it more since I really need to upload multiple images simultaneously.
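
For anyone landing here with the same problem: uploads are I/O-bound, so one option is to swap the process pool for a thread pool. Threads avoid the pickling and process-spawning behaviour that multiprocessing depends on, which is a frequent source of hangs on Windows. The sketch below is not the method used in this thread. upload_all and fake_upload are hypothetical names made up for illustration, and it assumes (unverified) that calling project.upload from several threads at once is safe.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_all(upload_fn, args, max_workers=8):
    """Call upload_fn(image_path, batch_name, tag) for each tuple in args on a thread pool.

    Returns (succeeded, failed): succeeded is a list of argument tuples,
    failed is a list of (argument tuple, exception) pairs.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the arguments it was started with.
        futures = {executor.submit(upload_fn, *a): a for a in args}
        for future in as_completed(futures):
            a = futures[future]
            try:
                future.result()  # re-raises any exception raised in the worker
                succeeded.append(a)
            except Exception as exc:
                failed.append((a, exc))
    return succeeded, failed


if __name__ == "__main__":
    # Hypothetical stand-in for upload_image so the sketch runs offline.
    def fake_upload(image_path, batch_name, tag):
        if not image_path.endswith(".jpg"):
            raise ValueError(f"unsupported file: {image_path}")

    demo_args = [("a.jpg", "25968", "25968"), ("b.txt", "25968", "25968")]
    ok, bad = upload_all(fake_upload, demo_args, max_workers=2)
    print(len(ok), "uploaded,", len(bad), "failed")
```

With the script from the first post, the call would be upload_all(upload_image, args). Since upload_image catches its own exceptions, the failed list stays empty unless that except block re-raises.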

Late follow-up:

You can now do this more easily (no coding needed) with the Python CLI: run roboflow import in the terminal with the -c parameter for concurrency.

Documentation: