I wrote a simple script that uploads photos (batched and tagged properly) using the Python SDK.
```python
import os

from roboflow import Roboflow

rf = Roboflow(api_key=API_KEY)
workspaceId = WORKSPACE_ID
projectId = PROJECT_ID
project = rf.workspace(workspaceId).project(projectId)

# uploaded_images_path is defined elsewhere: the folder uploaded files are moved to


def upload_image(image_path, batch_name, tag):
    try:
        print(image_path, batch_name, tag)
        project.upload(
            image_path=image_path,
            batch_name=batch_name,
            num_retry_uploads=3,
            tag=tag
        )
        # os.path.basename instead of split('\\') so this also works outside Windows
        filename = os.path.basename(image_path)
        print(filename)
        # move the file to the uploaded images folder
        os.rename(image_path, os.path.join(uploaded_images_path, filename))
        print("Done")
    except Exception as e:
        print("Something went wrong...")
        print(e)
```
Running it in a plain loop works as expected:
```python
for file, batch, tag in args:
    upload_image(file, batch, tag)
```
However, I would like to run several uploads simultaneously with multiprocessing to save time on larger datasets:
```python
# DOES NOT WORK
import multiprocessing
from multiprocessing import Pool

# leave one core free, but never ask for fewer than 1 process
with Pool(processes=max(1, multiprocessing.cpu_count() - 1)) as pool:
    results = pool.starmap(upload_image, args)
    pool.close()
    pool.join()
```
This never finishes; it hangs indefinitely.
Hi Kevin.
Your multiprocessing code looks correct. I can’t think of a reason for it not to work, especially when the sequential for loop does.
A couple of questions:
- Where does `Pool` come from? Is it `from multiprocessing import Pool`?
- Can you try a fixed number of workers, e.g. `with Pool(processes=10) as pool:`?
- Can you show a little bit of what's in `args`?
- Can you add some `print()`s before/after `pool.starmap` and share the output you get in the console?
Thanks @tonylampada
> Where does `Pool` come from? Is it `from multiprocessing import Pool`?

Yes.

> Can you try a fixed number of workers, e.g. `with Pool(processes=10) as pool:`?

Yes, I've tried hardcoding the number of workers.

> Can you show a little bit of what's in `args`?

It is a list of tuples matching the function signature `upload_image(image_path, batch_name, tag)`:
[('C:\\Users\\kevin\\Documents\\IBI Training\\Unsorted\\pID-25968-msid-101460--3cf5b920-ae51-4173-a4c8-985ea9766df1.jpg', '25968', '25968'), ('C:\\Users\\kevin\\Documents\\IBI Training\\Unsorted\\pID-25968-msid-103233--c4f5df3a-28e8-40ee-89a4-c7e8b815ca41.jpg', '25968', '25968')]
Running...
[('C:\\Users\\kevin\\Documents\\IBI Training\\Unsorted\\pID-25968-msid-103233--c4f5df3a-28e8-40ee-89a4-c7e8b815ca41.jpg', '25968', '25968'), ('C:\\Users\\kevin\\Documents\\IBI Training\\Unsorted\\pID-25968-msid-101460--3cf5b920-ae51-4173-a4c8-985ea9766df1.jpg', '25968', '25968')]
These look fine, and the same list works when I run the uploads synchronously.
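In the sample tuples, the batch name and the tag both appear to be the `pID` number from the filename, so the list could be generated from the folder rather than by hand. A minimal sketch, assuming every relevant filename starts with `pID-<digits>-`; `build_args`, `PID_PATTERN` and the `Unsorted` folder path are hypothetical:

```python
import re
from pathlib import Path

# Hypothetical pattern: assumes filenames look like "pID-25968-msid-...jpg",
# as in the sample tuples above.
PID_PATTERN = re.compile(r"^pID-(\d+)-")


def build_args(image_paths):
    """Return (image_path, batch_name, tag) tuples for upload_image.

    Assumption: batch name and tag are both the pID taken from the filename.
    Files whose names don't match the pattern are skipped.
    """
    args = []
    for path in image_paths:
        match = PID_PATTERN.match(Path(path).name)
        if match is None:
            continue  # not a pID-style filename, leave it out
        pid = match.group(1)
        args.append((str(path), pid, pid))
    return args


if __name__ == "__main__":
    # Hypothetical folder; point this at your own "Unsorted" directory.
    unsorted = Path("Unsorted")
    print(build_args(sorted(unsorted.glob("*.jpg")))[:2])
```

Building the tuples from the filenames keeps the batch and tag consistent with the file, whichever way the upload loop is later parallelised.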
> Can you add some `print()`s before/after `pool.starmap` and share the output you get in the console?

Yep, it's being invoked.
I realised that this is most likely not related to the Roboflow API but rather a limitation of multiprocessing.
I'll keep experimenting, since I really need to upload multiple images simultaneously.
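One plausible explanation for the `Pool.starmap` hang, though not confirmed in this thread: the paths suggest Windows, where multiprocessing starts workers with `spawn`, so each worker re-imports the main module. The Pool then has to be created under `if __name__ == "__main__":` in a `.py` file, and functions defined in an interactive session such as Jupyter may not be importable by the workers. Since uploads are network-bound rather than CPU-bound, threads are usually enough to overlap them and avoid those requirements. A minimal sketch using `concurrent.futures.ThreadPoolExecutor`; the `upload_image` below is a hypothetical stand-in (the sleep only simulates network latency), and it assumes the shared `project` object can safely be used from several threads, which I have not verified for the Roboflow SDK:

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_image(image_path, batch_name, tag):
    """Hypothetical stand-in for the real upload_image.

    Replace the body with the project.upload(...) call; the sleep only
    simulates network latency.
    """
    time.sleep(0.1)
    return os.path.basename(image_path)


def upload_all(args, max_workers=8):
    """Run upload_image over (image_path, batch_name, tag) tuples using threads.

    Returns (succeeded, failed): a list of results and a dict mapping
    image_path -> exception for tasks that raised.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # One task per tuple; remember which file each future belongs to.
        futures = {executor.submit(upload_image, *item): item[0] for item in args}
        for future in as_completed(futures):
            path = futures[future]
            try:
                succeeded.append(future.result())
            except Exception as exc:
                failed[path] = exc
    return succeeded, failed


if __name__ == "__main__":
    demo_args = [(f"photo_{i}.jpg", "25968", "25968") for i in range(20)]
    start = time.perf_counter()
    ok, errors = upload_all(demo_args)
    print(f"{len(ok)} uploaded, {len(errors)} failed in {time.perf_counter() - start:.2f}s")
```

Results arrive in completion order, not submission order. Because the original `upload_image` already catches its own exceptions, `failed` would normally stay empty there; the extra bookkeeping only matters if that `try`/`except` is removed.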
Late follow-up:
You can now do this more easily (no coding needed) with the Roboflow Python CLI: run `roboflow import` in the terminal with the `-c` parameter to set the concurrency.
Documentation: