Kaggle IO error when downloading dataset

I’m encountering an IO error when running “save and commit” in a Kaggle notebook. It occurs when calling .download() on a project version.

  • Project Type: Object Detection
  • Operating System & Browser: Windows 10 / Chrome

Here’s a screenshot of the error log from Kaggle:

I’m wondering if there is any way to toggle the verbosity of the download, or to redirect stdout somewhere so that it doesn’t error out, assuming that’s even the root cause of the error.
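A minimal sketch of the stdout-redirect idea, assuming the progress updates are written through Python’s sys.stdout or sys.stderr (not confirmed for the Roboflow SDK). noisy_download is a hypothetical stand-in for the real .download() call, and run_quietly is an illustrative helper, not part of any library.

```python
import contextlib
import io


def noisy_download():
    """Hypothetical stand-in for a download call that prints many progress updates."""
    for pct in range(0, 101, 10):
        print(f"Downloading: {pct}%")
    return "dataset-path"


def run_quietly(func, *args, **kwargs):
    """Run func while capturing Python-level stdout/stderr in memory."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


result, captured = run_quietly(noisy_download)
# Emit a single summary line instead of one line per progress update.
print(f"Download finished: {result} ({len(captured.splitlines())} progress lines suppressed)")
```

Note that contextlib.redirect_stdout only affects writes that go through Python’s sys.stdout, so output from subprocesses or native code would not be captured this way.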

Hi @Jquinny

Sorry to hear you’re having issues. Let’s try to get you sorted out.

It looks like it timed out while it was downloading a dataset. Can you tell me if the download was extremely slow or stopped before it produced that error?

Could you also share your workspace/project ID? I’d like to try to reproduce the error.
If you are uncomfortable sharing the ID in public, you can private message us that info here or to me directly.

Thanks for the speedy response! I’ve sent a private message to the moderators group with some info for reproducibility.

During normal cell execution, the dataset downloads at what I’d call a normal speed. From what I can tell, the download stops after the error is thrown, but I may be wrong.

Let me know if you find anything!

Hi @Jquinny

I’m still looking into your issue, but I haven’t been able to replicate it so far. Does it occur consistently? Have you tried other notebook platforms, and if so, does the issue still happen there?

I made a new Kaggle notebook with only the code cells needed to run the dataset download section, and it worked in a “save and commit” run. But when I tried it with a few code cells added afterward, it failed in the dataset download part again. It’s still failing with the “Timeout Waiting for IOPub output” error.

Running it in any regular cell in either Colab or Kaggle works fine, though. It only fails during the save and commit runs, which are the ones I actually need working. Odd bug, I must say.

Hi @Jquinny

I’m going to try it now, but at first thought, it seems like a Kaggle issue to me. I’ll look into it and follow up though.

Edit: Could you also confirm that by save and commit you mean the “Save and Run All (Commit)” option under the “Save Version” menu?

Yes, by save and commit I mean the “Save and Run All (Commit)” option under the “Save Version” menu.

And yeah, it’s 100% a Kaggle issue. They probably limit IO requests during those runs for whatever reason, so this is almost a feature request.

Basically, Roboflow prints the % complete update with bytes downloaded A LOT during the dataset download. I think a verbose argument to .download() would fix the problem: when set to False, the function would skip the repeated % updates and only print a single “complete” or “failed at %” message when it’s DONE downloading.
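The suggested flag could look roughly like the sketch below. This is not the Roboflow SDK’s code: the download function, its chunks, total_bytes and out parameters, and the message wording are all hypothetical, chosen only to illustrate the behavior described above.

```python
import io
import sys


def download(chunks, total_bytes, verbose=True, out=None):
    """Hypothetical download loop with a verbose flag (illustration only)."""
    out = sys.stdout if out is None else out
    received = 0
    for chunk in chunks:
        received += len(chunk)
        if verbose:
            # Per-chunk progress update: the output that floods the notebook.
            pct = 100 * received // total_bytes
            out.write(f"Downloading: {pct}% ({received} bytes)\n")
    # A single final status line is always written, even when verbose is False.
    if received == total_bytes:
        out.write("Download complete\n")
    else:
        out.write(f"Download failed at {100 * received // total_bytes}%\n")
    return received


# Usage: with verbose=False only the final status line is produced.
quiet = io.StringIO()
received = download([b"ab", b"cd"], total_bytes=4, verbose=False, out=quiet)
print(quiet.getvalue(), end="")
```

With verbose=False the output is one line per download regardless of how many chunks arrive, which is the property that matters if the failure is caused by the volume of progress messages.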

Hi @Jquinny

I see. Thank you for the clarification. I’ve submitted a pull request to the Python SDK to implement your feature suggestion. Hopefully it’ll be added soon and will fix your issue.