Hi Roboflow team,
I think the ability to sort a dataset by filename is a very useful feature to have for general-purpose dataset operations.
As my images are named with a datetime format (YYYY-mm-ddTHHMMSS.jpg), it is currently very difficult to search for a particular filename, as the new images are all mixed in randomly with the existing dataset.
In the Dataset tab, under the “Generate New Version” button, we provide the ability to view your images in a list format. From there you can use Ctrl+F to search for a specific image. I hope this helps!
Thanks for your reply.
I am aware of those two buttons, and Ctrl+F is what I did to check on specific images. However, as explained in my post, the ability to sort a dataset by filename is a very useful feature to have for general-purpose dataset operations. The goal of this platform is to streamline the workflow, and as a user, the ability to sort by filename is one that achieves this goal.
Your colleague Joseph supported this idea too.
Thanks for posting. This is definitely on the radar! Right now we don’t have a great way to do it behind the scenes, unfortunately. But we are working on adding infra to support it.
Could you describe further what use cases this would unlock for you?
Thanks for considering it.
Here is a potential use case. My images are named with a datetime format (YYYY-mm-ddTHHMMSS.jpg), and with the current system it is very difficult to cycle through the latest ones. Say I need to correct the annotations of the latest batch of images (those with the latest datetimes): sorting by filename would let me cycle seamlessly through the new images using the left and right arrow keys, which is not possible at all with how the dataset is ordered now. In fact, how are the images currently sorted? The order seems very random.
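To make the naming scheme concrete, here is a minimal local sketch (not a Roboflow feature or API) of ordering filenames by the datetime they encode. It assumes every name strictly follows the YYYY-mm-ddTHHMMSS.jpg pattern; the helper names `capture_time` and `sort_by_capture_time` are made up for illustration.

```python
from datetime import datetime
from pathlib import Path

# Assumed from the YYYY-mm-ddTHHMMSS.jpg pattern described above.
FILENAME_FORMAT = "%Y-%m-%dT%H%M%S"


def capture_time(filename: str) -> datetime:
    """Parse the datetime out of a name such as 2023-01-15T093000.jpg."""
    return datetime.strptime(Path(filename).stem, FILENAME_FORMAT)


def sort_by_capture_time(filenames, newest_first=True):
    """Return the filenames ordered by their encoded datetime."""
    return sorted(filenames, key=capture_time, reverse=newest_first)


# Hypothetical filenames, for illustration only.
names = [
    "2023-01-15T093000.jpg",
    "2022-12-31T235959.jpg",
    "2023-02-01T000001.jpg",
]
latest_first = sort_by_capture_time(names)
print(latest_first)
```

Because the pattern is zero-padded and fixed-width, a plain alphabetical sort of the names would give the same order; parsing the datetime simply fails loudly on any file that does not match the pattern.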
I guess I was hoping to get one or two more levels of abstraction. Let’s say you had them in that order and were flipping through them, what would you be hoping to do/see/understand? And let’s say you gleaned that information from the images you saw in that order, what action would you take based on it?
For all intents and purposes you can consider them randomly sorted. They’re sorted on the backend in an opaque way that makes our database and compute infrastructure scale better.
The following is my point of view for why sorting is needed:
I am hoping to correct the annotations of new images based on the current state of the model. This is not possible now because the new images with predicted bounding boxes get mixed in randomly with the existing dataset.
See and understand:
I would like to be able to see whether there is any data drift or concept drift as I add new data to the dataset, something that is highly stressed in the DeepLearning.AI MLOps course. Being able to order the dataset by datetime (or by filename, which contains the datetime) would enable this to some extent.
Let’s say we do see an obvious change in the new batch of data compared to the existing dataset. Then we would want to review the data collection steps/equipment to understand what has changed, or decide whether new labels need to be added or the scope of the dataset needs to be redefined.
Hope that answers your question as to why the ability to sort would be so useful.
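As a rough illustration of the batch comparison described above, the following sketch splits a local list of filenames into "existing" and "new" groups around a cutoff datetime, so the two groups can be reviewed side by side. It is an assumption-laden local workaround, not something the platform provides: the function name `split_by_cutoff` and the sample filenames are hypothetical, and the names are assumed to follow the YYYY-mm-ddTHHMMSS.jpg pattern.

```python
from datetime import datetime
from pathlib import Path


def split_by_cutoff(filenames, cutoff: datetime):
    """Return (existing, new): names dated before the cutoff, and at or after it."""
    existing, new = [], []
    for name in filenames:
        # Assumes the YYYY-mm-ddTHHMMSS.jpg naming pattern.
        captured = datetime.strptime(Path(name).stem, "%Y-%m-%dT%H%M%S")
        (new if captured >= cutoff else existing).append(name)
    return sorted(existing), sorted(new)


# Hypothetical filenames, for illustration only.
names = [
    "2023-03-02T101500.jpg",
    "2023-01-15T093000.jpg",
    "2023-03-01T080000.jpg",
]
existing, new = split_by_cutoff(names, cutoff=datetime(2023, 3, 1))
print("existing:", existing)
print("new:", new)
```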
Super helpful. Thank you for providing that additional context.
I’m discussing with the product team to make sure this is captured in the features we’re currently working on.
“annotations of new images based on the current state of the model”
Are these images and annotations you’re adding via the web interface or via the API?
Great, I look forward to hearing the update!
These images and annotations are currently being added via the web interface (in the upload page of my workspace).
Got it. One workaround for now is looking at the “dataset” column on the Annotate tab which keeps images grouped by job.
Not ideal but should hopefully give you slightly better grouping ability than looking at the whole dataset together.
Thanks Brad! So the problem with uploading new images with annotations is that they are not being treated as a new job. If they are unannotated then it is a new job. So I actually have no way of editing the labels of the new batch of data separately. This is what I experienced when I clicked ‘Add Images’. It just adds the new data to the current labelled dataset. Please correct me if I’m doing something wrong.
It should actually create a job behind the scenes that groups those images together. You can give it a name during the upload process if it helps keep things organized (by default it’s the timestamp you uploaded at).