Selecting multiple filenames on Dataset Split

Hi, I want to split my data into train/val/test based on the name of each file.

How can I enter multiple filenames into the query?
I want to end up with something like this:

filename:Trio | filename:Duo | filename:Uno

Hi @kapthana - we don’t currently have that functionality available in filename search. It works much like the file search bar in a computer’s file system: one search string at a time.

That is an interesting request, though. Do you find yourself needing to filter like this often? And what happens after the selection: is it image tagging, or just viewing all of those images more efficiently?

As a workaround, if there is a string common to all of the filenames you want to select, you can filter on that single string in the filename search (for example, if every relevant filename contained the string source_A, one search for filename:source_A would return them all).

Otherwise, for now, the option is to run each search separately and select the matching files by clicking and dragging across them, clicking individual files, or clicking Select All at the top of the search results to select all images on the current page view.

Thanks for the fast reply.

In my case, I’ve included information about the source of each image in the filename. I want to split the dataset into train/test based on image source, so that no images from the same source end up in both the training and testing sets.

I’ll just do the splitting with an outside script.
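For reference, a minimal sketch of what that outside script could look like, assuming (hypothetically) that the source name is the prefix before the first underscore in each filename (e.g. Trio_001.jpg). parse_source and split_by_source are illustrative names, not part of any library:

import os
import random
from collections import defaultdict

def parse_source(filename: str) -> str:
    # Assumed naming convention: "<source>_<index>.<ext>", e.g. "Trio_001.jpg".
    return os.path.splitext(filename)[0].split("_")[0]

def split_by_source(directory: str, train_fraction: float = 0.8, seed: int = 42):
    # Group image paths by source, then assign whole sources to train or test
    # so that no source contributes images to both splits.
    groups = defaultdict(list)
    for file in os.listdir(directory):
        if os.path.splitext(file)[1].lower() in {'.jpg', '.jpeg', '.png'}:
            groups[parse_source(file)].append(os.path.join(directory, file))

    sources = list(groups)
    random.Random(seed).shuffle(sources)
    cutoff = int(len(sources) * train_fraction)

    train = [path for source in sources[:cutoff] for path in groups[source]]
    test = [path for source in sources[cutoff:] for path in groups[source]]
    return train, test

Every image from a given source then lands in exactly one of the two lists, which can be uploaded to the matching split.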

In that case, you may find this script helpful. It works as follows:

  • Choose a directory of images, a dataset split, a batch name, and the percentage of images in the directory to upload to that split.

  • The upload_images function takes the directory path, API key, project ID, dataset split, percentage of images to upload, and batch name as arguments, and uploads the specified percentage of images from the directory to the dataset split of your choice. The uploaded images are sampled at random from the directory you point to.

  • Replace 'path/to/your/image/directory', 'YOUR_API_KEY', 'YOUR_PROJECT_ID', and 'YOUR_BATCH_NAME' with the appropriate values.

import os
import random
import requests
import base64
import io
from PIL import Image

def upload_image(image_path: str, api_key: str, project_id: str, split: str, batch_name: str):
    """
    Upload a single image to the Roboflow Upload API with the given parameters.

    Args:
        image_path (str): Path to the image file.
        api_key (str): Roboflow API key.
        project_id (str): Roboflow project ID.
        split (str): Dataset split, can be 'train', 'valid', or 'test'.
        batch_name (str): Batch name for the uploaded images.
    Returns:
        dict: JSON response from the Roboflow API.
    """
    image = Image.open(image_path).convert("RGB")
    buffered = io.BytesIO()
    image.save(buffered, quality=90, format="JPEG")

    img_str = base64.b64encode(buffered.getvalue())
    img_str = img_str.decode("ascii")

    upload_url = "".join([
        f"https://api.roboflow.com/dataset/{project_id}/upload",
        f"?api_key={api_key}",
        f"&name={os.path.basename(image_path)}",
        f"&split={split}",
        f"&batch={batch_name}"
    ])

    r = requests.post(upload_url, data=img_str, headers={
        "Content-Type": "application/x-www-form-urlencoded"
    })
​
    return r.json()

def get_image_paths(directory: str):
    """
    Get a list of image file paths from a directory.

    Args:
        directory (str): Path to the directory containing images.
    Returns:
        list: A list of image file paths.
    """
    image_extensions = {'.jpeg', '.jpg', '.png'}
    image_paths = []

    for file in os.listdir(directory):
        file_extension = os.path.splitext(file)[1].lower()
        if file_extension in image_extensions:
            image_paths.append(os.path.join(directory, file))
​
    return image_paths

def upload_images(directory: str, api_key: str, project_id: str, split: str, percentage: int, batch_name: str):
    """
    Upload a specified percentage of images from a directory to a given dataset split.

    Args:
        directory (str): Path to the directory containing images.
        api_key (str): Roboflow API key.
        project_id (str): Roboflow project ID.
        split (str): Dataset split, can be 'train', 'valid', or 'test'.
        percentage (int): The percentage of images to upload (1-100).
        batch_name (str): Batch name for the uploaded images.
    """
    image_paths = get_image_paths(directory)
    num_images_to_upload = int(len(image_paths) * percentage / 100)
    print(f"Uploading {num_images_to_upload} images to the {split} split...")
    sampled_image_paths = random.sample(image_paths, num_images_to_upload)

    for image_path in sampled_image_paths:
        result = upload_image(image_path, api_key, project_id, split, batch_name)
        print(result)


if __name__ == '__main__':
    # Example usage:
    image_directory = 'path/to/your/image/directory'
    api_key = 'YOUR_API_KEY'
    project_id = 'YOUR_PROJECT_ID'
    split = 'train'  # can be 'train', 'valid', or 'test'
    percentage = 50  # value between 1 and 100
    batch_name = 'YOUR_BATCH_NAME'

    print("Uploading images to Roboflow...This may take a few moments.\n")
    print(f"Uploading from directory: {image_directory} | Project ID: {project_id} | Dataset Split for Upload: {split}")
    print(f"Percent of images in the directory to be uploaded to the {split} split: {percentage} | Upload Batch Name: {batch_name}")
    upload_images(image_directory, api_key, project_id, split, percentage, batch_name)

^^ If you name the file upload_by_split.py, run it with python3 upload_by_split.py after entering the values for the image directory, project ID, dataset split, batch name, and percentage of images in the directory to upload to that split.
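To connect this back to the source-based split discussed above: if each source’s images live in their own subdirectory, one way to use the script (a sketch only; the directory names, project ID, and batch names below are placeholders) is to call upload_images once per source, sending 100% of each source to a single split:

# Placeholder directories, one per image source; every image from a source
# goes to exactly one split because we upload 100% of that directory.
for source_dir, split in [
    ('images/Trio', 'train'),
    ('images/Duo', 'train'),
    ('images/Uno', 'test'),
]:
    upload_images(source_dir, 'YOUR_API_KEY', 'YOUR_PROJECT_ID', split, 100, f'{split}-by-source')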