Docker roboflow/roboflow-inference-server-cpu leads to OOM after processing many inferences

I started a Docker container from the roboflow/roboflow-inference-server-cpu image on my server (Ubuntu 20.04.3, Docker 24.0.6).

The command I used to start it was:
docker run --name sofia-docker --net=host --mount source=roboflow,target=/cache roboflow/roboflow-inference-server-cpu
When I test it, it works fine, but after many requests (around 10 thousand), this specific image (roboflow) keeps increasing its memory and disk usage until it runs out of memory. If I delete the container, the memory is freed.

This happens only when processing a large number of inferences.

When I searched for information, I only found reports about uvicorn hanging and consuming a lot of memory, but in my case this is happening inside the roboflow/roboflow-inference-server-cpu container.

I would appreciate any help with this, because I cannot keep restarting the container just to work around it.

Hi @gm_eco

I don't have a lot of experience with Docker, but I do have experience with resource-heavy applications, and your problem sounds very familiar to me. In my case it was mostly the resource limitations of my machine: I had to upgrade my RAM, CPU, and GPU to scale up the project.

Again, I can't say for sure, but it really sounds like Docker is crashing due to a lack of resources.

Hi @gm_eco! Recent versions of our inference package have some new caching logic which could be causing a problem here. Thanks for posting! I'll try to reproduce the issue and post back here with some answers.

Hi @happypentester

Thanks for the reply. That was my first idea too, but Roboflow's blog says "The minimum prerequisite for running a Roboflow Inference Server is to have Docker installed on your target device." (Launch: Updated Roboflow Inference Server), and I haven't found any other information about minimum requirements.

I cannot upgrade my server's RAM and disk for now, unless it becomes necessary. First I will try limiting the container's memory and disk with the options --memory="30g" --storage-opt size=15G and see what happens.
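As a side note for anyone following along: docker run options must come before the image name, or they are passed to the container's entrypoint instead. A sketch of the resource-limited invocation, reusing the command from the start of this thread (and assuming a storage driver that supports per-container size limits, such as overlay2 on xfs with pquota):

```shell
docker run --name sofia-docker --net=host \
  --memory="30g" \
  --storage-opt size=15G \
  --mount source=roboflow,target=/cache \
  roboflow/roboflow-inference-server-cpu
```

Note that --memory caps the container but does not fix a leak; the process will be OOM-killed at the limit instead of exhausting the host.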

Hi @Paul, thanks for the reply. OK.
If you need any other information to reproduce this issue, let me know and I will send it.

Hi @gm_eco! I haven't been able to pinpoint the exact cause of any memory leak, nor have I been able to reproduce it with a simple inference setup (hitting my locally running server in an infinite loop). Can you provide more details on your setup? What model(s) are you using? What does your client code look like? Does your client make concurrent requests, or are they serial?

In the meantime, I’ve just published a new release 0.8.9 that includes the ability to disable the new caching logic by setting the environment variable DISABLE_INFERENCE_CACHE=true on your docker run command. We’ll continue trying to find any memory leaks but let me know if this new setting solves your problem as that will tell us we are on the right track with our debugging.
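For reference, a sketch of how that variable might be passed, reusing the docker run command from earlier in this thread with Docker's standard -e flag (placed before the image name):

```shell
docker run --name sofia-docker --net=host \
  --mount source=roboflow,target=/cache \
  -e DISABLE_INFERENCE_CACHE=true \
  roboflow/roboflow-inference-server-cpu
```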

Hi @Paul, thanks for the reply.

My setup is like yours: I coded a loop in Python that POSTs to the container (following the docs, CPU - Roboflow Docs) using the requests module. I pass an image, and after every POST I sleep for 300 ms. The code I use in the loop looks like this:

    import requests

    def pred_request(path_img, confidence, overlap, model_version, model_proyect):
        base_url = "http://localhost:9001"
        api_key = "yourkey"
        url = f"{base_url}/{model_proyect}/{model_version}"

        params = {
            "api_key": api_key,
            "confidence": confidence,
            "overlap": overlap,
        }
        # Open the image for each request and let the context manager close it
        with open(path_img, "rb") as f:
            res = requests.post(url, params=params, files={"file": f})
        return res.json()


    import time

    # extract frames from the video to send
    result = []
    for frame in video:
        result.append(pred_request(frame, confidence, overlap, model_version, model_proyect))
        time.sleep(0.3)  # 300 ms pause between requests

    # process result

All requests are serial (after one request finishes, I send the next).
I use only a single custom object-detection model (electrical objects like insulators, cut strands, etc.).

I pulled the new image and tried it with the new parameter:

sudo docker run --name sofia-docker --net=host --mount source=roboflow,target=/cache --memory="30g" --storage-opt size=15G -e DISABLE_INFERENCE_CACHE=true roboflow/roboflow-inference-server-cpu

and disk space no longer grows like before, but RAM usage still increases and is not freed afterwards (just as described at the start of this discussion).

On the server, I noticed only a few processes related to uvicorn; they consume RAM constantly, and after 3 hours (images attached) memory usage grew from 2 GB to 8 GB, and only for the processes belonging to the container (if I stop the container, that memory is freed). The total number of POST requests sent was around 12 thousand in those 3 hours.
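For anyone wanting to track this growth over time, a simple sketch using Docker's standard stats subcommand (container name taken from the commands above; interval and log file are arbitrary choices):

```shell
# Sample the container's memory usage once a minute and append it to a log.
while true; do
  docker stats --no-stream --format "{{.Name}} {{.MemUsage}}" sofia-docker >> mem.log
  sleep 60
done
```

Plotting mem.log against the request count makes it easy to see whether memory grows linearly with the number of inferences (a leak) or plateaus (a cache filling up).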

Hi @gm_eco, I was able to reproduce the bug, and a fix will be included in the next release. Thanks for all the info you provided!

Hi @Paul, that is great :smiley:
Thank you and the whole Roboflow team.

I'm waiting for the next release to test it :smiley:

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.