Hi @NgocTrang,
I'll try to repro the issue, but I suspect it's because you're running InferencePipeline, which uses the locally installed inference package for inferencing, so it will run on CPU instead of GPU. Instead, you should run the inference server in a GPU-enabled docker container, then use InferenceHTTPClient (`.run_workflow()` or `.start_inference_pipeline_with_workflow()`) and point the HTTP client at the server running in the container (localhost:9001 by default).
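Roughly like this, a minimal sketch: the workspace name, workflow id, and image path below are placeholders you'd swap for your own, and it assumes the `inference-sdk` package is installed and the server container is already up.

```python
# Minimal sketch (all names are placeholders): send inference to the
# dockerized GPU server via InferenceHTTPClient instead of running the
# model inside the local Python process.
# Server side (GPU-enabled), e.g.:
#   docker run --gpus all -p 9001:9001 roboflow/roboflow-inference-server-gpu

def run_via_http(image_path, api_key, api_url="http://localhost:9001"):
    """Run a workflow against the locally hosted inference server."""
    from inference_sdk import InferenceHTTPClient  # pip install inference-sdk

    client = InferenceHTTPClient(api_url=api_url, api_key=api_key)
    return client.run_workflow(
        workspace_name="your-workspace",   # placeholder
        workflow_id="your-workflow-id",    # placeholder
        images={"image": image_path},
    )

if __name__ == "__main__":
    # Use your *new* API key here after rotating the leaked one.
    print(run_via_http("frame.jpg", api_key="YOUR_NEW_API_KEY"))
```

Since the model then runs inside the container, it will use the GPU the container was started with.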
I noticed you shared your API key; I removed it, but you should rotate it (delete the old one and create a new one) ASAP.
Thanks, Erik