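The code is:

```python
import os
os.environ["KERAS_BACKEND"] = "torch"  # must be set before Keras is imported

import torch
from keras_hub.models import Qwen3Backbone

model_name = "Qwen/Qwen3-8B"
# Load the backbone weights from ModelScope via the "modelscope://" preset prefix.
model = Qwen3Backbone.from_preset("modelscope://" + model_name)
```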
Next
Why does `Model.quantize()` increase GPU (video) memory usage, and is this the expected behavior?
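For context on what quantization does to a single weight tensor, here is a minimal NumPy sketch of symmetric abs-max int8 quantization. The helpers `abs_max_quantize` and `dequantize` are hypothetical: they follow the general abs-max idea behind `keras.quantizers.abs_max_quantize` but are not its exact signature or implementation, and the per-row scale layout is an assumption. The sketch shows that the final int8 copy takes a quarter of the float32 bytes, so one would expect steady-state memory to shrink once the float weights are released.

```python
import numpy as np

def abs_max_quantize(x, axis=-1, eps=1e-7):
    """Hypothetical symmetric per-channel int8 quantizer (sketch only).

    The scale maps the largest |x| along `axis` onto 127, so
    q = round(x * scale) always lies in [-127, 127].
    """
    scale = 127.0 / (np.max(np.abs(x), axis=axis, keepdims=True) + eps)
    q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q, scale):
    # Approximate the original float values from the int8 codes.
    return q.astype(np.float32) / scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 512)).astype(np.float32)  # stand-in for one kernel
q, scale = abs_max_quantize(w)

print(w.nbytes, q.nbytes, scale.nbytes)  # 524288 131072 1024
# Rounding error per element is at most half a quantization step (0.5 / scale).
err = np.abs(dequantize(q, scale) - w)
print(bool(np.all(err <= 0.5 / scale + 1e-5)))
```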
Comment From: sonali-kumari1
Hi @pass-lin,

Thanks for reporting this. I tested this on the qwen2-0.5b-instruct model with the torch backend and observed the same behavior: calling `model.quantize()` increases memory usage.
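One way to see why a quantize step can raise memory even though int8 is smaller is to measure the transient peak. The sketch below is a CPU-side analogue, not a GPU measurement: it assumes NumPy reports its array buffers to Python's `tracemalloc` (it does so by default), and `peak_traced_bytes` and `naive_abs_max_quantize` are hypothetical helpers. Each whole-array step materializes a full float32 temporary, so the peak during the call is several times the size of the int8 result.

```python
import tracemalloc
import numpy as np

def peak_traced_bytes(fn, *args):
    """Hypothetical helper: run fn(*args), return (result, peak bytes allocated).

    Allocations made before the call (such as the input) are not counted.
    """
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def naive_abs_max_quantize(x, eps=1e-7):
    # abs, multiply, round and clip each allocate a full float32 copy of x
    # before the final cast to int8.
    scale = 127.0 / (np.max(np.abs(x), axis=-1, keepdims=True) + eps)
    q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 1024)).astype(np.float32)  # 4 MiB float32 "kernel"
(q, scale), peak = peak_traced_bytes(naive_abs_max_quantize, x)
print(f"int8 result: {q.nbytes / 2**20:.1f} MiB, peak during call: {peak / 2**20:.1f} MiB")
```

On a GPU the same pattern would apply to device tensors, but whether it fully explains the torch-backend behavior reported here is an assumption this sketch does not test.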
There is a related PR, #19954, that addresses a similar issue (a high peak in GPU memory usage when calling `quantize`), with the root cause identified as `keras.quantizers.abs_max_quantize`. The proposed fix was to use NumPy ops for quantization, which reduced the peak GPU memory requirement in both the JAX and TensorFlow backends. However, the torch backend still exhibits the memory peak. Attaching a gist for your reference.
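As a point of comparison, a different technique from the NumPy-ops fix described above is to quantize the kernel in row chunks, so only one chunk of float32 temporaries is alive at a time. This is a hypothetical sketch, not what PR #19954 or Keras does; `chunked_abs_max_quantize`, `rows_per_chunk`, and the per-row scale layout are assumptions. Because each row's scale depends only on that row, the chunked result is identical to the whole-array version while the measured host-memory peak is much lower.

```python
import tracemalloc
import numpy as np

def peak_traced_bytes(fn, *args):
    # Hypothetical helper: peak bytes NumPy/Python allocated while fn ran.
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def naive_abs_max_quantize(x, eps=1e-7):
    # Whole-array version: several full-size float32 temporaries.
    scale = 127.0 / (np.max(np.abs(x), axis=-1, keepdims=True) + eps)
    q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return q, scale

def chunked_abs_max_quantize(x, rows_per_chunk=64, eps=1e-7):
    # Preallocate outputs; temporaries are only ever chunk-sized.
    q = np.empty(x.shape, dtype=np.int8)
    scale = np.empty((x.shape[0], 1), dtype=np.float32)
    for start in range(0, x.shape[0], rows_per_chunk):
        rows = slice(start, start + rows_per_chunk)
        s = 127.0 / (np.max(np.abs(x[rows]), axis=-1, keepdims=True) + eps)
        # Values are already rounded and clipped, so the float -> int8 store is exact.
        q[rows] = np.clip(np.round(x[rows] * s), -127, 127)
        scale[rows] = s
    return q, scale

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 1024)).astype(np.float32)
(q_full, s_full), full_peak = peak_traced_bytes(naive_abs_max_quantize, x)
(q_chunk, s_chunk), chunk_peak = peak_traced_bytes(chunked_abs_max_quantize, x)
print(np.array_equal(q_full, q_chunk), full_peak, chunk_peak)
```

The trade-off is a Python-level loop over chunks in exchange for a peak that scales with `rows_per_chunk` rather than with the full kernel size.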