The code is:

```python
import os
os.environ["KERAS_BACKEND"] = "torch"
import torch
from keras_hub.models import Qwen3Backbone

model_name = "Qwen/Qwen3-8B"
model = Qwen3Backbone.from_preset("modelscope://" + model_name)
```
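A rough way to quantify the increase (this measurement snippet is added for illustration; it assumes a CUDA device, and the `"int8"` mode is just an example argument):

```python
# Measure GPU memory around the quantize() call using torch.cuda statistics.
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()
before = torch.cuda.memory_allocated()

model.quantize("int8")  # mode assumed; this is the call whose memory cost is in question

after = torch.cuda.memory_allocated()
peak = torch.cuda.max_memory_allocated()
print(f"allocated before: {before / 2**30:.2f} GiB")
print(f"allocated after:  {after / 2**30:.2f} GiB")
print(f"peak during call: {peak / 2**30:.2f} GiB")
```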


Why does `Model.quantize()` increase GPU memory usage, and is this the expected behavior?

Comment From: sonali-kumari1

Hi @pass-lin - Thanks for reporting this. I have tested this on the qwen2-0.5b-instruct model with the torch backend and observed the same behavior: calling `model.quantize()` increases GPU memory usage.

There is a related PR, #19954, that addresses a similar issue: a high peak in GPU memory usage when calling `quantize()`, with the root cause identified as `keras.quantizers.abs_max_quantize`. The proposed fix was to use NumPy ops for quantization, which reduced the peak GPU memory requirement on both the JAX and TensorFlow backends. However, the torch backend still exhibits the memory peak. Attaching a gist for your reference.
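For context, abs-max quantization essentially rescales each weight tensor by its per-axis absolute maximum and casts it to int8; doing this with backend tensor ops materializes several full-size temporaries on the GPU, which is the peak the PR above tried to avoid by using NumPy ops on the host. A minimal NumPy sketch of the idea (illustrative only; the helper name and defaults are assumptions, not the actual Keras implementation):

```python
import numpy as np

def abs_max_quantize_np(w, axis=-1, value_range=(-127, 127), epsilon=1e-7):
    """Illustrative per-axis abs-max int8 quantization on the host."""
    # Per-axis scale so the largest |w| maps to the edge of the int8 range.
    abs_max = np.max(np.abs(w), axis=axis, keepdims=True)
    scale = value_range[1] / (abs_max + epsilon)
    # Scale, round, clip, cast. With framework ops, each step would be a
    # full-size temporary allocated on the accelerator.
    w_q = np.clip(np.round(w * scale), value_range[0], value_range[1]).astype("int8")
    return w_q, scale

# Example: quantize a random "weight" matrix in CPU RAM rather than VRAM.
w = np.random.randn(4096, 4096).astype("float32")
w_q, scale = abs_max_quantize_np(w)
print(w_q.dtype, w_q.shape, scale.shape)  # int8 (4096, 4096) (4096, 1)
```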