Keras training performance degradation after switching from Keras 2 mode to Keras 3 using TensorFlow

I've been working on upgrading my Keras 2 code to work with Keras 3 without going fully backend-agnostic. Everything works after resolving the compatibility issues, but my training speed has degraded severely, perhaps by as much as a factor of 10. I changed the following to get Keras 3 working:

Changed tensorflow.keras to keras calls.
Updated model/weights saving and loading to use the new export function and weights.h5 format.
Updated a callback that runs at the end of each epoch to be a keras.callbacks.Callback instead of the old BaseLogger.
Added @keras.saving.register_keras_serializable() to custom metric and loss functions.
Updated my online dataset generator to use keras.Sequential data augmentation instead of the removed ImageDataGenerator.
Removed the max_queue_size kwarg from the model.fit and model.predict calls since it has been removed.

In terms of hardware/packages, I'm using Python 3.11.10, Keras 3.5.0 and TensorFlow 2.16.2 on a MacBook Pro M2. I've also noticed that my GPU and CPU usage is much higher while running the newer version. Using git stash, I've confirmed that the changes listed above are what cause the performance degradation. My suspicion is that the Apple hardware is somehow resulting in worse performance, but I have yet to confirm this on a regular x86 machine.

Comment From: fchollet

> Updated my online dataset generator to use keras.Sequential data augmentation instead of the removed ImageDataGenerator.

Are you using tf.data? That's what you want to use to see good performance with TF.

You also want to make sure that you're using the GPU on your machine.
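
A minimal sketch of what these two suggestions could look like, assuming image data held in NumPy arrays. The helper names `augment` and `make_dataset`, the choice of augmentation layers and the batch size are assumptions for illustration. The layers are applied one by one in a plain function, which should behave like a keras.Sequential of the same layers.

```python
import numpy as np
import tensorflow as tf
import keras

# Augmentation expressed as Keras preprocessing layers (in place of
# ImageDataGenerator). The specific layers are an assumption for this sketch.
augmentation_layers = [
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),
]


def augment(images):
    # Apply each layer in turn, like a Sequential model would.
    for layer in augmentation_layers:
        images = layer(images, training=True)
    return images


def make_dataset(images, labels, batch_size=32):
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.shuffle(len(images)).batch(batch_size)
    # Augment whole batches inside the tf.data pipeline.
    ds = ds.map(lambda x, y: (augment(x), y),
                num_parallel_calls=tf.data.AUTOTUNE)
    # Overlap input preparation with training steps.
    return ds.prefetch(tf.data.AUTOTUNE)


# An empty list means TensorFlow sees no GPU and will run on the CPU.
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```

The resulting dataset can be passed directly to `model.fit(ds, epochs=...)`.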

Comment From: DavidHidde

Hi,

Currently the code uses custom generators, but when I rewrite it to use tf.data.Dataset the performance stays the same. In terms of CPU/GPU usage, my system does not show any major differences:

Keras 3 legacy mode:

[Screenshot of CPU/GPU usage, 2024-09-30 14:37:40]

Keras 3:

[Screenshot of CPU/GPU usage, 2024-09-30 14:35:00]

Comment From: DavidHidde

Any updates on this?

Comment From: DavidHidde

Did some more digging, since I'm starting to have a use for standalone Keras. When run in Colab, there is a performance difference, but not one as large as when running locally on my own machine. Colab shows roughly a 30% difference in execution time (which might be partly due to Colab's run-to-run variance), while my MacBook shows a 130% difference when running on MPS. PyCharm's profiler did not really help explain the difference in execution time.

I've opened a PR in my own repo to show the specific changes I'm making to this model. Hopefully this info helps a bit in resolving this case.
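
Since run-to-run variance is mentioned above, one way to make such comparisons more robust is to repeat each measurement and compare medians. The sketch below uses only the standard library. The helper names `median_runtime` and `percent_slower` are hypothetical, and the definition of "percent difference" used here (extra time relative to the baseline) is an assumption about how the figures above were computed.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Run fn() several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def percent_slower(baseline_seconds, candidate_seconds):
    """Extra time of the candidate relative to the baseline, in percent."""
    return 100.0 * (candidate_seconds - baseline_seconds) / baseline_seconds
```

Under this definition, `percent_slower(10.0, 23.0)` gives 130.0, so a 130% difference would mean the slower run takes 2.3 times as long as the baseline.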

Comment From: Morisset

I had the same problem with a very simple model of 2 tanh perceptrons (!), fitting a cos(x) function to 30 noisy points. It used to fit in 10 seconds over 15,000 epochs. Updating Keras to 3.x changed this to 100 to 200 seconds! When I used ds = tensorflow.data.Dataset.from_tensor_slices((X_train, y_train_true)).batch(30) and model.fit(ds, epochs=15000), the fitting time decreased to 25 seconds: better, but still quite bad compared with the previous performance (not to mention scikit-learn fitting in 3 seconds...).
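
The setup described in this comment could be reconstructed roughly as follows. This is only a sketch: "2 tanh perceptrons" is read here as one hidden layer with two tanh units, and the input range, noise level, optimizer, loss and the function name `fit_cos` are all assumptions, since the original code was not posted. The default number of epochs is kept small so the script finishes quickly; pass `epochs=15000` to match the comment.

```python
import time

import numpy as np
import tensorflow as tf
import keras


def fit_cos(epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    # 30 noisy samples of cos(x); the range and noise level are assumptions.
    x = rng.uniform(-np.pi, np.pi, size=(30, 1)).astype("float32")
    y = (np.cos(x) + rng.normal(0.0, 0.1, size=x.shape)).astype("float32")

    # Assumed architecture: one hidden layer with two tanh units.
    model = keras.Sequential([
        keras.Input(shape=(1,)),
        keras.layers.Dense(2, activation="tanh"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    # One batch holding all 30 points, as in the comment above.
    ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(30)

    start = time.perf_counter()
    history = model.fit(ds, epochs=epochs, verbose=0)
    elapsed = time.perf_counter() - start
    return elapsed, history.history["loss"][-1]


if __name__ == "__main__":
    seconds, final_loss = fit_cos()
    print(f"fit took {seconds:.2f} s, final loss {final_loss:.4f}")
```

Running the same script under Keras 2 (with the imports adjusted to tensorflow.keras) and under Keras 3 would give a like-for-like timing comparison for this small case.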

Comment From: sonali-kumari1

Hi @DavidHidde -

Could you please provide a minimal reproducible example to help us reproduce the training performance degradation after switching to Keras 3? Thanks!

Comment From: github-actions[bot]

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

Comment From: github-actions[bot]

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.
