I am trying to run the example from https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/chapter05_fundamentals-of-ml.ipynb that illustrates what happens during training when the learning rate is set too high. The code I'm running is:

import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import keras

# Load MNIST and flatten/normalize the training images.
(train_images, train_labels), _ = keras.datasets.mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255

model = keras.Sequential([
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
# The learning rate of 1.0 is intentionally far too high; the model should fail to learn.
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1.),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
hist = model.fit(train_images, train_labels,
                 epochs=5,
                 batch_size=128,
                 validation_split=0.2)

I am running this on an M2 MacBook Pro with tensorflow-macos 2.16.2 and tensorflow-metal 1.2.0.
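For completeness, this is a minimal sketch of how I confirm the versions in play (assuming standard pip installs of keras and tensorflow-macos):

import keras
import tensorflow as tf

# Report the Keras and TensorFlow versions used for the runs below.
print("keras:", keras.__version__)
print("tensorflow:", tf.__version__)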

When I run this with keras version 3.6 or lower, I get the expected result: the model fails to learn because the learning rate is too high.

Epoch 5/5 375/375 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.0985 - loss: 14.5301 - val_accuracy: 0.0995 - val_loss: 14.5143

When I run this with keras version 3.7 or higher, I get completely different results:

Epoch 5/5 375/375 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8600 - loss: 45469.0508 - val_accuracy: 0.8462 - val_loss: 67246.3125

Note that the loss value is several orders of magnitude higher, and the validation accuracy is not something you would expect at this learning rate. In fact, you can set the learning rate to 1000000 and still reach 85% accuracy:

Epoch 5/5 375/375 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8567 - loss: 46771927565467648.0000 - val_accuracy: 0.8575 - val_loss: 56484175766618112.0000
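For reference, the only change for that run is the learning rate in the compile call:

# Identical script, with only the learning rate changed.
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1000000.),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])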

Running this on Colab (keras 3.11) does not exhibit the issue; the results are consistent with the keras 3.6 run locally.

Switching the backend to "torch" resolves this issue.
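For concreteness, that is a one-line change before the keras import (assuming torch is installed in the local environment):

import os

# Selecting the PyTorch backend before importing keras; with this backend
# the run fails to learn as expected at learning_rate=1.
os.environ["KERAS_BACKEND"] = "torch"

import keras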