I am trying to run the example from https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/chapter05_fundamentals-of-ml.ipynb, which illustrates what happens during training when the learning rate is set too high. The code I'm running is:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"
import keras

(train_images, train_labels), _ = keras.datasets.mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255

model = keras.Sequential([
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1.),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
hist = model.fit(train_images, train_labels,
                 epochs=5,
                 batch_size=128,
                 validation_split=0.2)
I am running this on an M2 MacBook Pro with tensorflow-macos 2.16.2 and tensorflow-metal 1.2.0.
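For anyone trying to reproduce this, a quick way to confirm the exact package versions in play (a sketch using the standard library; the package names are assumed to be the ones installed via pip):

```python
from importlib import metadata

# Report the installed versions of the relevant packages, tolerating
# packages that are absent on the current platform.
for pkg in ("keras", "tensorflow-macos", "tensorflow-metal"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```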
When I run this on keras version 3.6 or lower, I get the expected result: the model doesn't learn, because the learning rate is too high.
Epoch 5/5 375/375 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.0985 - loss: 14.5301 - val_accuracy: 0.0995 - val_loss: 14.5143
When I run this on keras version 3.7 or higher, I get completely different results:
Epoch 5/5 375/375 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8600 - loss: 45469.0508 - val_accuracy: 0.8462 - val_loss: 67246.3125
Note that the loss value is several orders of magnitude higher, and the validation accuracy is not something you would expect with this learning rate. In fact, you can set the learning rate to 1000000 and still reach ~85% accuracy:
Epoch 5/5 375/375 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8567 - loss: 46771927565467648.0000 - val_accuracy: 0.8575 - val_loss: 56484175766618112.0000
Running this on Colab (keras 3.11) does not have this issue; the results are consistent with keras 3.6 run locally.
Switching the backend to "torch" resolves this issue.
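For reference, the workaround is a one-line change at the top of the script. Note that the backend must be selected before the first `import keras`; setting the environment variable afterwards has no effect (this sketch assumes PyTorch is installed):

```python
import os

# Select the PyTorch backend *before* importing keras.
os.environ["KERAS_BACKEND"] = "torch"

# import keras
# keras.backend.backend() would now report "torch".
```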