I tried with tensorflow versions 2.19 and 2.18. The code that reproduces the error is:
import tensorflow as tf

print(tf.__version__)

model = tf.keras.applications.EfficientNetB7(
    weights='imagenet',
    include_top=False,
    input_shape=(600, 600, 3),
)
model.summary()
It raises "ValueError: Shape mismatch in layer #1 (named stem_conv) for weight stem_conv/kernel. Weight expects shape (3, 3, 1, 64). Received saved weight with shape (3, 3, 3, 64)". I tried cleaning the cache folder to force a fresh download, but the new download raises the same error. Using include_top=True without the input_shape parameter also raises the same error, as do other models, e.g. EfficientNetB6.
Comment From: dhantule
Hi @h3dema, thanks for reporting this.
I've tested your code with the latest Keras 3.11.0 and I'm facing the same error:
ValueError: Shape mismatch in layer #1 (named stem_conv) for weight stem_conv/kernel. Weight expects shape (3, 3, 1, 64). Received saved weight with shape (3, 3, 3, 64)
However, with Keras 3.8.0 everything seems to work fine; please refer to the attached gist. We'll look into this and update you.
Comment From: shubham-ojha-weheal
I have the same issue but with the B0 model.
It says
ValueError: Shape mismatch in layer #1 (named stem_conv) for weight stem_conv/kernel. Weight expects shape (3, 3, 1, 32). Received saved weight with shape (3, 3, 3, 32)
I installed TensorFlow 2.18.1, which pulled in Keras 3.11.1.
Downgrading the Keras version to 3.8.0 solved the issue for me.
Comment From: k-r-4-m
I had the same issue with B7; after downgrading to Keras 3.10.0 it works fine.
Comment From: abheesht17
Did some basic debugging. The issue is clearly with the Rescaling layer, because it outputs only 1 channel (instead of 3) with Keras 3.11.1. @sonali-kumari1 - it seems this PR changed compute_output_shape(...), which causes the error. Could you please take a look? Thanks!
model.summary() with 3.10:
Model: "efficientnet"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer) │ (None, 600, 600, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ rescaling (Rescaling) │ (None, 600, 600, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ normalization (Normalization) │ (None, 600, 600, 3) │ 7 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ rescaling_1 (Rescaling) │ (None, 600, 600, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ stem_conv_pad (ZeroPadding2D) │ (None, 601, 601, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ stem_conv (Conv2D) │ (None, 300, 300, 64) │ 1,728 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 1,735 (6.78 KB)
Trainable params: 1,728 (6.75 KB)
Non-trainable params: 7 (32.00 B)
model.summary() with 3.11.1:
Model: "efficientnet"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_1 (InputLayer) │ (None, 600, 600, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ rescaling_2 (Rescaling) │ (None, 600, 600, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ normalization_1 (Normalization) │ (None, 600, 600, 3) │ 7 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ rescaling_3 (Rescaling) │ (None, 600, 600, 1) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ stem_conv_pad (ZeroPadding2D) │ (None, 601, 601, 1) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ stem_conv (Conv2D) │ (None, 300, 300, 64) │ 576 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 583 (2.28 KB)
Trainable params: 576 (2.25 KB)
Non-trainable params: 7 (28.00 B)
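For reference, an elementwise scale/offset follows standard broadcasting semantics, so a per-channel scale of length 3 cannot collapse the channel dimension to 1. A minimal sketch with NumPy (used here only as a stand-in for the layer math, not the actual Keras implementation):

```python
import numpy as np

# A (600, 600, 3) input scaled elementwise by a length-3 per-channel
# vector broadcasts back to (600, 600, 3); the channel dim is preserved.
x = np.zeros((600, 600, 3))
scale = np.array([0.9, 1.0, 1.1])  # one factor per RGB channel
y = x * scale + 0.0                # scale * x + offset, as Rescaling computes

print(y.shape)  # (600, 600, 3)
```

This is why the 3.11.1 summary above, where rescaling_3 reports (None, 600, 600, 1), cannot match what the layer actually computes at runtime.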
Comment From: saurabhkumar8112
Same issue with EfficientNetB3
Comment From: saurabhkumar8112
I worked around it temporarily by building the backbone with weights=None and then loading the weights manually. I'm not sure of the root cause, but this is a temporary fix:
import tensorflow as tf
from tensorflow.keras import layers

def build_model(img_size: int, num_classes: int = 2) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(img_size, img_size, 3))
    x = tf.keras.Sequential(
        [layers.RandomFlip("horizontal")],
        name="augmentation",
    )(inputs)
    x = layers.Lambda(
        lambda t: tf.keras.applications.efficientnet.preprocess_input(t),
        name="effnet_preprocess",
    )(x)
    # Build the backbone with weights=None to skip the broken
    # weight loading during construction...
    base = tf.keras.applications.EfficientNetB3(
        include_top=False,
        weights=None,
        input_tensor=x,
        pooling="avg",
    )
    # ...then manually load the official notop weights.
    weights_url = "https://storage.googleapis.com/keras-applications/efficientnetb3_notop.h5"
    weights_path = tf.keras.utils.get_file(
        "efficientnetb3_notop.h5", origin=weights_url, cache_subdir="models"
    )
    base.load_weights(weights_path)
    # num_classes is unused here; attach a classification head as needed.
    return base
Comment From: sonali-kumari1
Hi @abheesht17 -
This issue does stem from my recent change to compute_output_shape(...) in the Rescaling layer. Previously, it returned input_shape directly; I updated compute_output_shape(...) to infer the output shape from the lengths of scale and offset.
While reverting to the original behavior avoids shape mismatches, it leads to inconsistencies between model.output_shape, model.summary() and y_pred.shape. The related issue can be found here.
To resolve this, should we improve the shape-inference logic to accurately reflect broadcasting, or instead clearly document and enforce restrictions on the shapes of scale and offset to avoid ambiguity?
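If the broadcasting route is taken, one option is to derive the output shape by broadcasting the input shape against the shapes of scale and offset. A hypothetical sketch using NumPy's broadcasting helper (this is not the actual Keras code; rescaling_output_shape is an illustrative name, and the batch dimension is assumed to be handled separately since None dims cannot be broadcast this way):

```python
import numpy as np

def rescaling_output_shape(input_shape, scale, offset=0.0):
    """Hypothetical shape inference for a Rescaling-style layer: the
    output shape is the broadcast of the (non-batch) input shape with
    the shapes of `scale` and `offset`."""
    return np.broadcast_shapes(
        tuple(input_shape), np.shape(scale), np.shape(offset)
    )

# A scalar scale leaves the shape unchanged...
print(rescaling_output_shape((600, 600, 3), 1 / 255.0))        # (600, 600, 3)
# ...and so does a per-channel scale of length 3.
print(rescaling_output_shape((600, 600, 3), [0.9, 1.0, 1.1]))  # (600, 600, 3)
```

Under these rules a length-3 scale can never shrink the channel dimension, which would keep model.output_shape, model.summary() and y_pred.shape consistent.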