I tried with tensorflow versions 2.19 and 2.18. The code that reproduces the error is:
import tensorflow as tf

print(tf.__version__)

model = tf.keras.applications.EfficientNetB7(
    weights='imagenet',
    include_top=False,
    input_shape=(600, 600, 3),
)
model.summary()
It raises "ValueError: Shape mismatch in layer #1 (named stem_conv) for weight stem_conv/kernel. Weight expects shape (3, 3, 1, 64). Received saved weight with shape (3, 3, 3, 64)". I tried cleaning the cache folder to force a fresh download, but the new download raises the same error. Using include_top=True without the input_shape parameter also raises the same error, as do other models, e.g. EfficientNetB6.
Comment From: dhantule
Hi @h3dema, thanks for reporting this.
I've tested your code with the latest Keras 3.11.0 and I'm facing the same error:
ValueError: Shape mismatch in layer #1 (named stem_conv) for weight stem_conv/kernel. Weight expects shape (3, 3, 1, 64). Received saved weight with shape (3, 3, 3, 64)
However, with Keras 3.8.0 everything seems to work fine; please refer to the attached gist. We'll look into this and update you.
Comment From: shubham-ojha-weheal
I have the same issue but with the B0 model.
It says
ValueError: Shape mismatch in layer #1 (named stem_conv) for weight stem_conv/kernel. Weight expects shape (3, 3, 1, 32). Received saved weight with shape (3, 3, 3, 32)
I installed TensorFlow 2.18.1, which pulled in Keras 3.11.1.
Downgrading the Keras version to 3.8.0 solved the issue for me.
Comment From: k-r-4-m
I had the same issue with B7; after downgrading to Keras 3.10.0 it works fine.
Comment From: abheesht17
Did some basic debugging. The issue is clearly with the Rescaling layer, because it outputs only 1 channel (instead of 3) with Keras 3.11.1. @sonali-kumari1 - it seems this PR changed compute_output_shape(...), which causes the error. Could you please take a look? Thanks!
model.summary() with 3.10:
Model: "efficientnet"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer) │ (None, 600, 600, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ rescaling (Rescaling) │ (None, 600, 600, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ normalization (Normalization) │ (None, 600, 600, 3) │ 7 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ rescaling_1 (Rescaling) │ (None, 600, 600, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ stem_conv_pad (ZeroPadding2D) │ (None, 601, 601, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ stem_conv (Conv2D) │ (None, 300, 300, 64) │ 1,728 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 1,735 (6.78 KB)
Trainable params: 1,728 (6.75 KB)
Non-trainable params: 7 (32.00 B)
model.summary() with 3.11.1:
Model: "efficientnet"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_1 (InputLayer) │ (None, 600, 600, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ rescaling_2 (Rescaling) │ (None, 600, 600, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ normalization_1 (Normalization) │ (None, 600, 600, 3) │ 7 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ rescaling_3 (Rescaling) │ (None, 600, 600, 1) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ stem_conv_pad (ZeroPadding2D) │ (None, 601, 601, 1) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ stem_conv (Conv2D) │ (None, 300, 300, 64) │ 576 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 583 (2.28 KB)
Trainable params: 576 (2.25 KB)
Non-trainable params: 7 (28.00 B)
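For reference, an elementwise scale/offset follows standard broadcasting semantics, so a per-channel scale of length 3 cannot collapse the channel dimension to 1. A minimal sketch with NumPy (used here only as a stand-in for the layer math, not the actual Keras implementation):

```python
import numpy as np

# A (600, 600, 3) input scaled elementwise by a length-3 per-channel
# vector broadcasts back to (600, 600, 3); the channel dim is preserved.
x = np.zeros((600, 600, 3))
scale = np.array([0.9, 1.0, 1.1])  # one factor per RGB channel
y = x * scale + 0.0                # scale * x + offset, as Rescaling computes

print(y.shape)  # (600, 600, 3)
```

This is why the 3.11.1 summary above, where rescaling_3 reports (None, 600, 600, 1), cannot match what the layer actually computes at runtime.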
Comment From: saurabhkumar8112
Same issue with EfficientNetB3
Comment From: saurabhkumar8112
I worked around it temporarily by building the backbone with weights=None and then loading the weights manually. I'm not sure of the root cause, but this is a temporary fix:
import tensorflow as tf
from tensorflow.keras import layers

def build_model(img_size: int, num_classes: int = 2) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(img_size, img_size, 3))
    x = tf.keras.Sequential(
        [layers.RandomFlip("horizontal")],
        name="augmentation",
    )(inputs)
    x = layers.Lambda(
        lambda t: tf.keras.applications.efficientnet.preprocess_input(t),
        name="effnet_preprocess",
    )(x)
    # Build the backbone with weights=None to skip the broken
    # weight loading during construction...
    base = tf.keras.applications.EfficientNetB3(
        include_top=False,
        weights=None,
        input_tensor=x,
        pooling="avg",
    )
    # ...then manually load the official notop weights.
    weights_url = "https://storage.googleapis.com/keras-applications/efficientnetb3_notop.h5"
    weights_path = tf.keras.utils.get_file(
        "efficientnetb3_notop.h5", origin=weights_url, cache_subdir="models"
    )
    base.load_weights(weights_path)
    # num_classes is unused here; attach a classification head as needed.
    return base
Comment From: sonali-kumari1
Hi @abheesht17 -
This issue does stem from my recent change to compute_output_shape(...) in the Rescaling layer. Previously, it returned input_shape directly; I updated compute_output_shape(...) to infer the output shape from the lengths of scale and offset.
While reverting to the original behavior avoids shape mismatches, it leads to inconsistencies between model.output_shape, model.summary() and y_pred.shape. The related issue can be found here.
To resolve this, should we improve the shape-inference logic to accurately reflect broadcasting, or instead clearly document and enforce restrictions on the shapes of scale and offset to avoid ambiguity?
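If the broadcasting route is taken, one option is to derive the output shape by broadcasting the input shape against the shapes of scale and offset. A hypothetical sketch using NumPy's broadcasting helper (this is not the actual Keras code; rescaling_output_shape is an illustrative name, and the batch dimension is assumed to be handled separately since None dims cannot be broadcast this way):

```python
import numpy as np

def rescaling_output_shape(input_shape, scale, offset=0.0):
    """Hypothetical shape inference for a Rescaling-style layer: the
    output shape is the broadcast of the (non-batch) input shape with
    the shapes of `scale` and `offset`."""
    return np.broadcast_shapes(
        tuple(input_shape), np.shape(scale), np.shape(offset)
    )

# A scalar scale leaves the shape unchanged...
print(rescaling_output_shape((600, 600, 3), 1 / 255.0))        # (600, 600, 3)
# ...and so does a per-channel scale of length 3.
print(rescaling_output_shape((600, 600, 3), [0.9, 1.0, 1.1]))  # (600, 600, 3)
```

Under these rules a length-3 scale can never shrink the channel dimension, which would keep model.output_shape, model.summary() and y_pred.shape consistent.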