I opened this on the TensorFlow repo and was told to move it here: https://github.com/tensorflow/tensorflow/issues/74475

The short of it: GRU, at least on Google Colab (Keras 3.4.1), returns the wrong outputs when run with a GPU available.

Minimal reproduction:

import tensorflow as tf

class TestModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.gru = tf.keras.layers.GRU(10, return_sequences=True, return_state=True)

    def call(self, inputs):
        return self.gru(inputs)

# Create and test the model
model = TestModel()
test_input = tf.random.uniform((2, 3, 5))  # Batch size = 2, sequence length = 3, feature size = 5
output = model(test_input)
print("Output types and shapes:", [(type(o), o.shape) for o in output])

This prints:

With GPU:

Output types and shapes: [(<class 'tensorflow.python.framework.ops.EagerTensor'>, TensorShape([2, 3, 10])), (<class 'tensorflow.python.framework.ops.EagerTensor'>, TensorShape([10])), (<class 'tensorflow.python.framework.ops.EagerTensor'>, TensorShape([10]))]

With CPU:

Output types and shapes: [(<class 'tensorflow.python.framework.ops.EagerTensor'>, TensorShape([2, 3, 10])), (<class 'tensorflow.python.framework.ops.EagerTensor'>, TensorShape([2, 10]))]

CPU behavior seems correct.

Edited to add: I do not have the ability to test GPU behavior outside of Google Colab, so this might be a bug that has already been fixed in a later version, or a Colab-specific misconfiguration.
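
In the meantime, here is a minimal eager-mode sketch for normalizing the two observed return shapes into a single (batch, units) state tensor (the helper name is mine, not from any API):

import tensorflow as tf

def normalize_gru_result(result):
    # Hypothetical helper: unify the CPU and GPU return shapes observed above.
    # Works in eager execution, as in the repro.
    sequences, *states = result           # sequences: (batch, time, units)
    if len(states) == 1:
        state = states[0]                 # CPU path: already (batch, units)
    else:
        state = tf.stack(states, axis=0)  # GPU path: one (units,) tensor per batch element
    return sequences, state

seq, state = normalize_gru_result(model(test_input))
print(seq.shape, state.shape)  # (2, 3, 10) (2, 10)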

Comment From: sachinprasadhs

I was able to reproduce the reported behavior; attaching the Gist here.

With the Torch backend, it produces the expected outcome, as below:

Output types and shapes: [(<class 'torch.Tensor'>, torch.Size([2, 3, 10])), (<class 'torch.Tensor'>, torch.Size([2, 10]))]

Comment From: mattdangerw

Probably an issue with the CuDNN-specific implementation on the TF backend, which is pretty dense. I will take a look.

Comment From: AdityaMayukhSom

A similar issue happens when running Keras with the TensorFlow backend on desktop. The hidden states of the individual batch elements are returned as separate entries in the GRU output tuple, rather than as a single tensor whose first dimension equals the batch size.

Comment From: FedericoGriggio

Hi everyone,

I ran into a similar issue where GRU with return_sequences=True and return_state=True behaved differently on GPU (CuDNN) vs. CPU, throwing a ValueError due to mismatched output unpacking.

Workaround: Adding reset_after=False resolved it for me:

tf.keras.layers.GRU(
    128,
    return_sequences=True,
    return_state=True,
    reset_after=False  # Makes the layer ineligible for the CuDNN kernel
)

This ensures consistent (output, state) returns across devices.
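
A quick sanity check of the workaround, reusing the shapes from the original report (a sketch, not from the thread):

gru = tf.keras.layers.GRU(10, return_sequences=True, return_state=True, reset_after=False)
seq, state = gru(tf.random.uniform((2, 3, 5)))
print(seq.shape, state.shape)  # expected: (2, 3, 10) (2, 10) on both CPU and GPU

One caveat: the Keras docs list reset_after=True among the requirements for using the CuDNN kernel, so this workaround sidesteps the fast GPU path rather than fixing it.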

Thanks to the Keras team for investigating this (hope this helps others!)

Comment From: fhchl

I also just hit this issue while converting a codebase from TF 2.15 Keras to Keras 3, where the output changes depending on the reset_after argument.

However, it seems that the GPU behavior (returning a list of tensors holding the output plus the states of each example) is the one that aligns with the expected output of the RNN superclass.

So there also seems to be some inconsistency in the docs.

Is there a workaround that works for GPU and CPU and that allows CuDNN use on GPU?


Some failed attempts at a workaround (see this notebook):

  • Collecting the states with x, *states = ..., which fails because graph mode does not support iterating over a symbolic tf.Tensor (sketched below).
  • Collecting the whole return value as output = gru(...) and then splitting it with indexing and keras.ops.stack, which just pushes the same error down to a line in RNN.call.
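
For concreteness, a sketch of how the first attempt fails under tf.function (identifiers are mine; the error is the one described above):

import tensorflow as tf

gru = tf.keras.layers.GRU(10, return_sequences=True, return_state=True)

@tf.function  # forces graph mode
def run(inputs):
    x, *states = gru(inputs)  # fails: iterating over a symbolic tf.Tensor is not supported
    return x, tf.stack(states)

run(tf.random.uniform((2, 3, 5)))  # raises on the GPU/CuDNN code path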