I opened this on the TensorFlow repo and was told to move it here: https://github.com/tensorflow/tensorflow/issues/74475

The short of it: `GRU`, at least on Google Colab (Keras 3.4.1), returns the wrong thing when run with a GPU available.

A minimal reproduction:
```python
import tensorflow as tf

class TestModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.gru = tf.keras.layers.GRU(10, return_sequences=True, return_state=True)

    def call(self, inputs):
        return self.gru(inputs)

# Create and test the model
model = TestModel()
test_input = tf.random.uniform((2, 3, 5))  # batch size = 2, sequence length = 3, feature size = 5
output = model(test_input)
print("Output types and shapes:", [(type(o), o.shape) for o in output])
```
This prints:

With GPU:

```
Output types and shapes: [(<class 'tensorflow.python.framework.ops.EagerTensor'>, TensorShape([2, 3, 10])), (<class 'tensorflow.python.framework.ops.EagerTensor'>, TensorShape([10])), (<class 'tensorflow.python.framework.ops.EagerTensor'>, TensorShape([10]))]
```

With CPU:

```
Output types and shapes: [(<class 'tensorflow.python.framework.ops.EagerTensor'>, TensorShape([2, 3, 10])), (<class 'tensorflow.python.framework.ops.EagerTensor'>, TensorShape([2, 10]))]
```

The CPU behavior seems correct: the final state should be a single `(batch, units)` tensor, not one `(units,)` tensor per batch element.
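For anyone comparing on a single machine, here is a sketch of forcing the CPU path without switching Colab runtimes; I'm assuming that hiding the GPU before any op runs is enough to keep the cuDNN kernel out of play:

```python
import tensorflow as tf

# Hide the GPU before TensorFlow initializes its devices, so the
# generic (non-cuDNN) GRU implementation is used.
tf.config.set_visible_devices([], "GPU")

gru = tf.keras.layers.GRU(10, return_sequences=True, return_state=True)
outputs = gru(tf.random.uniform((2, 3, 5)))
print(len(outputs), [tuple(o.shape) for o in outputs])  # expected: 2 [(2, 3, 10), (2, 10)]
```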
**Edited to add:** I don't have the ability to test GPU behavior outside of Google Colab, so this might be a bug that has already been fixed in a later version, or a Colab-specific misconfiguration.
Comment From: sachinprasadhs
I was able to reproduce the reported behavior; attaching the Gist here.

With the Torch backend, it produces the expected output:

```
Output types and shapes: [(<class 'torch.Tensor'>, torch.Size([2, 3, 10])), (<class 'torch.Tensor'>, torch.Size([2, 10]))]
```
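For anyone reproducing this outside the Gist, a minimal sketch of switching backends; note that the environment variable has to be set before Keras is first imported:

```python
import os

os.environ["KERAS_BACKEND"] = "torch"  # must precede the first `import keras`

import keras

gru = keras.layers.GRU(10, return_sequences=True, return_state=True)
outputs = gru(keras.random.uniform((2, 3, 5)))
print([(type(o), tuple(o.shape)) for o in outputs])
```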
Comment From: mattdangerw
Probably an issue with the cuDNN-specific implementation on the TF backend, which is pretty dense. I will take a look.
Comment From: AdityaMayukhSom
A similar issue happens when running Keras with the TensorFlow backend on a local desktop machine: the hidden state of each element in the batch is returned as a separate tensor appended to the GRU output, rather than as a single tensor whose first dimension equals the batch size.
Comment From: FedericoGriggio
Hi everyone,
I ran into a similar issue where `GRU` with `return_sequences=True` and `return_state=True` behaved differently on GPU (cuDNN) vs. CPU, throwing a `ValueError` due to mismatched output unpacking.

Workaround: adding `reset_after=False` resolved it for me:
```python
tf.keras.layers.GRU(
    128,
    return_sequences=True,
    return_state=True,
    reset_after=False,  # disables CuDNN-specific behavior
)
```
This ensures consistent `(output, state)` returns across devices.
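One caveat, based on my reading of the Keras docs: the cuDNN kernel requires `reset_after=True` (the default), so this workaround sidesteps the bug by forcing the generic implementation, at the cost of the cuDNN speedup. `reset_after` also changes the GRU formulation itself, so weights trained with one setting aren't interchangeable with the other.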
Thanks to the Keras team for investigating this. Hope this helps others!
Comment From: fhchl
I also just hit this issue while converting a codebase from TF 2.15 Keras to Keras 3; the output changes depending on the `reset_after` argument.

However, it seems that the GPU behavior (returning a list with the output tensor plus one state tensor per example) is the one that aligns with the expected output of the `RNN` superclass. So there also seems to be some inconsistency in the docs.

Is there a workaround that works on both GPU and CPU and still allows cuDNN use on the GPU?
Some failed attempts at a workaround (see this notebook):

- Collecting the states with `x, *states = gru(...)`, but that fails because graph mode does not support iterating over a symbolic `tf.Tensor`.
- Collecting everything with `output = gru(...)` and then splitting it with indexing and `keras.ops.stack`, but that just pushes the same error to a line in `RNN.call`.
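For reference, a sketch of that second attempt spelled out; it does normalize the return value in eager mode, but as noted it can't help in graph mode, since the `ValueError` is raised inside `RNN.call` before the helper ever runs (the helper name is mine):

```python
import keras
from keras import ops

def normalize_gru_output(outputs):
    """Hypothetical helper: coerce either return layout into (sequence, state)."""
    seq, states = outputs[0], outputs[1:]
    if len(states) == 1:
        return seq, states[0]           # expected layout: state is (batch, units)
    return seq, ops.stack(states, 0)    # buggy cuDNN layout: one (units,) tensor per example
```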