Problem

When a GRU/LSTM instance with use_cudnn set to True/False (i.e. not the default "auto") is passed to a Bidirectional layer, the nested forward_layer and backward_layer are expected to retain the value of this attribute. Instead, it gets reset to the default "auto".

Using keras==3.10.0, tensorflow==2.19, onnx==1.17.0, onnxruntime==1.22.0.

Motivation

RNN layers such as GRU and LSTM have the argument/attribute use_cudnn: bool | Literal["auto"]. Passing False is useful, for example, when we want to train an RNN on an NVIDIA GPU with CuDNN acceleration and then export the model to ONNX: ONNX does not support the CudnnRNNV3 operation, so the export fails. We can instead create a new instance of the model with use_cudnn=False passed to the RNN layers, copy the weights over from the GPU model, and export successfully. But since Bidirectional resets the value, the export fails again whenever the model contains Bidirectional layers.
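
A minimal sketch of this workflow for a plain (non-bidirectional) GRU model (make_model and the file name are illustrative, not from the report):

import keras

def make_model(use_cudnn="auto"):
    inputs = keras.layers.Input((10, 2))
    outputs = keras.layers.GRU(4, use_cudnn=use_cudnn)(inputs)
    return keras.models.Model(inputs, outputs)

model_gpu = make_model()                        # "auto" picks CuDNN on an NVIDIA GPU
model_cpu = make_model(use_cudnn=False)         # plain kernels, exportable to ONNX
model_cpu.set_weights(model_gpu.get_weights())  # the weights are interchangeable
model_cpu.export("gru_plain.onnx", format="onnx")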

Cause

The use_cudnn attribute of the RNN layers is not serialized in get_config(), which seems intentional and reasonable. However, Bidirectional uses this (de)serialization round trip to make clean copies of the input RNN layer, and the use_cudnn value is lost in that step.
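
This is easy to verify; the round trip below mirrors what Bidirectional does internally when creating its copies:

import keras

gru = keras.layers.GRU(1, use_cudnn=False)
print("use_cudnn" in gru.get_config())  # False: the attribute is not in the config

clone = keras.layers.GRU.from_config(gru.get_config())
print(clone.use_cudnn)  # "auto" -- the explicit False is lost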

Example

import keras
import numpy as np

def create_model(use_cudnn="auto"):
    input = keras.layers.Input((10, 2))
    gru = keras.layers.GRU(1, use_cudnn=use_cudnn)
    print("gru.use_cudnn", gru.use_cudnn)
    bidi = keras.layers.Bidirectional(gru)
    print("bidi use_cudnn:", bidi.forward_layer.use_cudnn, bidi.backward_layer.use_cudnn)
    output = bidi(input)
    model = keras.models.Model(input, output)
    return model

model_gpu = create_model()
model_gpu(np.ones((2, 10, 2)))
model_gpu.export("gru.onnx", format="onnx")

#Cannot infer shape for functional_11_1/bidirectional_1/forward_gru_12_1/CudnnRNNV3: functional_11_1/bidirectional_1/forward_gru_12_1/CudnnRNNV3:3,functional_11_1/bidirectional_1/forward_gru_12_1/CudnnRNNV3:4
#Cannot infer shape for functional_11_1/bidirectional_1/backward_gru_12_1/CudnnRNNV3: functional_11_1/bidirectional_1/backward_gru_12_1/CudnnRNNV3:3,functional_11_1/bidirectional_1/backward_gru_12_1/CudnnRNNV3:4
#Tensorflow op [functional_11_1/bidirectional_1/forward_gru_12_1/CudnnRNNV3: CudnnRNNV3] is not supported
#Tensorflow op [functional_11_1/bidirectional_1/backward_gru_12_1/CudnnRNNV3: CudnnRNNV3] is not supported
#Unsupported ops: Counter({'CudnnRNNV3': 2})

model_cudnn = create_model(use_cudnn=True)
model_cudnn(np.ones((2, 10, 2)))
# gru.use_cudnn True
# bidi use_cudnn: auto auto # !
# export fails with the same CudnnRNNV3 errors

model_cpu = create_model(use_cudnn=False)
model_cpu.set_weights(model_gpu.get_weights())
print(model_cpu(np.ones((2, 10, 2))))
model_cpu.export("gru_cpu.onnx", format="onnx")
# gru.use_cudnn False
# bidi use_cudnn: auto auto
# export fails

Workaround

So far it has been sufficient to set the use_cudnn attribute of the nested layers explicitly after creating the Bidirectional instance.

Example wrapper:

def bidirectional(layer, *args, **kwargs) -> keras.layers.Bidirectional:
    """A workaround for Bidirectional to preserve the `use_cudnn` attribute of the nested layers."""
    bidi = keras.layers.Bidirectional(layer, *args, **kwargs)
    if hasattr(layer, "use_cudnn"):
        bidi.forward_layer.use_cudnn = bidi.backward_layer.use_cudnn = layer.use_cudnn
    return bidi

# using bidirectional() instead of keras.layers.Bidirectional within the
# create_model() function from above:
# bidi use_cudnn: False False
# export successful
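
For completeness, a sketch of create_model() rewritten around the wrapper (create_model_fixed is an illustrative name; model_gpu is the model from the Example above):

def create_model_fixed(use_cudnn="auto"):
    inputs = keras.layers.Input((10, 2))
    gru = keras.layers.GRU(1, use_cudnn=use_cudnn)
    bidi = bidirectional(gru)  # the wrapper above, not keras.layers.Bidirectional
    return keras.models.Model(inputs, bidi(inputs))

model_cpu = create_model_fixed(use_cudnn=False)
model_cpu.set_weights(model_gpu.get_weights())
model_cpu.export("gru_cpu.onnx", format="onnx")  # succeeds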

Proposed solution

Set these attributes in the same way as in the wrapper above, directly in Bidirectional.__init__(), just before the call to self._verify_layer_config().
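
In sketch form (the surrounding lines of Bidirectional.__init__ are paraphrased here, not copied from the Keras source):

# ... self.forward_layer and self.backward_layer have just been recreated
# from the serialized config of `layer` ...
if hasattr(layer, "use_cudnn"):
    # carry the attribute over, since it is lost in the (de)serialization round trip
    self.forward_layer.use_cudnn = layer.use_cudnn
    self.backward_layer.use_cudnn = layer.use_cudnn
self._verify_layer_config()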

The described fix has been implemented in: https://github.com/keras-team/keras/pull/21534

References

  • The same problem appears to have been encountered in https://github.com/keras-team/keras/issues/20588, but that issue was closed.
  • https://github.com/keras-team/keras/issues/8860
  • https://github.com/onnx/tensorflow-onnx/issues/2359
  • https://github.com/keras-team/keras/pull/8908 - an earlier problem with CuDNN RNNs and Bidirectional