Problem

When passing a `GRU`/`LSTM` instance with `use_cudnn` set to `True`/`False` (not the default `"auto"`) to a `Bidirectional` layer, it's expected that the nested `forward_layer` and `backward_layer` will retain the value of this attribute. However, it gets reset to the default `"auto"`.

Using `keras==3.10.0`, `tensorflow==2.19`, `onnx==1.17.0`, `onnxruntime==1.22.0`.
Motivation

RNN layers such as `GRU` or `LSTM` have the argument/attribute `use_cudnn: bool | Literal["auto"]`. Passing `False` is useful, for example, when we want to train an RNN on an NVIDIA GPU with CuDNN acceleration and then export the model to ONNX. ONNX doesn't support the `CudnnRNNV3` operation, so the export fails. We can instead create a new instance of the model with `use_cudnn=False` passed to the RNN layers, copy the weights from the GPU model, and export successfully. But since `Bidirectional` resets this value, the export fails again if the model contains a `Bidirectional` layer.
Cause

The `use_cudnn` attribute of the RNN layers is not serialized in `get_config()`, which seems to be intentional and rather reasonable. But `Bidirectional` uses this (de)serialization round-trip to make a clean copy of the input RNN layer, and in this step the `use_cudnn` value is lost.
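The mechanism can be illustrated without Keras at all. The sketch below uses a mock layer class (not the actual Keras internals) whose `get_config()` deliberately omits `use_cudnn`, mirroring how a clone built via `get_config()`/`from_config()` silently falls back to the default:

```python
class MockRNN:
    """Stand-in for a Keras RNN layer; `use_cudnn` is not part of the config."""

    def __init__(self, units, use_cudnn="auto"):
        self.units = units
        self.use_cudnn = use_cudnn

    def get_config(self):
        # `use_cudnn` is deliberately omitted, as in Keras.
        return {"units": self.units}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


def clone_via_config(layer):
    """Mirrors how Bidirectional copies its input layer."""
    return type(layer).from_config(layer.get_config())


gru = MockRNN(1, use_cudnn=False)
clone = clone_via_config(gru)
print(gru.use_cudnn)    # False
print(clone.use_cudnn)  # auto -- the explicitly set value is lost
```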
Example

```python
import keras
import numpy as np


def create_model(use_cudnn="auto"):
    input = keras.layers.Input((10, 2))
    gru = keras.layers.GRU(1, use_cudnn=use_cudnn)
    print("gru.use_cudnn", gru.use_cudnn)
    bidi = keras.layers.Bidirectional(gru)
    print("bidi use_cudnn:", bidi.forward_layer.use_cudnn, bidi.backward_layer.use_cudnn)
    output = bidi(input)
    model = keras.models.Model(input, output)
    return model


model_gpu = create_model()
model_gpu(np.ones((2, 10, 2)))
model_gpu.export("gru.onnx", format="onnx")
# Cannot infer shape for functional_11_1/bidirectional_1/forward_gru_12_1/CudnnRNNV3: functional_11_1/bidirectional_1/forward_gru_12_1/CudnnRNNV3:3,functional_11_1/bidirectional_1/forward_gru_12_1/CudnnRNNV3:4
# Cannot infer shape for functional_11_1/bidirectional_1/backward_gru_12_1/CudnnRNNV3: functional_11_1/bidirectional_1/backward_gru_12_1/CudnnRNNV3:3,functional_11_1/bidirectional_1/backward_gru_12_1/CudnnRNNV3:4
# Tensorflow op [functional_11_1/bidirectional_1/forward_gru_12_1/CudnnRNNV3: CudnnRNNV3] is not supported
# Tensorflow op [functional_11_1/bidirectional_1/backward_gru_12_1/CudnnRNNV3: CudnnRNNV3] is not supported
# Unsupported ops: Counter({'CudnnRNNV3': 2})

model_cpu = create_model(use_cudnn=True)
model_cpu(np.ones((2, 10, 2)))
# gru.use_cudnn True
# bidi use_cudnn: auto auto  # !
# export fails

model_cpu = create_model(use_cudnn=False)
model_cpu.set_weights(model_gpu.get_weights())
print(model_cpu(np.ones((2, 10, 2))))
model_cpu.export("gru_cpu.onnx", format="onnx")
# gru.use_cudnn False
# bidi use_cudnn: auto auto
# export fails
```
Workaround

So far it's sufficient to set the `use_cudnn` attribute of the nested layers explicitly after creating the `Bidirectional` instance.

Example wrapper:

```python
def bidirectional(layer, *args, **kwargs) -> keras.layers.Bidirectional:
    """A workaround for Bidirectional to preserve the `use_cudnn` attribute of the nested layers."""
    bidi = keras.layers.Bidirectional(layer, *args, **kwargs)
    if hasattr(layer, "use_cudnn"):
        bidi.forward_layer.use_cudnn = bidi.backward_layer.use_cudnn = layer.use_cudnn
    return bidi


# Using this wrapper within the create_model() function from above:
# export successful
```
Proposed solution

Set these attributes in `Bidirectional.__init__()` the same way the wrapper above does, just before the call to `self._verify_layer_config()`.
The described fix has been implemented in: https://github.com/keras-team/keras/pull/21534
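A minimal self-contained sketch of the idea, using mock classes rather than the actual Keras internals (the real fix belongs in `Bidirectional.__init__()`):

```python
class TinyRNN:
    """Mock RNN layer; `use_cudnn` is intentionally left out of the config."""

    def __init__(self, units, use_cudnn="auto"):
        self.units = units
        self.use_cudnn = use_cudnn

    def get_config(self):
        return {"units": self.units}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


class TinyBidirectional:
    """Mock wrapper showing where the attribute would be restored."""

    def __init__(self, layer):
        # Clones are built from the config, so use_cudnn resets to "auto".
        self.forward_layer = type(layer).from_config(layer.get_config())
        self.backward_layer = type(layer).from_config(layer.get_config())
        # Proposed fix: copy the attribute over after cloning
        # (in Keras, just before self._verify_layer_config()).
        if hasattr(layer, "use_cudnn"):
            self.forward_layer.use_cudnn = layer.use_cudnn
            self.backward_layer.use_cudnn = layer.use_cudnn


bidi = TinyBidirectional(TinyRNN(1, use_cudnn=False))
print(bidi.forward_layer.use_cudnn, bidi.backward_layer.use_cudnn)  # False False
```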
References
- It seems that the same problem was encountered in https://github.com/keras-team/keras/issues/20588, but the issue was closed.
- https://github.com/keras-team/keras/issues/8860
- https://github.com/onnx/tensorflow-onnx/issues/2359
- https://github.com/keras-team/keras/pull/8908 - previous problem with CuDNN RNNs and Bidirectional