summary() does not display with the Subclassing API in TensorFlow 2.x

December 8, 2020, 09:46

What this article covers

A (stopgap) workaround for the following error, plus some thoughts on what the error means.

ValueError: This model has not yet been built. Build the model first by calling `build()` or calling `fit()` with some data, or specify an `input_shape` argument in the first layer(s) for automatic build.

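Prerequisites

・You know the basics of Python
・You know a little about Keras and TensorFlow 2.x
・You know that Keras has three APIs
  (Sequential API, Functional API, Subclassing API)

Note: If you are unsure about any of these, reviewing those topics first should make this article easier to follow.

Setup

Import the following.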

import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Layer

# For convenience, keep the model's input shape in a constant
INPUT_SHAPE = (28, 28)

What is summary?

As the name suggests, it is a feature that prints a summary of the model.

Code (Functional API):

inputs = Input(shape=INPUT_SHAPE)
x = Dense(units=30, activation="sigmoid", name="dense0")(inputs)
x = Dense(units=20, activation="sigmoid", name="dense1")(x)
outputs = Dense(units=10, activation="sigmoid", name="dense2")(x)
func_model = Model(inputs=inputs, outputs=outputs)
func_model.summary()

Output:

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 28, 28)]          0         
_________________________________________________________________
dense0 (Dense)               (None, 28, 30)            870       
_________________________________________________________________
dense1 (Dense)               (None, 28, 20)            620       
_________________________________________________________________
dense2 (Dense)               (None, 28, 10)            210       
=================================================================
Total params: 1,700
Trainable params: 1,700
Non-trainable params: 0
_________________________________________________________________

As shown, it lays out the information for each layer of the network.
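To see where the Param # column comes from, here is a small sketch in plain Python. It assumes the usual Dense layout of one weight per input-output pair plus one bias per unit; the helper name dense_params is made up for this illustration and is not part of Keras.

```python
def dense_params(in_features, units):
    """Parameters of one Dense layer: a weight matrix plus one bias per unit."""
    return in_features * units + units


# Last input dimension (28), followed by the units of dense0, dense1, dense2.
widths = [28, 30, 20, 10]
counts = [dense_params(i, o) for i, o in zip(widths, widths[1:])]
total = sum(counts)

print(counts, total)  # [870, 620, 210] 1700
```

Only the last dimension of the input matters here, which is why the 28 rows of the (28, 28) input do not multiply the counts.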

Implementing the same model with the Subclassing API

Write it by subclassing tensorflow.keras.Model, like this.

class MyNet(Model):
   def __init__(self):
       super().__init__()
       self.dense_0 = Dense(units=30, activation="sigmoid", name="dense0")
       self.dense_1 = Dense(units=20, activation="sigmoid", name="dense1")
       self.dense_2 = Dense(units=10, activation="sigmoid", name="dense2")
   
   def call(self, inputs):
       x = self.dense_0(inputs)
       x = self.dense_1(x)
       outputs = self.dense_2(x)
       return outputs

subclassing_model_A = MyNet()

This is where the summary problem appears

Running the following code raises an error.

subclassing_model_A.summary()

ValueError: This model has not yet been built. Build the model first by calling `build()` or calling `fit()` with some data, or specify an `input_shape` argument in the first layer(s) for automatic build.

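Solution 1

The error message says, roughly:

Build the model by calling build() or fit(), or give the first layer an input_shape so that the model is built automatically.

Let's follow that advice. Here the model is built by calling it once on a dummy tensor, which creates the weights just as the options in the message would.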

subclassing_model_A(tf.zeros(INPUT_SHAPE))
subclassing_model_A.summary()

Model: "my_net"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense0 (Dense)               multiple                  870       
_________________________________________________________________
dense1 (Dense)               multiple                  620       
_________________________________________________________________
dense2 (Dense)               multiple                  210       
=================================================================
Total params: 1,700
Trainable params: 1,700
Non-trainable params: 0
_________________________________________________________________

The summary is printed.

However, Output Shape only says "multiple", so there is less information than with the Functional API.
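One detail about the dummy input: tf.zeros(INPUT_SHAPE) has shape (28, 28), which Keras reads as a batch of 28 samples with 28 features each. The parameter counts come out the same, but to mirror the (None, 28, 28) input of the Functional API model you can add an explicit batch dimension. A minimal sketch, with the class repeated so the snippet runs on its own (the names dummy_batch and outputs are only for illustration):

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense

INPUT_SHAPE = (28, 28)


class MyNet(Model):
    def __init__(self):
        super().__init__()
        self.dense_0 = Dense(units=30, activation="sigmoid", name="dense0")
        self.dense_1 = Dense(units=20, activation="sigmoid", name="dense1")
        self.dense_2 = Dense(units=10, activation="sigmoid", name="dense2")

    def call(self, inputs):
        x = self.dense_0(inputs)
        x = self.dense_1(x)
        return self.dense_2(x)


model = MyNet()
# One dummy sample: shape (1, 28, 28) = (batch, rows, features).
dummy_batch = tf.zeros((1, *INPUT_SHAPE))
outputs = model(dummy_batch)  # the first call creates the weights
model.summary()
```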

Solution 2

To get the output shapes as well, define a my_summary method on the MyNet class as follows.

class MyNet(Model):
   def __init__(self):
       super().__init__()
       self.dense_0 = Dense(units=30, activation="sigmoid", name="dense0")
       self.dense_1 = Dense(units=20, activation="sigmoid", name="dense1")
       self.dense_2 = Dense(units=10, activation="sigmoid", name="dense2")
   
   def call(self, inputs):
       x = self.dense_0(inputs)
       x = self.dense_1(x)
       outputs = self.dense_2(x)
       return outputs
   
   def my_summary(self, input_shape):
       tmp_x = Input(shape=input_shape, name='tmp_input')
       tmp_m = Model(inputs=tmp_x, outputs=self.call(tmp_x), name='tmp_model')
       tmp_m.summary()
       del tmp_x, tmp_m

Pass INPUT_SHAPE to the my_summary method and run it.

subclassing_model_B = MyNet()
subclassing_model_B.my_summary(INPUT_SHAPE)

Model: "tmp_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
tmp_input (InputLayer)       [(None, 28, 28)]          0         
_________________________________________________________________
dense0 (Dense)               (None, 28, 30)            870       
_________________________________________________________________
dense1 (Dense)               (None, 28, 20)            620       
_________________________________________________________________
dense2 (Dense)               (None, 28, 10)            210       
=================================================================
Total params: 1,700
Trainable params: 1,700
Non-trainable params: 0
_________________________________________________________________

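This time Output Shape is displayed properly.

What exactly did this do?

It takes the inputs and outputs of the call method we defined, builds a temporary Functional API model from them, and calls summary on that model. That is why Solution 1 prints Model: "my_net" while Solution 2 prints Model: "tmp_model": a second model object with exactly the same structure has been defined.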
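One way to check this is to compare layer objects. The sketch below rebuilds the temporary model outside the class (the names net and tmp_m are only for illustration) and confirms that tmp_model wraps the very same Dense instances. In other words, only the Model wrapper is new and no weights are duplicated.

```python
import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense


class MyNet(Model):
    def __init__(self):
        super().__init__()
        self.dense_0 = Dense(units=30, activation="sigmoid", name="dense0")
        self.dense_1 = Dense(units=20, activation="sigmoid", name="dense1")
        self.dense_2 = Dense(units=10, activation="sigmoid", name="dense2")

    def call(self, inputs):
        x = self.dense_0(inputs)
        x = self.dense_1(x)
        return self.dense_2(x)


net = MyNet()

# Same steps as my_summary, written out at module level.
tmp_x = Input(shape=(28, 28), name="tmp_input")
tmp_m = Model(inputs=tmp_x, outputs=net.call(tmp_x), name="tmp_model")

# The functional model holds references to the subclassed model's layers.
same_first_layer = tmp_m.get_layer("dense0") is net.dense_0
same_last_layer = tmp_m.get_layer("dense2") is net.dense_2
print(same_first_layer, same_last_layer)
```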

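Caveat 1 of this approach

It does not expand layers that are nested recursively with the Subclassing API.

What does that mean?

In DNNs, several layers are often grouped into a block, and the blocks are then chained to form the whole network. Here the MyBlock class implements a block and the MyBlockNet class implements the whole model.

In this situation, even the my_summary method from Solution 2 shows each MyBlock as a single layer and does not display its internal structure.

Let's build it and look at the result.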

class MyBlock(Layer):
   def __init__(self):
       super().__init__()
       self.dense_0 = Dense(units=300, activation="sigmoid", name="dense0")
       self.dense_1 = Dense(units=200, activation="sigmoid", name="dense1")
       self.dense_2 = Dense(units=100, activation="sigmoid", name="dense2")
   
   def call(self, inputs):
       x = self.dense_0(inputs)
       x = self.dense_1(x)
       outputs = self.dense_2(x)
       return outputs


class MyBlockNet(Model):
   def __init__(self):
       super().__init__()
       self.dense_0 = Dense(units=20, activation="sigmoid")
       self.block_0 = MyBlock()
       self.block_1 = MyBlock()
       self.dense_1 = Dense(units=10, activation="sigmoid")
   
   def call(self, inputs):
       x = self.dense_0(inputs)
       x = self.block_0(x)
       x = self.block_1(x)
       outputs = self.dense_1(x)
       return outputs
   
   def my_summary(self, input_shape):
       tmp_x = Input(shape=input_shape, name='tmp_input')
       tmp_m = Model(inputs=[tmp_x], outputs=self.call(tmp_x), name='tmp_model')
       tmp_m.summary()
       del tmp_x, tmp_m


my_block_net = MyBlockNet()
my_block_net.my_summary(INPUT_SHAPE)

Model: "tmp_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
tmp_input (InputLayer)       [(None, 28, 28)]          0         
_________________________________________________________________
dense (Dense)                (None, 28, 20)            580       
_________________________________________________________________
my_block (MyBlock)           (None, 28, 100)           86600     
_________________________________________________________________
my_block_1 (MyBlock)         (None, 28, 100)           110600    
_________________________________________________________________
dense_1 (Dense)              (None, 28, 10)            1010      
=================================================================
Total params: 198,790
Trainable params: 198,790
Non-trainable params: 0
_________________________________________________________________

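As shown, the layers defined as blocks are printed on one line each, as "my_block (MyBlock)" and "my_block_1 (MyBlock)", and the contents of MyBlock are not shown. That can be convenient in some cases, but when you want the inner layers printed, Solution 2 does not help either.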
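If you do want the inner layers, one stopgap is to build the model once and walk the blocks by hand. This is only a sketch: it relies on knowing the attribute names (dense_0, dense_1, dense_2) defined in MyBlock above, and the rows list is a made-up helper variable, not a Keras feature.

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Layer


class MyBlock(Layer):
    def __init__(self):
        super().__init__()
        self.dense_0 = Dense(units=300, activation="sigmoid", name="dense0")
        self.dense_1 = Dense(units=200, activation="sigmoid", name="dense1")
        self.dense_2 = Dense(units=100, activation="sigmoid", name="dense2")

    def call(self, inputs):
        x = self.dense_0(inputs)
        x = self.dense_1(x)
        return self.dense_2(x)


class MyBlockNet(Model):
    def __init__(self):
        super().__init__()
        self.dense_0 = Dense(units=20, activation="sigmoid")
        self.block_0 = MyBlock()
        self.block_1 = MyBlock()
        self.dense_1 = Dense(units=10, activation="sigmoid")

    def call(self, inputs):
        x = self.dense_0(inputs)
        x = self.block_0(x)
        x = self.block_1(x)
        return self.dense_1(x)


net = MyBlockNet()
net(tf.zeros((1, 28, 28)))  # one forward pass so every layer has weights

# Collect (block name, inner layer name, parameter count) by hand.
rows = []
for block in (net.block_0, net.block_1):
    for inner in (block.dense_0, block.dense_1, block.dense_2):
        rows.append((block.name, inner.name, inner.count_params()))

for row in rows:
    print(row)
```

The first block sees 20 input features and the second sees 100, which is why the two blocks report different totals even though they are the same class.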

For comparison, here is the same network implemented with the Functional API:

def my_block(inputs):
   x = Dense(units=300, activation="sigmoid")(inputs)
   x = Dense(units=200, activation="sigmoid")(x)
   outputs = Dense(units=100, activation="sigmoid")(x)
   return outputs

inputs = Input(shape=INPUT_SHAPE)
x = Dense(units=20, activation="sigmoid")(inputs)
x = my_block(x)
x = my_block(x)
outputs = Dense(units=10, activation="sigmoid")(x)

Model(inputs=inputs, outputs=outputs, name="f_block_net").summary()

Model: "f_block_net"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 28, 28)]          0         
_________________________________________________________________
dense (Dense)                (None, 28, 20)            580       
_________________________________________________________________
dense_1 (Dense)              (None, 28, 300)           6300      
_________________________________________________________________
dense_2 (Dense)              (None, 28, 200)           60200     
_________________________________________________________________
dense_3 (Dense)              (None, 28, 100)           20100     
_________________________________________________________________
dense_4 (Dense)              (None, 28, 300)           30300     
_________________________________________________________________
dense_5 (Dense)              (None, 28, 200)           60200     
_________________________________________________________________
dense_6 (Dense)              (None, 28, 100)           20100     
_________________________________________________________________
dense_7 (Dense)              (None, 28, 10)            1010      
=================================================================
Total params: 198,790
Trainable params: 198,790
Non-trainable params: 0
_________________________________________________________________

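Here every layer inside the blocks is listed as well.

Caveat 2 of this approach

For complex models that can only be expressed in Define by Run style, the output is not reliable.

Suppose there is a model whose layer structure changes dynamically, as in the example below.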

class MyResNet(Model):
   def __init__(self):
       super().__init__()
       self.dense_0 = Dense(units=30, activation="sigmoid", name="dense0")
       self.dense_1 = Dense(units=20, activation="sigmoid", name="dense1")
       self.dense_2 = Dense(units=10, activation="sigmoid", name="dense2")
   
   def call(self, inputs):
       x = self.dense_0(inputs)
       # Passes through the middle layer with 50% probability
       if tf.random.uniform((1,)) > 0.5:
           x = self.dense_1(x)
       outputs = self.dense_2(x)
       return outputs
   
   def my_summary(self, input_shape):
       tmp_x = Input(shape=input_shape, name='tmp_input')
       tmp_m = Model(inputs=tmp_x, outputs=self.call(tmp_x), name='tmp_model')
       tmp_m.summary()
       del tmp_x, tmp_m


MyResNet().my_summary(INPUT_SHAPE)

In this case, one of the following two outputs is printed, and without reading the code you cannot tell that the structure changes dynamically.

Model: "tmp_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
tmp_input (InputLayer)       [(None, 28, 28)]          0         
_________________________________________________________________
dense0 (Dense)               (None, 28, 30)            870       
_________________________________________________________________
dense2 (Dense)               (None, 28, 10)            310       
=================================================================
Total params: 1,180
Trainable params: 1,180
Non-trainable params: 0
_________________________________________________________________

Model: "tmp_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
tmp_input (InputLayer)       [(None, 28, 28)]          0         
_________________________________________________________________
dense0 (Dense)               (None, 28, 30)            870       
_________________________________________________________________
dense1 (Dense)               (None, 28, 20)            620       
_________________________________________________________________
dense2 (Dense)               (None, 28, 10)            210       
=================================================================
Total params: 1,700
Trainable params: 1,700
Non-trainable params: 0
_________________________________________________________________
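The two totals can be reproduced with plain arithmetic, which also shows why dense2 reports 310 parameters in one run and 210 in the other: its input width depends on which branch was taken. The helper names below are hypothetical. Also note that a Dense layer fixes its kernel shape the first time it is called, so as far as I can tell this toy model is only good for illustrating the summary; feeding the same dense2 instance both 30-wide and 20-wide inputs across calls would raise a shape error.

```python
def dense_params(in_features, units):
    """Parameters of one Dense layer: a weight matrix plus one bias per unit."""
    return in_features * units + units


def summary_total(use_middle_layer):
    """Total parameters reported when the branch is taken or skipped."""
    total = dense_params(28, 30)  # dense0 always runs
    width = 30
    if use_middle_layer:
        total += dense_params(width, 20)  # dense1
        width = 20
    total += dense_params(width, 10)  # dense2 sees 30 or 20 features
    return total


skipped = summary_total(use_middle_layer=False)
taken = summary_total(use_middle_layer=True)
print(skipped, taken)  # 1180 1700
```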

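Closing thoughts

Okay, that was a long walk through how the summary method behaves and how to work around it, so thanks for sticking with it this far, lol.

From here I'd like to think about the background that led to this situation.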

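I believe that summary failing under certain conditions is a side effect of the much greater modeling freedom that the Subclassing API brings.

Before the TensorFlow 2.x series, models were essentially Define and Run: the layer structure was defined statically, so a summary could always be displayed. Since version 2.0, layer structures can be defined in Define by Run style, as in Chainer and PyTorch, and that is presumably why the summary can no longer always be displayed.

With dynamic computation graphs as the default mode, TensorFlow has become far more intuitive to use and I'm personally a big fan, but it also made me realize that there were benefits you only get from a static definition.

Note that Solution 2 only works for models whose layer structure can be defined statically. For models that can only be expressed well with a dynamic definition, this method may fail to display a correct summary.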

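Huh? If only static definitions work, why use the Subclassing API at all?

Exactly. If you want to use summary, the Functional API may be the easier choice.

That said, TensorFlow has kept evolving since 2.0. For example, version 2.2 made it possible to customize what happens inside the Keras fit method.

This is done by overriding the train_step method in a class that subclasses Model with the Subclassing API, so that is one case where the Subclassing API pays off!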
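As a rough idea of what that looks like, here is a minimal sketch of a train_step override. The class name CustomFitNet, the hand-written mean squared error, and the random data are all assumptions for illustration, not code taken from the Keras guide.

```python
import numpy as np
import tensorflow as tf


class CustomFitNet(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs):
        return self.dense(inputs)

    def train_step(self, data):
        # fit() hands each batch to this method as an (inputs, targets) pair.
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = tf.reduce_mean(tf.square(y - y_pred))  # hand-written MSE
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        # Whatever is returned here shows up in the fit() logs and history.
        return {"loss": loss}


model = CustomFitNet()
model.compile(optimizer="sgd")  # the loss is computed inside train_step

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
history = model.fit(x, y, epochs=1, batch_size=8, verbose=0)
print(history.history["loss"])
```

Everything else about fit (batching, epochs, callbacks, history) keeps working as usual; only the per-batch update is replaced.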

Finally, this passage from the page on customizing fit moved me so much that I'll quote it here.

A core principle of Keras is progressive disclosure of complexity. You should always be able to get into lower-level workflows in a gradual way. You shouldn't fall off a cliff if the high-level functionality doesn't exactly match your use case. You should be able to gain more control over the small details while retaining a commensurate amount of high-level convenience.

Thank you for reading to the end.