AIGC Algorithms: Principles and Implementation of GAN Image Super-Resolution

1. Introduction
Resolution: image resolution measures the pixel content of a digital image, usually expressed as the product of the image's horizontal and vertical pixel counts. A 1080p image of size 1920×1080, for instance, has 1920 pixels horizontally and 1080 vertically. In general, the higher the resolution, the more pixels per unit area, the more detail the image carries, and the finer and more lifelike it looks.
Image super-resolution algorithms: in practice, constraints such as capture-device performance and network bandwidth mean that detail-rich, artifact-free high-resolution images are often unavailable. The most direct way to raise resolution is to improve the optical hardware, but manufacturing is difficult and large improvements are expensive, so physical super-resolution is costly and hard. Achieving super-resolution in software, through algorithms, has therefore attracted wide attention.
Goal of image super-resolution: to use digital image processing to upscale a low-resolution (LR) image into a high-resolution (HR) image.
2. Applications of Image Super-Resolution
(1) Smart displays: images from ordinary cameras are generally low-resolution and fall short of high-resolution viewing requirements. 4K displays are becoming commonplace, yet many capture devices, and old films, produce images far below 4K resolution.
(2) Medical imaging: images acquired by medical instruments are usually low-resolution, while high-resolution medical images help reveal small lesions.
(3) Remote sensing: satellites and aircraft image their targets from great distances, and sensor technology and equipment cost limit the captured resolution, leaving targets blurred and hard to analyze.
(4) Urban video surveillance: public surveillance cameras are often low-resolution for cost reasons, and low-resolution footage hampers downstream tasks such as face recognition, license-plate recognition, and structured target-attribute recognition.
(5) Image compression and transmission: to relieve bandwidth pressure from massive data volumes, images and video are compressed before transmission, for example by lowering their resolution. Since viewers still expect sharp pictures, the receiving end uses super-resolution to restore resolution and reconstruct the original high-definition content as faithfully as possible.
3. SRGAN Principles
(1) Interpolation-Based Super-Resolution
Classical digital image processing upscales images by interpolation, e.g. nearest-neighbor, bilinear, and bicubic interpolation.
Figure 1. Image interpolation
As shown in Figure 1, the left image contains 2×2 pixels and the right image, obtained by interpolation, contains 4×4 pixels. To produce the upscaled image, a pixel value is needed at every position of the right image.
Take the nearest-neighbor algorithm as an example: every pixel value in the upscaled image is copied from the original image, i.e. no new pixel values are created. The mapping is computed as follows:
srcX = dstX * (srcWidth / dstWidth)
srcY = dstY * (srcHeight / dstHeight)
where srcX, srcY are coordinates in the source image, dstX, dstY are coordinates in the destination image, srcWidth, srcHeight are the width and height of the source image, and dstWidth, dstHeight are the width and height of the destination image.
For example, the pixel at (1, 2) in the destination image corresponds to srcX = 1 * (2/4) = 0.5, which rounds up to 1, and srcY = 2 * (2/4) = 1. So the destination pixel at (1, 2) takes the value of the source pixel at (1, 1). Figure 2 shows the completed interpolation; note that the right-hand image contains no pixel values beyond those of the original.
Figure 2. Nearest-neighbor image interpolation
Nearest-neighbor interpolation does raise the resolution, but the new image reuses only the original pixel values and adds no data or detail. The method is simple, direct, and fast, but it breaks the smooth gradients between neighboring pixels, producing jagged, staircase-like edges (see Figure 3).
Figure 3. Super-resolution by nearest-neighbor interpolation
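The mapping above is easy to verify in a few lines of NumPy. This is a minimal sketch (the function name is illustrative, not from the original post); it rounds half-up, matching the worked example:

import numpy as np

def nearest_neighbor_upscale(src, dst_h, dst_w):
    """Map every destination pixel back to its nearest source pixel."""
    src_h, src_w = src.shape[:2]
    # srcX = dstX * (srcWidth / dstWidth), rounded half-up, clamped to the image
    ys = np.minimum(np.floor(np.arange(dst_h) * (src_h / dst_h) + 0.5).astype(int), src_h - 1)
    xs = np.minimum(np.floor(np.arange(dst_w) * (src_w / dst_w) + 0.5).astype(int), src_w - 1)
    return src[ys][:, xs]

lr = np.array([[10, 20],
               [30, 40]])
print(nearest_neighbor_upscale(lr, 4, 4))   # every value is copied from lr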
Bilinear and bicubic interpolation go further: each output pixel is a linear combination of the pixel values in a local region of the source image. Images produced by such linear interpolation still look blurry and lack detail and texture.
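For reference, these interpolations are one-liners in common imaging libraries. A sketch using Pillow (the file names are hypothetical):

from PIL import Image

img = Image.open("input.jpg")                                   # hypothetical input file
up = img.resize((img.width * 4, img.height * 4), Image.BICUBIC) # 4x bicubic upscale
up.save("input_x4_bicubic.jpg")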
(2) Deep-Learning-Based Super-Resolution
Figure 4. Deep-learning-based image super-resolution
Deep-learning super-resolution aims to learn the mapping between low-resolution and high-resolution images with a neural network. Unlike interpolation, which derives output pixels through simple linear mappings, deep-learning methods train a network on a large set of LR/HR image pairs. After training, the super-resolved image y is obtained from the input image x through the learned nonlinear mapping f, i.e. y = f(x).
SRCNN: SRCNN, the canonical CNN super-resolution network, has a simple structure: an input layer, three convolution layers, and an output layer. It is trained with an MSE loss.
Drawback: training with MSE gives reconstructions with a high PSNR (peak signal-to-noise ratio) relative to the original high-resolution image, but visually the results look blurry and lack texture detail.
Figure 5. SRCNN network architecture
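To make the three-layer structure concrete, here is a minimal SRCNN-style sketch in Keras, using the 9-1-5 kernel sizes and 64/32 filter counts from the SRCNN paper (this is a separate illustration, not part of the SRGAN script below):

from keras import Input
from keras.layers import Conv2D
from keras.models import Model

inp = Input(shape=(None, None, 3))
x = Conv2D(64, 9, padding='same', activation='relu')(inp)  # patch extraction
x = Conv2D(32, 1, padding='same', activation='relu')(x)    # nonlinear mapping
out = Conv2D(3, 5, padding='same')(x)                      # reconstruction
srcnn = Model(inp, out)
srcnn.compile(optimizer='adam', loss='mse')                # MSE loss, as noted above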
(3)SRGAN
SRGAN: the pioneering application of GANs to image super-resolution. It combines a deep network with an adversarial network to generate high-resolution images; compared with SRCNN-style methods, SRGAN produces images with much richer detail.
Figure 6. Super-resolution comparison
In Figure 6, the rightmost image is the original; the first image is produced by bicubic interpolation, the second by SRResNet, and the third by SRGAN.
Comparison: the bicubic result is the blurriest. SRResNet uses only the generator network with an MSE loss; its output is slightly blurry and short on high-frequency detail. In the reconstruction of the splashing water around the hand, for example, it fails to render the texture. By contrast, the SRGAN result is rich in texture detail and far more lifelike.
SRGAN recovers fine texture detail and produces realistic images thanks to two key ideas:
· it trains the generator with an adversarial loss, i.e. a discriminator supervises the generator and pushes it toward more realistic images;
· it trains the generator with a perceptual loss, i.e. the MSE between the outputs of specific VGG-19 layers measures the difference between the real high-resolution image and the reconstruction.
What is the adversarial loss, and what does it do? The generator's adversarial loss in SRGAN is
l_Gen = Σ_n −log D(G(I_LR))
Minimizing it maximizes the discriminator's score on generated images: the higher the discriminator rates a generated image, the smaller the generator's loss. In other words, the harder it is for the discriminator to tell a generated image from a real one, the better the image quality and the smaller this expression becomes.
Perceptual loss:
l_VGG/i,j = 1/(W_ij · H_ij) · Σ_x Σ_y ( φ_ij(I_HR)_{x,y} − φ_ij(G(I_LR))_{x,y} )²
What is the perceptual loss? Experiments show that the more similar two images' outputs are at certain intermediate layers of a VGG network, the more similar the two images appear to a human observer. SRGAN therefore feeds both the real high-resolution image and the generated super-resolved image through a VGG network, extracts features from each, and measures similarity by the difference between the two feature maps. The expression above computes the per-position error at the output of VGG layer (i, j), denoted φ_ij; minimizing the perceptual loss shrinks this error and makes the two images perceptually closer.
Generator loss:
l_SR = l_VGG + 0.001 · l_Gen
The generator's total loss is the weighted sum of the two terms, with the adversarial loss conventionally weighted by 0.001 (this matches loss_weights=[1e-3, 1] in the training code below).
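The three formulas above can be checked numerically. A small NumPy sketch with illustrative stand-in values (the random arrays stand in for real VGG feature maps; nothing here comes from an actual training run):

import numpy as np

d_score = 0.8                                      # illustrative score D(G(I_LR)) for one generated image
phi_hr = np.random.rand(1, 64, 64, 256)            # stand-in for VGG features of the real HR image
phi_sr = np.random.rand(1, 64, 64, 256)            # stand-in for VGG features of the generated image

adversarial_loss = -np.log(d_score)                # l_Gen: small when D is fooled
perceptual_loss = np.mean((phi_hr - phi_sr) ** 2)  # l_VGG: MSE in feature space
total_loss = perceptual_loss + 0.001 * adversarial_loss
print(adversarial_loss, perceptual_loss, total_loss)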
Network architecture:
Figure 7. SRGAN network architecture
As shown in Figure 7, SRGAN consists of a generator network and a discriminator network.
The generator is a convolutional network made up of three parts:
(1) the low-resolution input (64×64×3) passes through a convolution layer and a ReLU activation into the next stage;
(2) B residual blocks (B = 16 in the implementation below), each containing two convolution + batch-normalization layers with a ReLU activation;
(3) two upsampling blocks, each doubling width and height, so the output is 4× the input size, i.e. (256×256×3).
The discriminator is an ordinary convolutional network. It takes real and generated high-resolution images as input and outputs a probability score: values close to 1 indicate a real image, values close to 0 a generated one.
SRGAN training alternates between training the discriminator and training the generator. In each training step, the discriminator is usually trained first, then the generator.
Training the discriminator:
(1) randomly select batch_size real high-resolution images;
(2) resize them to obtain low-resolution images and pass these through the Generator to produce batch_size generated high-resolution images;
(3) label the real high-resolution images 1 and the generated ones 0, then train the discriminator on both.
Training the generator:
(1) pass the low-resolution images through the Generator to obtain generated high-resolution images, feed these to the discriminator to get a score D, and compute the adversarial loss −log D;
(2) pass the real and generated high-resolution images through the VGG network to extract features, and compute the perceptual loss as the MSE between them;
(3) compute the total generator loss and update the generator's parameters by backpropagation.
4. SRGAN Implementation
Below, a simple image super-resolution model is built with the Keras framework. With a dataset you prepare yourself, training completes even on a CPU-only machine and gives decent results. Feel free to try it.
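One caveat before the code (an inference from the APIs it uses, not a statement from the original post): the script targets Keras 2.x on a TensorFlow 1.x backend (it uses tf.Summary) and needs scipy < 1.2, the last series that still ships scipy.misc.imread and imresize.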
import glob
import os
import sys
import time

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from keras import Input
from keras.applications import VGG19
from keras.callbacks import TensorBoard
from keras.layers import BatchNormalization, Activation, LeakyReLU, Add, Dense
from keras.layers.convolutional import Conv2D, UpSampling2D
from keras.models import Model
from keras.optimizers import Adam
# scipy.misc.imread/imresize were removed in scipy 1.2; pin scipy<1.2 to run this
from scipy.misc import imread, imresize

CURRENT_PATH = os.path.dirname(os.path.realpath(__file__))
sys.path.append(CURRENT_PATH)



def residual_block(x):
    """
    Residual block: two 3x3 convolutions with batch normalization,
    plus a skip connection from the block input
    """
    filters = [64, 64]
    kernel_size = 3
    strides = 1
    padding = "same"
    momentum = 0.8
    activation = "relu"

    res = Conv2D(filters=filters[0], kernel_size=kernel_size, strides=strides, padding=padding)(x)
    res = Activation(activation=activation)(res)
    res = BatchNormalization(momentum=momentum)(res)

    res = Conv2D(filters=filters[1], kernel_size=kernel_size, strides=strides, padding=padding)(res)
    res = BatchNormalization(momentum=momentum)(res)

    # Add res and x (the skip connection)
    res = Add()([res, x])
    return res


def build_generator():
    """
    Create the generator network using the hyperparameter values defined below
    :return: Keras Model mapping a 64x64x3 LR image to a 256x256x3 SR image
    """
    residual_blocks = 16
    momentum = 0.8
    input_shape = (64, 64, 3)

    # Input layer of the generator network
    input_layer = Input(shape=input_shape)

    # Add the pre-residual block
    gen1 = Conv2D(filters=64, kernel_size=9, strides=1, padding='same', activation='relu')(input_layer)

    # Add 16 residual blocks
    res = residual_block(gen1)
    for i in range(residual_blocks - 1):
        res = residual_block(res)

    # Add the post-residual block
    gen2 = Conv2D(filters=64, kernel_size=3, strides=1, padding='same')(res)
    gen2 = BatchNormalization(momentum=momentum)(gen2)

    # Take the sum of the outputs from the pre-residual block (gen1)
    # and the post-residual block (gen2)
    gen3 = Add()([gen2, gen1])

    # Add an upsampling block (2x)
    gen4 = UpSampling2D(size=2)(gen3)
    gen4 = Conv2D(filters=256, kernel_size=3, strides=1, padding='same')(gen4)
    gen4 = Activation('relu')(gen4)

    # Add another upsampling block (2x again, for 4x total)
    gen5 = UpSampling2D(size=2)(gen4)
    gen5 = Conv2D(filters=256, kernel_size=3, strides=1, padding='same')(gen5)
    gen5 = Activation('relu')(gen5)

    # Output convolution layer; tanh keeps pixels in [-1, 1],
    # matching the normalization used during training
    gen6 = Conv2D(filters=3, kernel_size=9, strides=1, padding='same')(gen5)
    output = Activation('tanh')(gen6)

    model = Model(inputs=[input_layer], outputs=[output], name='generator')
    return model


def build_discriminator():
    """
    Create the discriminator network using the hyperparameter values defined below
    :return: Keras Model mapping a 256x256x3 image to a 16x16x1 map of real/fake scores
    """
    leakyrelu_alpha = 0.2
    momentum = 0.8
    input_shape = (256, 256, 3)

    input_layer = Input(shape=input_shape)

    # Add the first convolution block
    dis1 = Conv2D(filters=64, kernel_size=3, strides=1, padding='same')(input_layer)
    dis1 = LeakyReLU(alpha=leakyrelu_alpha)(dis1)

    # Add the 2nd convolution block
    dis2 = Conv2D(filters=64, kernel_size=3, strides=2, padding='same')(dis1)
    dis2 = LeakyReLU(alpha=leakyrelu_alpha)(dis2)
    dis2 = BatchNormalization(momentum=momentum)(dis2)

    # Add the third convolution block
    dis3 = Conv2D(filters=128, kernel_size=3, strides=1, padding='same')(dis2)
    dis3 = LeakyReLU(alpha=leakyrelu_alpha)(dis3)
    dis3 = BatchNormalization(momentum=momentum)(dis3)

    # Add the fourth convolution block
    dis4 = Conv2D(filters=128, kernel_size=3, strides=2, padding='same')(dis3)
    dis4 = LeakyReLU(alpha=leakyrelu_alpha)(dis4)
    dis4 = BatchNormalization(momentum=momentum)(dis4)

    # Add the fifth convolution block
    dis5 = Conv2D(filters=256, kernel_size=3, strides=1, padding='same')(dis4)
    dis5 = LeakyReLU(alpha=leakyrelu_alpha)(dis5)
    dis5 = BatchNormalization(momentum=momentum)(dis5)

    # Add the sixth convolution block
    dis6 = Conv2D(filters=256, kernel_size=3, strides=2, padding='same')(dis5)
    dis6 = LeakyReLU(alpha=leakyrelu_alpha)(dis6)
    dis6 = BatchNormalization(momentum=momentum)(dis6)

    # Add the seventh convolution block
    dis7 = Conv2D(filters=512, kernel_size=3, strides=1, padding='same')(dis6)
    dis7 = LeakyReLU(alpha=leakyrelu_alpha)(dis7)
    dis7 = BatchNormalization(momentum=momentum)(dis7)

    # Add the eighth convolution block
    dis8 = Conv2D(filters=512, kernel_size=3, strides=2, padding='same')(dis7)
    dis8 = LeakyReLU(alpha=leakyrelu_alpha)(dis8)
    dis8 = BatchNormalization(momentum=momentum)(dis8)

    # Add a dense layer (applied per spatial position)
    dis9 = Dense(units=1024)(dis8)
    dis9 = LeakyReLU(alpha=leakyrelu_alpha)(dis9)

    # Last dense layer, for classification; the four stride-2 convolutions
    # reduce 256x256 to 16x16, so the output shape is (16, 16, 1)
    output = Dense(units=1, activation='sigmoid')(dis9)

    model = Model(inputs=[input_layer], outputs=[output], name='discriminator')
    return model


def build_vgg():
    """
    Build a VGG19-based network to extract image features
    """
    input_shape = (256, 256, 3)

    # Load a pre-trained VGG19 model trained on the ImageNet dataset;
    # include_top=False drops the fully connected layers so the
    # 256x256 inputs used here are accepted
    vgg = VGG19(weights="imagenet", include_top=False, input_shape=input_shape)

    # Expose an intermediate layer's output as the feature map
    # (merely assigning vgg.outputs does not rewire the graph,
    # so a new Model is built instead)
    model = Model(inputs=vgg.input, outputs=vgg.layers[9].output)
    return model


def sample_images(data_dir, batch_size, high_resolution_shape, low_resolution_shape):
    # Make a list of all images inside the data directory
    all_images = glob.glob(data_dir)

    # Choose a random batch of images
    images_batch = np.random.choice(all_images, size=batch_size)

    low_resolution_images = []
    high_resolution_images = []

    for img in images_batch:
        # Get an ndarray of the current image
        img1 = imread(img, mode='RGB')
        img1 = img1.astype(np.float32)

        # Resize the image to both target shapes
        img1_high_resolution = imresize(img1, high_resolution_shape)
        img1_low_resolution = imresize(img1, low_resolution_shape)

        # Do a random horizontal flip (applied to both, so the pair stays aligned)
        if np.random.random() < 0.5:
            img1_high_resolution = np.fliplr(img1_high_resolution)
            img1_low_resolution = np.fliplr(img1_low_resolution)

        high_resolution_images.append(img1_high_resolution)
        low_resolution_images.append(img1_low_resolution)

    # Convert the lists to NumPy ndarrays
    return np.array(high_resolution_images), np.array(low_resolution_images)


def save_images(low_resolution_image, original_image, generated_image, path):
    """
    Save the low-resolution, high-resolution (original) and
    generated high-resolution images in a single figure
    """
    fig = plt.figure()
    ax = fig.add_subplot(1, 3, 1)
    ax.imshow(low_resolution_image)
    ax.axis("off")
    ax.set_title("Low-resolution")

    ax = fig.add_subplot(1, 3, 2)
    ax.imshow(original_image)
    ax.axis("off")
    ax.set_title("Original")

    ax = fig.add_subplot(1, 3, 3)
    ax.imshow(generated_image)
    ax.axis("off")
    ax.set_title("Generated")

    plt.savefig(path)


def write_log(callback, name, value, batch_no):
    """
    Write scalars to TensorBoard (uses the TensorFlow 1.x Summary API)
    """
    summary = tf.Summary()
    summary_value = summary.value.add()
    summary_value.simple_value = value
    summary_value.tag = name
    callback.writer.add_summary(summary, batch_no)
    callback.writer.flush()



if __name__ == '__main__':
    data_dir = r"image_align_celeba\*.*"
    epochs = 200
    batch_size = 1
    mode = 'predict'

    # Shapes of the low-resolution and high-resolution images
    low_resolution_shape = (64, 64, 3)
    high_resolution_shape = (256, 256, 3)

    # Common optimizer for all networks
    common_optimizer = Adam(0.0002, 0.5)


    if mode == 'train':
        # Build and compile the VGG19 network to extract features
        vgg = build_vgg()
        vgg.trainable = False
        vgg.compile(loss='mse', optimizer=common_optimizer, metrics=['accuracy'])

        # Build and compile the discriminator network
        discriminator = build_discriminator()
        discriminator.compile(loss='mse', optimizer=common_optimizer, metrics=['accuracy'])

        # Build the generator network
        generator = build_generator()

        """
        Build and compile the adversarial model
        """
        # Input layers for high-resolution and low-resolution images
        input_high_resolution = Input(shape=high_resolution_shape)
        input_low_resolution = Input(shape=low_resolution_shape)

        # Generate high-resolution images from low-resolution images
        generated_high_resolution_images = generator(input_low_resolution)

        # Extract feature maps of the generated images
        features = vgg(generated_high_resolution_images)

        # Freeze the discriminator inside the adversarial model
        discriminator.trainable = False

        # Get the probability map for the generated high-resolution images
        probs = discriminator(generated_high_resolution_images)

        # Create and compile the adversarial model; the adversarial
        # (binary cross-entropy) term is weighted by 1e-3 and the perceptual
        # (MSE on VGG features) term by 1, matching the SRGAN paper
        adversarial_model = Model([input_low_resolution, input_high_resolution], [probs, features])
        adversarial_model.compile(loss=['binary_crossentropy', 'mse'], loss_weights=[1e-3, 1],
                                  optimizer=common_optimizer)

        # Add TensorBoard logging
        tensorboard = TensorBoard(log_dir="logs/{}".format(time.time()))
        tensorboard.set_model(generator)
        tensorboard.set_model(discriminator)

        for epoch in range(epochs):
            print("Epoch:{}".format(epoch))

            """
            Train the discriminator network
            """
            # Sample a batch of images
            high_resolution_images, low_resolution_images = sample_images(
                data_dir=data_dir, batch_size=batch_size,
                low_resolution_shape=low_resolution_shape,
                high_resolution_shape=high_resolution_shape)

            # Normalize images to [-1, 1]
            high_resolution_images = high_resolution_images / 127.5 - 1.
            low_resolution_images = low_resolution_images / 127.5 - 1.

            # Generate high-resolution images from low-resolution images
            generated_high_resolution_images = generator.predict(low_resolution_images)

            # Generate batches of real and fake labels; the discriminator
            # outputs a (16, 16, 1) score map, so the labels share that shape
            real_labels = np.ones((batch_size, 16, 16, 1))
            fake_labels = np.zeros((batch_size, 16, 16, 1))

            # Train the discriminator network on real and fake images
            d_loss_real = discriminator.train_on_batch(high_resolution_images, real_labels)
            d_loss_fake = discriminator.train_on_batch(generated_high_resolution_images, fake_labels)

            # Calculate the total discriminator loss
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
            print("d_loss:", d_loss)

            """
            Train the generator network
            """
            # Sample another batch of images
            high_resolution_images, low_resolution_images = sample_images(
                data_dir=data_dir, batch_size=batch_size,
                low_resolution_shape=low_resolution_shape,
                high_resolution_shape=high_resolution_shape)

            # Normalize images
            high_resolution_images = high_resolution_images / 127.5 - 1.
            low_resolution_images = low_resolution_images / 127.5 - 1.

            # Extract feature maps of the real high-resolution images
            image_features = vgg.predict(high_resolution_images)

            # Train the generator: the targets are "real" labels for the
            # adversarial head and the real images' VGG features for the
            # perceptual head
            g_loss = adversarial_model.train_on_batch(
                [low_resolution_images, high_resolution_images],
                [real_labels, image_features])
            print("g_loss:", g_loss)

            # Write the losses to TensorBoard
            write_log(tensorboard, 'g_loss', g_loss[0], epoch)
            write_log(tensorboard, 'd_loss', d_loss[0], epoch)

            # Sample and save images every 100 epochs
            if epoch % 100 == 0:
                high_resolution_images, low_resolution_images = sample_images(
                    data_dir=data_dir, batch_size=batch_size,
                    low_resolution_shape=low_resolution_shape,
                    high_resolution_shape=high_resolution_shape)

                # Normalize images
                high_resolution_images = high_resolution_images / 127.5 - 1.
                low_resolution_images = low_resolution_images / 127.5 - 1.

                generated_images = generator.predict_on_batch(low_resolution_images)

                for index, img in enumerate(generated_images):
                    save_images(low_resolution_images[index], high_resolution_images[index], img,
                                path="results/img_{}_{}".format(epoch, index))

        # Save the model weights after training
        generator.save_weights("generator.h5")
        discriminator.save_weights("discriminator.h5")


    if mode == 'predict':
        # Build the discriminator network
        discriminator = build_discriminator()

        # Build the generator network
        generator = build_generator()

        # Load the trained weights
        generator.load_weights("generator.h5")
        discriminator.load_weights("discriminator.h5")

        # Get 10 random images
        high_resolution_images, low_resolution_images = sample_images(
            data_dir=data_dir, batch_size=10,
            low_resolution_shape=low_resolution_shape,
            high_resolution_shape=high_resolution_shape)

        # Normalize images
        high_resolution_images = high_resolution_images / 127.5 - 1.
        low_resolution_images = low_resolution_images / 127.5 - 1.

        # Generate high-resolution images from low-resolution images
        generated_images = generator.predict_on_batch(low_resolution_images)

        # Save comparison figures
        for index, img in enumerate(generated_images):
            save_images(low_resolution_images[index], high_resolution_images[index], img,
                        path="results/gen_{}".format(index))




Summary: building on SRGAN, the field has kept evolving, producing excellent image and video super-resolution models such as ESRGAN and Real-ESRGAN, with large gains in both quality and speed. As GAN-based super-resolution models continue to improve and video super-resolution matures, the technology can be expected to see ever wider real-world deployment.

