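1. Introduction

Resolution: image resolution measures how many pixels a digital image contains. It is usually written as horizontal pixels × vertical pixels. A 1080p image, for example, is 1920×1080: 1920 pixels across and 1080 pixels down. In general, a higher resolution means more pixels per unit area, more detail in the image, and a finer, more lifelike appearance.

Image super-resolution algorithms: in practice, limits such as capture-device performance and network bandwidth often make it impossible to obtain detailed high-resolution images free of blocky blur. The most direct way to raise resolution is to improve the optical hardware. However, manufacturing is difficult and large improvements are expensive, so achieving super-resolution physically is costly and hard. Super-resolution in software, through algorithms, has therefore attracted wide attention.

Goal of image super-resolution: use digital image processing techniques to convert a low-resolution (LR) image into a high-resolution (HR) image.

Typical application areas include:
(1) Smart displays: images taken by ordinary cameras are generally low resolution and fall short of high-resolution viewing requirements. 4K HD displays are becoming widespread, yet photos from many imaging devices and old films have resolutions far below 4K.
(2) Medical imaging: images acquired by medical instruments are usually low resolution, and high-resolution medical images help detect small lesions.
(3) Remote sensing: satellites or aircraft are far from the objects they image, and sensor technology and equipment cost are limited, so the captured images are low resolution. Targets appear blurry and the images are hard to analyze.
(4) Urban video surveillance: cameras in public surveillance systems are often low resolution because of cost, and low-resolution images or video hamper downstream tasks such as face recognition, license-plate recognition, and structured target-attribute recognition.
(5) Image compression and transmission: to reduce the bandwidth pressure of massive amounts of data, images and video are compressed before transmission, for example by lowering their resolution. Viewers still expect high clarity, so the receiving end uses super-resolution to raise the resolution and reconstruct the original HD image or video as faithfully as possible.

3. SRGAN principles

(1) Super-resolution by interpolation

Classical digital image processing performs super-resolution by interpolation, for example nearest-neighbor, bilinear, or bicubic interpolation.

Figure 1. Image interpolation

In Figure 1, the left image has 2×2 pixels and the right image, obtained by interpolation, has 4×4 pixels. To produce the upscaled image we need a pixel value for every position in the right image.

Take nearest-neighbor interpolation as an example: every pixel value in the upscaled image is copied from the original image, so no new pixel values are created. The mapping is:

srcX = dstX * (srcWidth / dstWidth)
srcY = dstY * (srcHeight / dstHeight)

where srcX, srcY are coordinates in the source image, dstX, dstY are coordinates in the destination image, srcWidth, srcHeight are the source width and height, and dstWidth, dstHeight are the destination width and height.

For example (coordinates start at 0), the pixel at (1, 2) in the destination image maps to srcX = 1 * (2/4) = 0.5, rounded to 1, and srcY = 2 * (2/4) = 1. It therefore takes the value of source pixel (1, 1). Figure 2 shows the result: the right image contains only pixel values that already existed in the left image.

Figure 2. Nearest-neighbor interpolation

Nearest-neighbor interpolation raises the resolution, but the new image consists only of the original pixel values and adds no information or detail. The method is simple and fast, but it breaks the gradual transitions between neighboring pixels, which shows up as jagged (staircase) edges, as in Figure 3.

Figure 3. Super-resolution by nearest-neighbor interpolation

Bilinear and bicubic interpolation instead compute each output pixel as a weighted combination of the source pixels in a small neighborhood (2×2 for bilinear, 4×4 for bicubic). Images produced this way are still blurry and lack detail and texture.

(2) Super-resolution with deep learning

Figure 4. Deep-learning super-resolution

Deep-learning super-resolution uses a neural network to learn the mapping from low-resolution to high-resolution images. Whereas interpolation computes output pixels with a simple linear mapping, deep-learning methods train a network on a large number of LR/HR image pairs. After training, the super-resolved image $y$ is obtained from the input image $x$ through the learned nonlinear mapping $f$, i.e. $y = f(x)$.

SRCNN: SRCNN, a typical CNN super-resolution network, has a simple structure: an input layer, three convolutional layers, and an output layer. It is trained with MSE as the loss function.

Limitation: because the network is trained with an MSE loss, its reconstructions reach a high PSNR (peak signal-to-noise ratio) against the original HR image, yet they look blurry and lack texture detail.

Figure 5. SRCNN network architecture

(3) SRGAN

SRGAN is the pioneering work that brought GANs to image super-resolution. It combines a deep generator network with adversarial training to produce high-resolution images, and compared with methods such as SRCNN its outputs contain much richer detail.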
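The nearest-neighbor mapping srcX = dstX * (srcWidth / dstWidth) from (1) can be sketched in NumPy as follows. This is a minimal illustration rather than a library implementation: the helper names are hypothetical, coordinates start at 0, and `mode="round"` rounds half up as in the worked example (0.5 becomes 1) while clamping indices that would fall past the last source pixel.

```python
import numpy as np


def nn_source_indices(dst_size, src_size, mode="round"):
    """Map destination indices to source indices with src = dst * (src_size / dst_size)."""
    coords = np.arange(dst_size) * (src_size / dst_size)
    if mode == "round":
        # Round half up (0.5 -> 1); Python's built-in round() would send 0.5 to 0
        idx = np.floor(coords + 0.5)
    else:
        # Truncate instead, another common choice
        idx = np.floor(coords)
    # Clamp so that e.g. 1.5 -> 2 cannot index past the last source pixel
    return np.minimum(idx.astype(int), src_size - 1)


def nearest_neighbor_resize(src, dst_h, dst_w, mode="round"):
    """Upscale a 2-D (or H x W x C) array by copying the nearest source pixel."""
    rows = nn_source_indices(dst_h, src.shape[0], mode)
    cols = nn_source_indices(dst_w, src.shape[1], mode)
    return src[rows][:, cols]


src = np.array([[10, 20],
                [30, 40]], dtype=np.uint8)
up = nearest_neighbor_resize(src, 4, 4)
print(up)
# Array indexing is [y, x]: destination (x=1, y=2) -> source (1, 1), value 40
print(up[2, 1])
```

With `mode="floor"` the same 2×2 input maps to rows and columns [0, 0, 1, 1], which gives a symmetric result. Either way, every output value already exists in the input, which is exactly the limitation described above.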
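For contrast with nearest-neighbor interpolation, a bilinear upscaler computes each output pixel as a distance-weighted average of the 2×2 source pixels around the mapped position, so it creates values that were not in the original image. This minimal NumPy sketch reuses the srcX = dstX * (srcWidth / dstWidth) mapping with edge clamping; `bilinear_resize` is a hypothetical helper, and many libraries align pixel centers instead, so their exact numbers differ.

```python
import numpy as np


def bilinear_resize(src, dst_h, dst_w):
    """Bilinear upscaling of a 2-D array using src = dst * (src_size / dst_size)."""
    src = src.astype(np.float64)
    src_h, src_w = src.shape
    out = np.empty((dst_h, dst_w))
    for dst_y in range(dst_h):
        y = dst_y * (src_h / dst_h)
        y0 = int(np.floor(y))
        y1 = min(y0 + 1, src_h - 1)      # clamp at the bottom edge
        wy = y - y0                      # weight of the lower row
        for dst_x in range(dst_w):
            x = dst_x * (src_w / dst_w)
            x0 = int(np.floor(x))
            x1 = min(x0 + 1, src_w - 1)  # clamp at the right edge
            wx = x - x0                  # weight of the right column
            # Interpolate horizontally on both rows, then vertically between them
            top = (1 - wx) * src[y0, x0] + wx * src[y0, x1]
            bottom = (1 - wx) * src[y1, x0] + wx * src[y1, x1]
            out[dst_y, dst_x] = (1 - wy) * top + wy * bottom
    return out


src = np.array([[10, 20],
                [30, 40]])
up = bilinear_resize(src, 4, 4)
print(up)
```

Here `up[1, 1]` is 25.0, the average of all four source pixels, a value that does not exist in the 2×2 input. The same averaging is what smooths edges and textures, producing the blur mentioned above.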
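PSNR, mentioned in the SRCNN discussion, is computed directly from the MSE: PSNR = 10 · log10(MAX² / MSE), with MAX = 255 for 8-bit images. The NumPy sketch below (a hypothetical `psnr` helper applied to toy stripe images) also illustrates one reason MSE-trained models look blurry: a sharp texture shifted by a single pixel is penalized more heavily than a flat gray average.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB, computed from the mean squared error."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Toy 8x8 "ground truth" with 1-pixel vertical stripes (0, 255, 0, 255, ...)
truth = np.tile(np.array([0.0, 255.0]), (8, 4))
# A sharp, plausible texture that is misaligned by one pixel
shifted = np.roll(truth, 1, axis=1)
# A blurry prediction: the flat average of the two possible textures
blurry = np.full_like(truth, 127.5)

print(psnr(truth, shifted))  # 0.0 dB: every pixel is off by 255
print(psnr(truth, blurry))   # about 6.02 dB: every pixel is off by 127.5
```

The blurry image scores about 6 dB higher even though it contains no texture at all. A loss that only rewards per-pixel agreement therefore pushes a network toward such averages, which is the gap SRGAN's adversarial and perceptual losses aim to close.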
Figure 6. Comparison of super-resolution results

In Figure 6, the rightmost image is the original. From the left, the first image is produced by bicubic interpolation, the second by SRResNet, and the third by SRGAN.

Comparison: bicubic interpolation gives the blurriest result. SRResNet uses only the generator network trained with an MSE loss; its output is slightly blurry and lacks high-frequency detail. For example, the "water splash" on the hand is reconstructed as a blur with no visible texture. SRGAN's output, by contrast, has rich texture detail and looks more realistic.

SRGAN recovers fine texture and produces realistic images thanks to two key ideas:
· The generator is trained with an adversarial loss: a discriminator supervises the generator and pushes it toward more realistic images.
· The generator is trained with a perceptual loss: the MSE between the outputs of a specific VGG-19 layer measures the difference between the HR image and the reconstruction.

Adversarial loss: what is it and what does it do?

$$l^{SR}_{Gen} = \sum_{n=1}^{N} -\log D_{\theta_D}\big(G_{\theta_G}(I^{LR}_n)\big)$$

The formula drives the generator to maximize the discriminator's score on generated images: the higher the score the discriminator assigns to a generated image, the smaller the generator loss. In other words, the harder it is for the discriminator to tell a generated image from a real one, the better that image and the smaller the value above.

Perceptual loss:

What is a perceptual loss? Experiments have shown that the more similar the outputs of certain intermediate VGG layers are for two images, the more similar the two images look to a human observer. SRGAN therefore feeds both the real HR image and the generated SR image into a VGG network, extracts their features, and measures similarity by the difference between the feature maps:

$$l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j}H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \Big(\phi_{i,j}\big(I^{HR}\big)_{x,y} - \phi_{i,j}\big(G_{\theta_G}(I^{LR})\big)_{x,y}\Big)^2$$

Here $\phi_{i,j}$ is the feature map produced by the j-th convolution (after activation) before the i-th max-pooling layer of VGG19, and $W_{i,j}$, $H_{i,j}$ are its width and height. The formula computes, at the output of layer $\phi_{i,j}$, the error between corresponding positions of the features of the real and the generated HR image. The perceptual loss shrinks this error so that the two images look more alike.

Generator loss:

The total generator loss is a weighted sum of the two terms. As in the SRGAN paper, the adversarial term is weighted by 0.001 and the VGG perceptual term keeps weight 1 (the code below does the same with `loss_weights=[1e-3, 1]`):

$$l^{SR} = l^{SR}_{VGG/i,j} + 10^{-3}\, l^{SR}_{Gen}$$

Network architecture:

Figure 7. SRGAN network architecture

As Figure 7 shows, SRGAN consists of a generator network and a discriminator network.

The generator is a convolutional network with three main parts:
(1) The low-resolution input (64×64×3) passes through a convolutional layer and an activation (PReLU in the SRGAN paper; the code below uses ReLU).
(2) B residual blocks, each containing two convolutions with batch normalization and a ReLU activation, plus a skip connection.
(3) Upsampling layers: two ×2 upsampling steps enlarge the width and height by 4 in total, giving a 256×256×3 output.

The discriminator is an ordinary convolutional network. It takes real HR images and generated HR images as input and outputs a probability: close to 1 means a real image, close to 0 a fake one.

SRGAN training alternates between training the discriminator and training the generator. In each training step the discriminator is usually updated first, then the generator.

Training the discriminator:
(1) Randomly select batch_size real HR images.
(2) Resize the HR images to obtain LR images and pass them through the generator to produce batch_size generated HR images.
(3) Label the real HR images 1 and the generated HR images 0, and train the discriminator on them.

Training the generator:
(1) Pass the LR images through the generator, feed the generated HR images to the discriminator to get the score D, and compute the adversarial loss −log D.
(2) Pass the real and the generated HR images through the VGG network to extract features, and compute the perceptual loss as the MSE between them.
(3) Combine the two into the generator loss and update the generator parameters by backpropagation.

4. SRGAN implementation

The code below builds a simple super-resolution model with Keras. With your own dataset it can be trained on a CPU-only machine and gives decent results; feel free to try it. It assumes TensorFlow 2.x with tf.keras (Keras 2, i.e. TensorFlow 2.15 or earlier), Pillow, and Matplotlib.

```python
import glob
import os
import sys
import time

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from PIL import Image
from tensorflow.keras import Input
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import (Activation, Add, BatchNormalization, Conv2D,
                                     Dense, LeakyReLU, UpSampling2D)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

CURRENT_PATH = os.path.dirname(os.path.realpath(__file__))
sys.path.append(CURRENT_PATH)


def residual_block(x):
    """
    Residual block: Conv -> ReLU -> BN -> Conv -> BN, plus a skip connection
    """
    filters = [64, 64]
    kernel_size = 3
    strides = 1
    padding = "same"
    momentum = 0.8
    activation = "relu"

    res = Conv2D(filters=filters[0], kernel_size=kernel_size, strides=strides, padding=padding)(x)
    res = Activation(activation=activation)(res)
    res = BatchNormalization(momentum=momentum)(res)

    res = Conv2D(filters=filters[1], kernel_size=kernel_size, strides=strides, padding=padding)(res)
    res = BatchNormalization(momentum=momentum)(res)

    # Skip connection: add the block input to its output
    res = Add()([res, x])
    return res


def build_generator():
    """
    Create the generator network: 64x64x3 input -> 256x256x3 output
    """
    residual_blocks = 16
    momentum = 0.8
    input_shape = (64, 64, 3)

    # Input layer of the generator network
    input_layer = Input(shape=input_shape)

    # Pre-residual block
    gen1 = Conv2D(filters=64, kernel_size=9, strides=1, padding='same', activation='relu')(input_layer)

    # 16 residual blocks
    res = residual_block(gen1)
    for i in range(residual_blocks - 1):
        res = residual_block(res)

    # Post-residual block
    gen2 = Conv2D(filters=64, kernel_size=3, strides=1, padding='same')(res)
    gen2 = BatchNormalization(momentum=momentum)(gen2)

    # Sum of the pre-residual output (gen1) and the post-residual output (gen2)
    gen3 = Add()([gen2, gen1])

    # Upsampling block 1: 64x64 -> 128x128
    gen4 = UpSampling2D(size=2)(gen3)
    gen4 = Conv2D(filters=256, kernel_size=3, strides=1, padding='same')(gen4)
    gen4 = Activation('relu')(gen4)

    # Upsampling block 2: 128x128 -> 256x256
    gen5 = UpSampling2D(size=2)(gen4)
    gen5 = Conv2D(filters=256, kernel_size=3, strides=1, padding='same')(gen5)
    gen5 = Activation('relu')(gen5)

    # Output convolution layer; tanh keeps pixel values in [-1, 1]
    gen6 = Conv2D(filters=3, kernel_size=9, strides=1, padding='same')(gen5)
    output = Activation('tanh')(gen6)

    model = Model(inputs=[input_layer], outputs=[output], name='generator')
    return model


def build_discriminator():
    """
    Create the discriminator network for 256x256x3 images
    """
    leakyrelu_alpha = 0.2
    momentum = 0.8
    input_shape = (256, 256, 3)

    input_layer = Input(shape=input_shape)

    # First convolution block (no batch normalization)
    x = Conv2D(filters=64, kernel_size=3, strides=1, padding='same')(input_layer)
    x = LeakyReLU(leakyrelu_alpha)(x)

    # Convolution blocks 2-8 as (filters, strides); the four stride-2 convolutions
    # shrink the feature map from 256x256 to 16x16
    for filters, strides in [(64, 2), (128, 1), (128, 2), (256, 1),
                             (256, 2), (512, 1), (512, 2)]:
        x = Conv2D(filters=filters, kernel_size=3, strides=strides, padding='same')(x)
        x = LeakyReLU(leakyrelu_alpha)(x)
        x = BatchNormalization(momentum=momentum)(x)

    # Dense layers act on the channel axis of the 16x16 map, so the output is a
    # 16x16x1 map of per-patch probabilities rather than a single score
    x = Dense(units=1024)(x)
    x = LeakyReLU(leakyrelu_alpha)(x)
    output = Dense(units=1, activation='sigmoid')(x)

    model = Model(inputs=[input_layer], outputs=[output], name='discriminator')
    return model


def build_vgg():
    """
    Build a VGG19 feature extractor for 256x256x3 images
    """
    input_shape = (256, 256, 3)
    # VGG19 pre-trained on ImageNet; include_top=False drops the 224x224
    # classifier head so that the network accepts 256x256 inputs
    vgg = VGG19(weights="imagenet", include_top=False, input_shape=input_shape)
    # Use the output of layers[9] (block3_conv3) as the feature map
    model = Model(inputs=vgg.input, outputs=vgg.layers[9].output)
    return model


def sample_images(data_dir, batch_size, high_resolution_shape, low_resolution_shape):
    # List all images matching the data_dir glob pattern
    all_images = glob.glob(data_dir)

    # Choose a random batch of images
    images_batch = np.random.choice(all_images, size=batch_size)

    low_resolution_images = []
    high_resolution_images = []

    for img in images_batch:
        # Load the current image as RGB
        img1 = Image.open(img).convert('RGB')

        # Resize to the HR and LR shapes (Pillow's default filter); PIL takes (width, height)
        img1_high_resolution = np.asarray(
            img1.resize((high_resolution_shape[1], high_resolution_shape[0])), dtype=np.float32)
        img1_low_resolution = np.asarray(
            img1.resize((low_resolution_shape[1], low_resolution_shape[0])), dtype=np.float32)

        # Do a random horizontal flip
        if np.random.random() < 0.5:
            img1_high_resolution = np.fliplr(img1_high_resolution)
            img1_low_resolution = np.fliplr(img1_low_resolution)

        high_resolution_images.append(img1_high_resolution)
        low_resolution_images.append(img1_low_resolution)

    # Convert the lists to NumPy arrays
    return np.array(high_resolution_images), np.array(low_resolution_images)


def save_images(low_resolution_image, original_image, generated_image, path):
    """
    Save the LR, original HR and generated HR images side by side in one figure.
    Inputs are in [-1, 1] and are rescaled to [0, 1] for display.
    """
    fig = plt.figure()
    panels = [(low_resolution_image, "Low-resolution"),
              (original_image, "Original"),
              (generated_image, "Generated")]
    for position, (image, title) in enumerate(panels, start=1):
        ax = fig.add_subplot(1, 3, position)
        ax.imshow(np.clip((image + 1) / 2, 0, 1))
        ax.axis("off")
        ax.set_title(title)
    plt.savefig(path)
    plt.close(fig)


def write_log(writer, name, value, step):
    """
    Write a scalar to TensorBoard
    """
    with writer.as_default():
        tf.summary.scalar(name, value, step=step)
    writer.flush()


if __name__ == '__main__':
    # Glob pattern for the training images (a local CelebA image folder)
    data_dir = os.path.join("image_align_celeba", "*.*")
    epochs = 200      # number of training steps; each step uses one batch
    batch_size = 1
    mode = 'predict'  # run with 'train' first; 'predict' loads the saved weights

    # Shape of low-resolution and high-resolution images
    low_resolution_shape = (64, 64, 3)
    high_resolution_shape = (256, 256, 3)

    os.makedirs("results", exist_ok=True)

    if mode == 'train':
        # Build the VGG19 feature extractor; it is never trained
        vgg = build_vgg()
        vgg.trainable = False

        # Build and compile the discriminator network. Each compiled model gets its
        # own Adam(learning_rate=0.0002, beta_1=0.5): recent TensorFlow optimizers
        # cannot be shared between models
        discriminator = build_discriminator()
        discriminator.compile(loss='mse', optimizer=Adam(0.0002, 0.5), metrics=['accuracy'])

        # Build the generator network
        generator = build_generator()

        """
        Build and compile the adversarial model
        """
        # Input layer for low-resolution images
        input_low_resolution = Input(shape=low_resolution_shape)

        # Generate high-resolution images from low-resolution images
        generated_high_resolution_images = generator(input_low_resolution)

        # Extract feature maps of the generated images
        # (images enter VGG19 in [-1, 1], without vgg19.preprocess_input)
        features = vgg(generated_high_resolution_images)

        # Freeze the discriminator inside the adversarial model
        discriminator.trainable = False

        # Get the probability of generated high-resolution images
        probs = discriminator(generated_high_resolution_images)

        # Adversarial term weighted 1e-3, VGG feature MSE weighted 1
        adversarial_model = Model(inputs=input_low_resolution, outputs=[probs, features])
        adversarial_model.compile(loss=['binary_crossentropy', 'mse'], loss_weights=[1e-3, 1],
                                  optimizer=Adam(0.0002, 0.5))

        # TensorBoard log writer (one directory per run)
        summary_writer = tf.summary.create_file_writer("logs/{}".format(time.time()))

        for epoch in range(epochs):
            print("Epoch:{}".format(epoch))

            """
            Train the discriminator network
            """
            # Sample a batch of images
            high_resolution_images, low_resolution_images = sample_images(
                data_dir=data_dir, batch_size=batch_size,
                low_resolution_shape=low_resolution_shape,
                high_resolution_shape=high_resolution_shape)

            # Normalize images to [-1, 1]
            high_resolution_images = high_resolution_images / 127.5 - 1.
            low_resolution_images = low_resolution_images / 127.5 - 1.

            # Generate high-resolution images from low-resolution images
            generated_high_resolution_images = generator.predict(low_resolution_images)

            # Real (1) and fake (0) labels matching the 16x16x1 discriminator output
            real_labels = np.ones((batch_size, 16, 16, 1))
            fake_labels = np.zeros((batch_size, 16, 16, 1))

            # Train the discriminator network on real and fake images
            d_loss_real = discriminator.train_on_batch(high_resolution_images, real_labels)
            d_loss_fake = discriminator.train_on_batch(generated_high_resolution_images, fake_labels)

            # Calculate total discriminator loss
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
            print("d_loss:", d_loss)

            """
            Train the generator network
            """
            # Sample a new batch of images
            high_resolution_images, low_resolution_images = sample_images(
                data_dir=data_dir, batch_size=batch_size,
                low_resolution_shape=low_resolution_shape,
                high_resolution_shape=high_resolution_shape)

            # Normalize images to [-1, 1]
            high_resolution_images = high_resolution_images / 127.5 - 1.
            low_resolution_images = low_resolution_images / 127.5 - 1.

            # Extract feature maps for real high-resolution images
            image_features = vgg.predict(high_resolution_images)

            # Train the generator: the discriminator should output "real" for generated images
            g_loss = adversarial_model.train_on_batch(low_resolution_images,
                                                      [real_labels, image_features])
            print("g_loss:", g_loss)

            # Write the losses to TensorBoard
            write_log(summary_writer, 'g_loss', g_loss[0], epoch)
            write_log(summary_writer, 'd_loss', d_loss[0], epoch)

            # Sample and save images every 100 steps
            if epoch % 100 == 0:
                high_resolution_images, low_resolution_images = sample_images(
                    data_dir=data_dir, batch_size=batch_size,
                    low_resolution_shape=low_resolution_shape,
                    high_resolution_shape=high_resolution_shape)

                # Normalize images to [-1, 1]
                high_resolution_images = high_resolution_images / 127.5 - 1.
                low_resolution_images = low_resolution_images / 127.5 - 1.

                generated_images = generator.predict_on_batch(low_resolution_images)

                for index, img in enumerate(generated_images):
                    save_images(low_resolution_images[index], high_resolution_images[index], img,
                                path="results/img_{}_{}".format(epoch, index))

        # Save models
        generator.save_weights("generator.h5")
        discriminator.save_weights("discriminator.h5")

    if mode == 'predict':
        # Build the networks
        discriminator = build_discriminator()
        generator = build_generator()

        # Load the trained weights
        generator.load_weights("generator.h5")
        discriminator.load_weights("discriminator.h5")

        # Get 10 random images
        high_resolution_images, low_resolution_images = sample_images(
            data_dir=data_dir, batch_size=10,
            low_resolution_shape=low_resolution_shape,
            high_resolution_shape=high_resolution_shape)

        # Normalize images to [-1, 1]
        high_resolution_images = high_resolution_images / 127.5 - 1.
        low_resolution_images = low_resolution_images / 127.5 - 1.

        # Generate high-resolution images from low-resolution images
        generated_images = generator.predict_on_batch(low_resolution_images)

        # Save images
        for index, img in enumerate(generated_images):
            save_images(low_resolution_images[index], high_resolution_images[index], img,
                        path="results/gen_{}".format(index))
```

Summary: building on SRGAN, later models such as ESRGAN and Real-ESRGAN have made major advances in both the quality and the speed of image and video super-resolution. GAN-based super-resolution models keep evolving, video super-resolution is maturing as well, and super-resolution is likely to find ever wider practical use.
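The generator objective described in section 3 (the adversarial loss −log D plus an MSE in VGG feature space, with the adversarial term weighted by 10⁻³) can be evaluated numerically. This NumPy sketch uses made-up discriminator scores and small random arrays in place of real VGG19 feature maps; the helper names are hypothetical, and the adversarial term is averaged over the batch (as Keras' binary cross-entropy does) rather than summed as in the formula.

```python
import numpy as np


def adversarial_loss(d_scores, eps=1e-7):
    """Mean of -log D(G(x)); smaller when the discriminator rates fakes as real."""
    d_scores = np.clip(d_scores, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.log(d_scores)))


def perceptual_loss(features_hr, features_sr):
    """MSE between feature maps phi(I_HR) and phi(G(I_LR)) of shape (N, H, W, C)."""
    return float(np.mean((features_hr - features_sr) ** 2))


def generator_loss(d_scores, features_hr, features_sr, adv_weight=1e-3):
    """Perceptual term with weight 1 plus adversarial term with weight 1e-3."""
    return perceptual_loss(features_hr, features_sr) + adv_weight * adversarial_loss(d_scores)


rng = np.random.default_rng(0)
# Small stand-ins for VGG feature maps of a batch of 2 images
# (real block3_conv3 maps for 256x256 inputs would be 64x64x256)
phi_hr = rng.normal(size=(2, 8, 8, 16))
phi_sr = phi_hr + 0.1 * rng.normal(size=phi_hr.shape)

print(adversarial_loss(np.array([0.1, 0.2])))  # discriminator easily spots the fakes
print(adversarial_loss(np.array([0.9, 0.8])))  # fakes look real: much smaller loss
print(generator_loss(np.array([0.9, 0.8]), phi_hr, phi_sr))
```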
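The tensor shapes in the architecture follow from simple arithmetic: the generator's two ×2 upsampling blocks turn 64×64 into 256×256, and the discriminator's four stride-2 convolutions (with 'same' padding, output size = ceil(input / stride)) shrink 256×256 to 16×16. Because the Dense layers act only on the channel axis, the discriminator emits a 16×16×1 map, which is why the training code builds labels of shape (batch_size, 16, 16, 1). A small pure-Python check, with hypothetical helper names:

```python
import math


def same_conv_size(size, stride):
    """Spatial output size of a 'same'-padded convolution in Keras: ceil(size / stride)."""
    return math.ceil(size / stride)


def generator_output_size(lr_size, upsampling_blocks=2, factor=2):
    """Each UpSampling2D(size=2) block doubles height and width."""
    return lr_size * factor ** upsampling_blocks


def discriminator_map_size(hr_size):
    """Apply the strides of the eight convolution blocks of the discriminator."""
    size = hr_size
    for stride in [1, 2, 1, 2, 1, 2, 1, 2]:
        size = same_conv_size(size, stride)
    return size


hr = generator_output_size(64)
patch = discriminator_map_size(hr)
print(hr, patch)  # 256 16
```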
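Training pairs consist of an HR image and a 4× smaller LR version, both scaled from [0, 255] to [-1, 1] with x / 127.5 − 1, which matches the generator's tanh output range. The NumPy sketch below swaps the image-resize call of the training code for simple 4×4 block averaging, purely to stay dependency-free; `make_training_pair` and `denormalize` are hypothetical names, and the image size must be divisible by the scale.

```python
import numpy as np


def make_training_pair(hr_image, scale=4):
    """Return (hr, lr) in [-1, 1]; LR is made by averaging scale x scale blocks."""
    h, w, c = hr_image.shape
    hr = hr_image.astype(np.float32)
    # Average each scale x scale block: (h, w, c) -> (h / scale, w / scale, c)
    lr = hr.reshape(h // scale, scale, w // scale, scale, c).mean(axis=(1, 3))
    return hr / 127.5 - 1.0, lr / 127.5 - 1.0


def denormalize(image):
    """Map [-1, 1] back to uint8 [0, 255] for display or saving."""
    return np.clip((image + 1.0) * 127.5, 0, 255).round().astype(np.uint8)


rng = np.random.default_rng(0)
hr_uint8 = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
hr, lr = make_training_pair(hr_uint8)
print(hr.shape, lr.shape)  # (256, 256, 3) (64, 64, 3)
```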
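The discriminator step labels real HR images 1 and generated ones 0 and averages the two losses, as `d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)` does in the training code. The sketch below evaluates this with binary cross-entropy on per-patch score maps of the 16×16 shape the discriminator outputs. Note that the Keras code compiles the discriminator with 'mse' instead, so its numbers differ; the scores here are made-up values and `bce`/`discriminator_loss` are hypothetical helpers.

```python
import numpy as np


def bce(labels, scores, eps=1e-7):
    """Binary cross-entropy averaged over all elements."""
    scores = np.clip(scores, eps, 1.0 - eps)
    return float(np.mean(-(labels * np.log(scores) + (1 - labels) * np.log(1 - scores))))


def discriminator_loss(real_scores, fake_scores):
    """0.5 * (loss on real images with label 1 + loss on generated images with label 0)."""
    real_labels = np.ones_like(real_scores)
    fake_labels = np.zeros_like(fake_scores)
    return 0.5 * (bce(real_labels, real_scores) + bce(fake_labels, fake_scores))


batch_size = 1
shape = (batch_size, 16, 16, 1)  # per-patch scores, as produced by the discriminator
# A confident discriminator: real patches near 1, generated patches near 0
good_d = discriminator_loss(np.full(shape, 0.9), np.full(shape, 0.1))
# A fooled discriminator: it cannot tell real from generated
fooled_d = discriminator_loss(np.full(shape, 0.5), np.full(shape, 0.5))
print(good_d, fooled_d)
```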