Semantic segmentation

Segmentation of ellipses and rectangles (uses OpenCV and Keras):
[/home/hae/goodreader/algoritmos/deep/Stanford_cs231/cs231n_2017_lecture11_outros_problemas]

A few years ago, semantic segmentation was done with downsampling followed by upsampling.

Problem: the boundaries between regions are not located precisely.

U-Net solves this problem.


[Figures: for test images 075*, 100*.png and 000*.png: noisy input (x), ideal segmentation (y), CNN output (p/o), binarized CNN output (b/q), threshold/filter-based segmentation (t), U-Net output and binarized U-Net output]


treino.csv:
000x.png;000y.png
100x.png;100y.png
001x.png;001y.png
101x.png;101y.png
...
048x.png;048y.png
148x.png;148y.png
049x.png;049y.png
149x.png;149y.png

valida.csv:
050x.png;050y.png
150x.png;150y.png
051x.png;051y.png
151x.png;151y.png
...
073x.png;073y.png
173x.png;173y.png
074x.png;074y.png
174x.png;174y.png

teste.csv:
075x.png;075y.png
175x.png;175y.png
076x.png;076y.png
176x.png;176y.png
...
098x.png;098y.png
198x.png;198y.png
099x.png;099y.png
199x.png;199y.png
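Each CSV line pairs a noisy input image with its ideal segmentation, separated by ';'. As a minimal sketch (my assumption about the naming pattern, not code from the handout), treino.csv could be generated like this:

with open("treino.csv", "w") as f:
    for i in range(50):
        f.write("%03dx.png;%03dy.png\n" % (i, i))          #examples 000..049
        f.write("%03dx.png;%03dy.png\n" % (100+i, 100+i))  #examples 100..149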

To perform semantic segmentation, the network must upsample. There are two ways to do upsampling:

1) Unpooling: the inverse of max-pooling (see the slides). It apparently is not implemented in Keras; only UpSampling2D is available, which performs nearest-neighbor interpolation.

2) Transpose convolution: the inverse of convolution (see the slides). The sketch below contrasts the two options available in Keras.
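A minimal sketch (my addition, not from the handout): UpSampling2D is a fixed nearest-neighbor interpolation with no trainable weights, while Conv2DTranspose learns its upsampling kernel during training.

import numpy as np
from tensorflow.keras.layers import UpSampling2D, Conv2DTranspose

x = np.random.rand(1, 4, 4, 1).astype('float32')          #dummy 4x4 image
up = UpSampling2D(size=(2, 2))(x)                         #fixed interpolation -> (1, 8, 8, 1)
tc = Conv2DTranspose(1, kernel_size=(2, 2), strides=2)(x) #learned upsampling -> (1, 8, 8, 1)
print(up.shape, tc.shape)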


Network structure (ignoring dropout):


#~/haepi/deep/keras/segm_eliret/segm02.py
#Network for semantic segmentation
from __future__ import print_function
import cv2
import numpy as np
np.random.seed(7)
import tensorflow.keras as keras
from keras.models import Sequential
from keras.layers import Dropout, Conv2D, Conv2DTranspose
from keras import optimizers

def leCsv(nomeArq):
    n=0
    arq=open(nomeArq,"r")
    for linha in arq:
        n=n+1

    nl,nc = 32,32
    AX=np.empty((n,nl,nc),dtype='uint8')
    AY=np.empty((n,nl,nc),dtype='uint8')
    i=0
    arq.seek(0)
    for linha in arq:
        linha=linha.strip('\n')
        linha=linha.split(';')
        AX[i]=cv2.imread(linha[0],0)
        AY[i]=cv2.imread(linha[1],0)
        i=i+1
    arq.close()

    ax=2*(np.float32(AX)/255.0-0.5) #Between -1 and +1
    ay=2*(np.float32(AY)/255.0-0.5) #Between -1 and +1
    ax = ax.reshape(n, nl, nc, 1)
    ay = ay.reshape(n, nl, nc, 1)
    return ax, ay

ax, ay = leCsv("treino.csv")
vx, vy = leCsv("valida.csv")
qx, qy = leCsv("teste.csv")

nl,nc = 32,32
input_shape = (nl,nc,1)
batch_size = 20
epochs = 2000

model = Sequential()
model.add(Conv2D(40, kernel_size=(5,5), strides=2, activation='relu', padding='same', input_shape=input_shape))
model.add(Dropout(0.25))
model.add(Conv2D(12, kernel_size=(5,5), strides=2, activation='relu', padding='same'))
model.add(Dropout(0.25))
model.add(Conv2D(12, kernel_size=(5,5), strides=2, activation='relu', padding='same'))
model.add(Dropout(0.25))
model.add(Conv2DTranspose(12, kernel_size=(5,5), strides=2, activation='relu', padding='same'))
model.add(Dropout(0.25))
model.add(Conv2DTranspose(40, kernel_size=(5,5), strides=2, activation='relu', padding='same'))
model.add(Dropout(0.25))


model.add(Conv2DTranspose(1, kernel_size=(5,5), strides=2, padding='same'))

opt=optimizers.Adagrad()
model.compile(optimizer=opt, loss='mean_squared_error')

model.fit(ax, ay, batch_size=batch_size, epochs=epochs, verbose=True, validation_data=(vx,vy))

score = model.evaluate(ax, ay, verbose=0)
print('Training loss:', score)
score = model.evaluate(vx, vy, verbose=0)
print('Validation loss:', score)
score = model.evaluate(qx, qy, verbose=0)
print('Test loss:', score)

model.save('segm02.h5')

Epoch 1/2000 - 1s 13ms/step - loss: 1.0158 - val_loss: 0.8944
Epoch 2/2000 - 0s 772us/step - loss: 0.8473 - val_loss: 0.8118
...
Epoch 1999/2000 - 0s 839us/step - loss: 0.0852 - val_loss: 0.0685
Epoch 2000/2000 - 0s 845us/step - loss: 0.0807 - val_loss: 0.0685
Training loss: 0.0500007325411
Validation loss: 0.0684906864166
Test loss: 0.0716854807734

Training takes less than 3 minutes.

If dropout is removed, the error becomes considerably larger:
Training loss: 0.0861751502752
Validation loss: 0.132946297526
Test loss: 0.13650231123


#~/deep/keras/segm_eliret/pred02.py
#Performs semantic segmentation using the network generated by segm??.py (rectangle/ellipse)
from __future__ import print_function
import cv2
import numpy as np
np.random.seed(7)
import tensorflow.keras as keras
from keras.models import load_model
from keras.layers import Dropout, Conv2D, Conv2DTranspose
from keras import optimizers
import sys
from sys import argv

if (len(argv)!=5):
    print("pred.py rede.h5 ent.png sai.png b/g")
    print("  b=binarized g=grayscale")
    print("Error: invalid number of arguments")
    sys.exit()
model = load_model(argv[1])

nl,nc = 32,32
QX=cv2.imread(argv[2],0)
qx=2*(np.float32(QX)/255.0-0.5) #Between -1 and +1
qx=qx.reshape(1, nl, nc, 1)

qp=model.predict(qx)
qp=qp.reshape(nl,nc) #between -1 and +1

if (argv[4]=="b"): QP=np.empty((nl,nc),dtype='uint8') for l in range(qp.shape[0]): for c in range(qp.shape[1]): if qp[l,c]<0: QP[l,c]=0 else: QP[l,c]=255 cv2.imwrite(argv[3],QP)else: qp=255.0*((qp/2.0)+0.5) # Entre 0 e 255 qp=np.clip(qp,0,255) QP=np.uint8(qp) cv2.imwrite(argv[3],QP)

$ python pred02.py segm02.h5 075x.png 075p.png g
$ python pred02.py segm02.h5 075x.png 075b.png b


Let us now solve the same problem with a U-Net. For some unknown reason, this program does not work if the input is normalized to the range -1 to +1.

#~/deep/keras/segm_eliret/unet1.py
#Network for semantic segmentation
import cv2
import numpy as np
np.random.seed(7)
import keras
from keras.models import *
from keras.layers import *
from keras.optimizers import *
import sys

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

def leCsv(nomeArq):
    n=0
    arq=open(nomeArq,"r")
    for linha in arq:
        n=n+1

    nl,nc = 32,32
    AX=np.empty((n,nl,nc),dtype='uint8')
    AY=np.empty((n,nl,nc),dtype='uint8')

    i=0
    arq.seek(0)
    for linha in arq:
        linha=linha.strip('\n')
        linha=linha.split(';')
        AX[i]=cv2.imread(linha[0],0)
        AY[i]=cv2.imread(linha[1],0)
        i=i+1
    arq.close()

    ax= np.float32(AX)/255.0 #Between 0 and +1
    ay= np.float32(AY)/255.0 #Between 0 and +1
    ax = ax.reshape(n, nl, nc, 1)
    ay = ay.reshape(n, nl, nc, 1)
    return ax, ay

def unet(input_size = (32,32,1)):
    n=32
    inputs = Input(input_size) #32x32
    conv2 = Conv2D(n, 3, activation = 'relu', padding = 'same' )(inputs)
    conv2 = Conv2D(n, 3, activation = 'relu', padding = 'same' )(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2) #16x16

    conv3 = Conv2D(2*n, 3, activation = 'relu', padding = 'same' )(pool2) #16x16
    conv3 = Conv2D(2*n, 3, activation = 'relu', padding = 'same' )(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3) #8x8

    conv4 = Conv2D(4*n, 3, activation = 'relu', padding = 'same' )(pool3) #8x8
    conv4 = Conv2D(4*n, 3, activation = 'relu', padding = 'same' )(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4) #4x4

    conv5 = Conv2D(8*n, 3, activation = 'relu', padding = 'same' )(pool4) #4x4
    conv5 = Conv2D(8*n, 3, activation = 'relu', padding = 'same' )(conv5) #4x4

    up6 = Conv2D(4*n, 2, activation = 'relu', padding = 'same' )(UpSampling2D(size = (2,2))(conv5)) #8x8
    merge6 = concatenate([conv4,up6], axis = 3) #8x8
    conv6 = Conv2D(4*n, 3, activation = 'relu', padding = 'same' )(merge6)
    conv6 = Conv2D(4*n, 3, activation = 'relu', padding = 'same' )(conv6) #8x8

    up7 = Conv2D(2*n, 2, activation = 'relu', padding = 'same' )(UpSampling2D(size = (2,2))(conv6)) #16x16
    merge7 = concatenate([conv3,up7], axis = 3)
    conv7 = Conv2D(2*n, 3, activation = 'relu', padding = 'same' )(merge7)
    conv7 = Conv2D(2*n, 3, activation = 'relu', padding = 'same' )(conv7) #16x16

    up8 = Conv2D(n, 2, activation = 'relu', padding = 'same' )(UpSampling2D(size = (2,2))(conv7)) #32x32
    merge8 = concatenate([conv2,up8], axis = 3)
    conv8 = Conv2D(n, 3, activation = 'relu', padding = 'same' )(merge8)
    conv8 = Conv2D(n, 3, activation = 'relu', padding = 'same' )(conv8) #32x32


    conv9 = Conv2D(1, 1, activation = 'sigmoid', padding = 'same' )(conv8) #32x32
    model = Model(inputs = inputs, outputs = conv9)
    model.compile(optimizer = Adam(lr=1e-4), loss = 'mean_squared_error')
    model.summary()
    return model

#main
ax, ay = leCsv("treino.csv")
vx, vy = leCsv("valida.csv")
qx, qy = leCsv("teste.csv")

#Choose between starting the training from scratch or resuming a previous run
model=unet()
#model = load_model("unet1.h5");

reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.95,
    patience=10, min_lr=1e-6, verbose=True)
model.fit(ax,ay,batch_size=10,epochs=1000,verbose=1,callbacks=[reduce_lr]);
model.save("unet1.h5");

score = model.evaluate(ax, ay, verbose=0)
print('Training loss:', score)
score = model.evaluate(vx, vy, verbose=0)
print('Validation loss:', score)
score = model.evaluate(qx, qy, verbose=0)
print('Test loss:', score)

Epoch 1/1000 - 3s 25ms/step - loss: 0.2468
Epoch 2/1000 - 0s 3ms/step - loss: 0.2325
Epoch 3/1000 - 0s 3ms/step - loss: 0.2012
...
Epoch 998/1000 - 0s 3ms/step - loss: 1.1962e-05
Epoch 999/1000 - 0s 3ms/step - loss: 1.1959e-05
Epoch 1000/1000 - 0s 3ms/step - loss: 1.1955e-05

Training loss: 1.1949643849220592e-05
Validation loss: 0.014191047586500645
Test loss: 0.014867232739925384

The error decreased by a factor of about 4.


#~/deep/keras/segm_eliret/unetpred1.py
#Performs semantic segmentation using the network generated by segm??.py (rectangle/ellipse)
import cv2
import numpy as np
np.random.seed(7)
import keras
from keras.models import load_model
from keras.layers import Dropout, Conv2D, Conv2DTranspose
from keras import optimizers
import sys
from sys import argv

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

if (len(argv)!=5):
    print("unetpred.py rede.h5 ent.png sai.png b/g")
    print("  b=binarized g=grayscale")
    print("Error: invalid number of arguments")
    sys.exit()

model = load_model(argv[1])

nl,nc = 32,32
QX=cv2.imread(argv[2],0)
qx=np.float32(QX)/255.0 #Between 0 and +1
qx=qx.reshape(1, nl, nc, 1)

qp=model.predict(qx)
qp=qp.reshape(nl,nc) #between 0 and +1

if (argv[4]=="b"): QP=np.empty((nl,nc),dtype='uint8') for l in range(qp.shape[0]): for c in range(qp.shape[1]): if qp[l,c]<0.5: QP[l,c]=0 else: QP[l,c]=255 cv2.imwrite(argv[3],QP)else: qp=255.0*qp # Entre 0 e 255 qp=np.clip(qp,0,255) QP=np.uint8(qp) cv2.imwrite(argv[3],QP)


U-Net

RONNEBERGER, Olaf; FISCHER, Philipp; BROX, Thomas. U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015. p. 234-241.

[Figure: sample input images 0.png and 10.png with their corresponding label images]

The goal is to detect the cell walls. On the site

https://github.com/zhixuhao/unet

there are 30 example (input/output) image pairs, 512x512 pixels each.


On the site
https://github.com/zhixuhao/unet

there is a solution to this problem. However, on my computer that solution leaks memory: the program consumes all the RAM, starts using virtual memory, and freezes the machine. (Perhaps because I ran it under Python 2?)

Since there are only 30 training examples, the training set must first be enlarged artificially. For each image, I generate 10 distorted versions. To speed up processing, I use 128x128 images.

The program /home/hae/haepi/deep/keras/unet/hae/gera1.py takes the images from /home/hae/haepi/deep/keras/unet/hae/membrane/train/image, resizes them to 128x128, distorts them (data augmentation) and stores the distorted images in image_aug. It also takes the labels from /home/hae/haepi/deep/keras/unet/hae/membrane/train/label, distorts them in the same way and stores them in label_aug.


[Figure: some resized and distorted versions of 0.png]

#gera1.py
#Resizes the images from 512x512 to 128x128
#Generates 10 distorted images for each input image
#
from __future__ import print_function
import tensorflow.keras as keras;
from keras.preprocessing.image import ImageDataGenerator;
import numpy as np;
import cv2;

#<<<<<<<<<<<<<<< main <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<aug_dict = dict(rotation_range=10, #Int. Degree range for random rotations. width_shift_range=-0.05, #float: fraction of total width, if < 1, or pixels if >= 1. height_shift_range=-0.05, #float: fraction of total height, if < 1, or pixels if >= 1. shear_range=10, #Float. Shear Intensity (Shear angle in counter-clockwise direction in degrees) zoom_range=0.2, #Range for random zoom. If a float, [lower, upper] = [1-zoom_range, 1+zoom_range]. horizontal_flip=False, #Boolean. Randomly flip inputs horizontally. fill_mode='reflect'); #One of {"constant", "nearest", "reflect" or "wrap"}.train_path='membrane/train';image_folder='image';mask_folder= 'label';target_size = (128,128);batch_size=30;seed = 7;save_to_dir = None;

image_datagen = ImageDataGenerator(**aug_dict);
mask_datagen = ImageDataGenerator(**aug_dict);

image_generator = image_datagen.flow_from_directory(
    train_path,
    classes = [image_folder],
    class_mode = None,
    color_mode = "grayscale",
    target_size = target_size,
    batch_size = batch_size,
    save_to_dir = 'membrane/train/image_aug',
    save_prefix = "",
    seed = seed);
mask_generator = mask_datagen.flow_from_directory(
    train_path,
    classes = [mask_folder],
    class_mode = None,
    color_mode = "grayscale",
    target_size = target_size,
    batch_size = batch_size,
    save_to_dir = 'membrane/train/label_aug',
    save_prefix = "",
    seed = seed);


for i in range(10):
    img=image_generator.next();  #generates one distorted version of the 30 images
    mask=mask_generator.next();  #generates one distorted version of the 30 masks (labels)
    #Since save_to_dir is set above, each next() call also writes the generated images to disk.


Apparently it is possible to generate distorted samples on demand (real-time data augmentation), but that did not work for me. So I generated the distorted samples, saved them as images, and then ran the training:
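For reference, a minimal sketch of the real-time variant (my illustration of the approach that failed on the author's machine; image_generator and mask_generator are the ones defined in gera1.py above, ideally with save_to_dir disabled, and model is the U-Net from treina1.py below):

train_generator = zip(image_generator, mask_generator)  #yields (image batch, mask batch) pairs
model.fit_generator(train_generator, steps_per_epoch=10, epochs=5)  #illustrative step/epoch counts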

#treina1.py
#
from __future__ import print_function;
import cv2;
import numpy as np; np.random.seed(7);
import os;
import sys;
import tensorflow.keras as keras;

from keras.models import *
from keras.layers import *
from keras.optimizers import *
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras import initializers  #needed by bias_initializer below (missing in the original listing)
from keras import backend as keras

def unet(input_size = (128,128,1)):
    inputs = Input(input_size) #128x128
    conv2 = Conv2D(64, 3, activation = 'relu', padding = 'same' )(inputs)
    conv2 = Conv2D(64, 3, activation = 'relu', padding = 'same' )(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2) #64x64
    conv3 = Conv2D(128, 3, activation = 'relu', padding = 'same' )(pool2)
    conv3 = Conv2D(128, 3, activation = 'relu', padding = 'same' )(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3) #32x32
    conv4 = Conv2D(256, 3, activation = 'relu', padding = 'same' )(pool3)
    conv4 = Conv2D(256, 3, activation = 'relu', padding = 'same' )(conv4)
    drop4 = Dropout(0.5)(conv4) #32x32
    pool4 = MaxPooling2D(pool_size=(2, 2))(drop4) #16x16

    conv5 = Conv2D(512, 3, activation = 'relu', padding = 'same' )(pool4)
    conv5 = Conv2D(512, 3, activation = 'relu', padding = 'same' )(conv5)
    drop5 = Dropout(0.5)(conv5) #16x16

    up6 = Conv2D(256, 2, activation = 'relu', padding = 'same' )(UpSampling2D(size = (2,2))(drop5)) #32x32
    merge6 = concatenate([drop4,up6], axis = 3) #32x32
    conv6 = Conv2D(256, 3, activation = 'relu', padding = 'same' )(merge6)
    conv6 = Conv2D(256, 3, activation = 'relu', padding = 'same' )(conv6)

    up7 = Conv2D(128, 2, activation = 'relu', padding = 'same' )(UpSampling2D(size = (2,2))(conv6)) #64x64
    merge7 = concatenate([conv3,up7], axis = 3)
    conv7 = Conv2D(128, 3, activation = 'relu', padding = 'same' )(merge7)
    conv7 = Conv2D(128, 3, activation = 'relu', padding = 'same' )(conv7)

    up8 = Conv2D(64, 2, activation = 'relu', padding = 'same' )(UpSampling2D(size = (2,2))(conv7)) #128x128
    merge8 = concatenate([conv2,up8], axis = 3)
    conv8 = Conv2D(64, 3, activation = 'relu', padding = 'same' )(merge8)
    conv8 = Conv2D(64, 3, activation = 'relu', padding = 'same' )(conv8)

    conv8 = Conv2D(2, 3, activation = 'relu', padding = 'same' )(conv8)

    conv9 = Conv2D(1, 1, activation = 'sigmoid', padding = 'same',
        bias_initializer=initializers.Constant(value=1.5))(conv8)

    model = Model(inputs = inputs, outputs = conv9)

    model.compile(optimizer = Adam(lr = 1e-4, decay=0.00),
        loss = 'mean_squared_error', metrics = ['accuracy'])

    model.summary()
    return model

def leDoisDirs(imagePath,maskPath):
    #Reads images from two directories with identical file names and returns them as float32 between 0 and +1
    imageList = [f for f in os.listdir(imagePath) if os.path.isfile(os.path.join(imagePath, f))];
    imageList.sort();
    maskList = [f for f in os.listdir(maskPath) if os.path.isfile(os.path.join(maskPath, f))];
    maskList.sort();
    if (len(imageList)!=len(maskList)):
        print("Error: different number of files"); sys.exit(0);
    n=len(imageList);

    nl,nc = 128,128;
    AX=np.empty((n,nl,nc),dtype='uint8')
    AY=np.empty((n,nl,nc),dtype='uint8')

    for i in range(n):
        if (imageList[i]!=maskList[i]):
            print("Error: image name differs from mask name"); sys.exit(0);
        AX[i]=cv2.imread(os.path.join(imagePath, imageList[i]),0);
        AY[i]=cv2.imread(os.path.join(maskPath, maskList[i]),0);

    ax=np.float32(AX)/255.0; #Between 0 and +1
    ay=np.float32(AY)/255.0; #Between 0 and +1
    ay[ay>=0.5] = 1; ay[ay<0.5] = 0; #0 or +1

    ax = ax.reshape(n, nl, nc, 1);
    ay = ay.reshape(n, nl, nc, 1);

    return ax, ay;

#<<<<<<<<<<<<<<<<< main <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<ax, ay = leDoisDirs("membrane/train/image_aug","membrane/train/label_aug");

#Choose between starting the training from scratch or resuming a previous run
model = unet();
#model = load_model("treina1.h5");

model.fit(ax,ay,batch_size=10,epochs=10,verbose=1);
model.save("treina1.h5");

$ python3 treina1.py
Using TensorFlow backend.
__________________________________________________________________________________
Layer (type)                    Output Shape          Param #    Connected to
==================================================================================
input_1 (InputLayer)            (None, 128, 128, 1)   0
conv2d_1 (Conv2D)               (None, 128, 128, 64)  640        input_1[0][0]
conv2d_2 (Conv2D)               (None, 128, 128, 64)  36928      conv2d_1[0][0]
max_pooling2d_1 (MaxPooling2D)  (None, 64, 64, 64)    0          conv2d_2[0][0]
conv2d_3 (Conv2D)               (None, 64, 64, 128)   73856      max_pooling2d_1[0][0]
conv2d_4 (Conv2D)               (None, 64, 64, 128)   147584     conv2d_3[0][0]
max_pooling2d_2 (MaxPooling2D)  (None, 32, 32, 128)   0          conv2d_4[0][0]
conv2d_5 (Conv2D)               (None, 32, 32, 256)   295168     max_pooling2d_2[0][0]
conv2d_6 (Conv2D)               (None, 32, 32, 256)   590080     conv2d_5[0][0]
dropout_1 (Dropout)             (None, 32, 32, 256)   0          conv2d_6[0][0]
max_pooling2d_3 (MaxPooling2D)  (None, 16, 16, 256)   0          dropout_1[0][0]
conv2d_7 (Conv2D)               (None, 16, 16, 512)   1180160    max_pooling2d_3[0][0]
conv2d_8 (Conv2D)               (None, 16, 16, 512)   2359808    conv2d_7[0][0]
dropout_2 (Dropout)             (None, 16, 16, 512)   0          conv2d_8[0][0]
up_sampling2d_1 (UpSampling2D)  (None, 32, 32, 512)   0          dropout_2[0][0]
conv2d_9 (Conv2D)               (None, 32, 32, 256)   524544     up_sampling2d_1[0][0]
concatenate_1 (Concatenate)     (None, 32, 32, 512)   0          dropout_1[0][0]
                                                                 conv2d_9[0][0]
conv2d_10 (Conv2D)              (None, 32, 32, 256)   1179904    concatenate_1[0][0]
conv2d_11 (Conv2D)              (None, 32, 32, 256)   590080     conv2d_10[0][0]
up_sampling2d_2 (UpSampling2D)  (None, 64, 64, 256)   0          conv2d_11[0][0]
conv2d_12 (Conv2D)              (None, 64, 64, 128)   131200     up_sampling2d_2[0][0]
concatenate_2 (Concatenate)     (None, 64, 64, 256)   0          conv2d_4[0][0]
                                                                 conv2d_12[0][0]
conv2d_13 (Conv2D)              (None, 64, 64, 128)   295040     concatenate_2[0][0]
conv2d_14 (Conv2D)              (None, 64, 64, 128)   147584     conv2d_13[0][0]
up_sampling2d_3 (UpSampling2D)  (None, 128, 128, 128) 0          conv2d_14[0][0]
conv2d_15 (Conv2D)              (None, 128, 128, 164) 84132      up_sampling2d_3[0][0]
concatenate_3 (Concatenate)     (None, 128, 128, 228) 0          conv2d_2[0][0]
                                                                 conv2d_15[0][0]
conv2d_16 (Conv2D)              (None, 128, 128, 64)  131392     concatenate_3[0][0]
conv2d_17 (Conv2D)              (None, 128, 128, 64)  36928      conv2d_16[0][0]
conv2d_18 (Conv2D)              (None, 128, 128, 2)   1154       conv2d_17[0][0]
conv2d_19 (Conv2D)              (None, 128, 128, 1)   3          conv2d_18[0][0]
==================================================================================
Total params: 7,806,185
Trainable params: 7,806,185
Non-trainable params: 0

...
Epoch 19/20 - 21s 71ms/step - loss: 0.0769 - acc: 0.8916
Epoch 20/20 - 21s 69ms/step - loss: 0.0766 - acc: 0.8923

Training can be resumed from where it stopped: just comment out the first of the two lines below and uncomment the second:

model = unet();
#model = load_model("treina1.h5");

Running 10 more epochs:

Epoch 9/10 - 20s 67ms/step - loss: 0.0671 - acc: 0.9062
Epoch 10/10 - 20s 67ms/step - loss: 0.0662 - acc: 0.9076


Prediction:

#testa1.py
#
from __future__ import print_function;
import cv2;
import numpy as np; np.random.seed(7);
import os;
import sys;

import tensorflow.keras as keras
from keras.models import *
from keras.layers import *
from keras.optimizers import *
from keras import backend as keras

def leUmDir(imagePath):
    #Reads the images in a directory and returns them as float32 between 0 and +1
    #Also returns the image names
    imageList = [f for f in os.listdir(imagePath) if os.path.isfile(os.path.join(imagePath, f))];
    imageList.sort();
    n=len(imageList);

    nl,nc = 128,128;
    AX=np.empty((n,nl,nc),dtype='uint8')

    for i in range(n):
        t=cv2.imread(os.path.join(imagePath, imageList[i]),0);
        t=cv2.resize(t,(nc,nl),interpolation=cv2.INTER_AREA);
        AX[i,:]=t;

    ax = np.float32(AX)/255.0; #Between 0 and +1
    ax = ax.reshape(n, nl, nc, 1);
    return ax, imageList;

#<<<<<<<<<<<<<<<<<< main <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<model=load_model("treina1.h5");ax, imageList = leUmDir("membrane/test/test_ori");

results = model.predict(ax,verbose=1); #Between 0 and 1
results = results.reshape(30,128,128);
results = 255*results;

#There is an np.clip function that does the same thing more efficiently (see below)
results[results<0] = 0; results[results>255] = 255;
ay=results.astype(np.uint8);
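#As noted above, the two masked assignments are equivalent to the single call:
#  results = np.clip(results, 0, 255);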

for i in range(len(imageList)):
    st=os.path.join("membrane/test/test_predict_hae", imageList[i]);
    cv2.imwrite(st,ay[i]);


[Figure: processed images 20, 21, 25 and 26]


Using a pre-trained model to categorize images: ResNet50, InceptionV3 or InceptionResNetV2.

Model              Size    Top-1 Accuracy  Top-5 Accuracy  Parameters   Depth
Xception           88 MB   0.790           0.945           22,910,480   126
VGG16              528 MB  0.713           0.901           138,357,544  23
VGG19              549 MB  0.713           0.900           143,667,240  26
ResNet50           98 MB   0.749           0.921           25,636,712   -
InceptionV3        92 MB   0.779           0.937           23,851,784   159
InceptionResNetV2  215 MB  0.803           0.953           55,873,736   572
MobileNet          16 MB   0.704           0.895           4,253,864    88
MobileNetV2        14 MB   0.713           0.901           3,538,984    88
DenseNet121        33 MB   0.750           0.923           8,062,504    121
DenseNet169        57 MB   0.762           0.932           14,307,880   169
DenseNet201        80 MB   0.773           0.936           20,242,984   201
NASNetMobile       23 MB   0.744           0.919           5,326,716    -
NASNetLarge        343 MB  0.825           0.960           88,949,818   -

List of the 1000 ImageNet categories:
https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a

Note: some of the networks listed in the Keras manual are not available. To list the networks actually available, run in Python:

import keras.applications as app
print(dir(app))

Result:
['DenseNet121', 'DenseNet169', 'DenseNet201', 'InceptionResNetV2', 'InceptionV3', 'MobileNet',
 'MobileNetV2', 'NASNetLarge', 'NASNetMobile', 'ResNet50', 'VGG16', 'VGG19', 'Xception',
 '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__',
 '__path__', '__spec__', 'absolute_import', 'backend', 'densenet', 'division',
 'inception_resnet_v2', 'inception_v3', 'keras_applications', 'keras_modules_injection',
 'layers', 'mobilenet', 'mobilenet_v2', 'models', 'nasnet', 'print_function', 'resnet50',
 'utils', 'vgg16', 'vgg19', 'xception']

#~/deep/keras/pre-treinado/classif1.py
#Copied from the Keras manual
from __future__ import print_function
from keras.preprocessing import image
import numpy as np
import sys
from sys import argv

if (len(argv)!=2):
    print("classif1.py nomeimg.ext"); sys.exit();

#from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
#model = ResNet50(weights='imagenet')
#target_size = (224, 224)

from keras.applications.inception_v3 import InceptionV3, preprocess_input, decode_predictions
model = InceptionV3(weights='imagenet')
target_size = (299, 299)

#from keras.applications.inception_resnet_v2 import InceptionResNetV2, preprocess_input, decode_predictions
#model = InceptionResNetV2(weights='imagenet')
#target_size = (299, 299)

img_path = argv[1];
img = image.load_img(img_path, target_size=target_size)
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)


preds = model.predict(x)
p=decode_predictions(preds, top=3)[0]
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
#print('Predicted:', p)
# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265', u'tusker',
#   0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]

for i in range(len(p)):
    print("%8.2f%% %s"%(100*p[i][2],p[i][1]))

$python classif1.py chimpanzee.jpg 2>/dev/null

[Figures: top-3 predictions for nine test images, one image per line; the images themselves are omitted here]
88.07% chimpanzee        2.66% patas           1.89% guenon
84.94% wood_rabbit       9.12% hare            0.74% Angora
36.37% fountain         24.10% stupa          17.58% palace
97.17% liner             0.06% dock            0.05% fireboat
66.03% trolleybus       23.75% passenger_car   2.83% minibus
79.72% orangutan         0.88% chimpanzee      0.77% patas
92.53% aircraft_carrier  1.22% shower_cap      0.06% grey_fox
73.06% tiger            21.46% tiger_cat       0.11% zebra
98.37% bullet_train      0.03% Arabian_camel   0.02% passenger_car

(PSI3472-2019 Class 7, exercise 3) Modify the classif1.py program to use the Xception model. Test your program on a few images (the ones above or images taken from the internet).


Object classification and localization:

[Figure: example image labeled "cat"]

Suppose we have a neural network that recognizes dogs, cats and birds: the network has 3 outputs (dog, cat, bird). How can it also localize the object?

1) Train the network with 4 additional outputs: xcenter, ycenter, width, height (see the sketch after this list).

2) Sliding window. The network has to be trained with 4 categories: dog, cat, bird and "neither dog, cat nor bird". Slow.

Is there a better way?

3) R-CNN. Region proposal: proposes regions that may contain objects. Classifier: checks whether each region contains an object.
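A minimal sketch of option 1 (my illustration, not code from the handout): a small network with a classification head (3 classes) and a regression head (xcenter, ycenter, width, height), trained jointly. The input size and layer sizes are hypothetical.

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model

inp = Input((64, 64, 3))                   #hypothetical input size
t = Conv2D(32, 3, activation='relu')(inp)
t = MaxPooling2D()(t)
t = Flatten()(t)
cl = Dense(3, activation='softmax', name='classe')(t)  #dog/cat/bird
bb = Dense(4, activation='linear', name='caixa')(t)    #xcenter, ycenter, width, height
model = Model(inp, [cl, bb])
model.compile(optimizer='adam',
    loss={'classe': 'categorical_crossentropy', 'caixa': 'mse'})
#Training would then be: model.fit(x, {'classe': y_class, 'caixa': y_box})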


Fully convolutional network

How can a convolutional neural network be modified to localize an object without running a sliding window?

1) Run the convolutional network inside a sliding window. Problem: it is slow.
2) The same effect can be obtained by eliminating the dense layers (keeping only convolutional layers).

The original LeNet accepts a 28x28 image:
1x28x28 → conv5x5 → 20x24x24 → maxpool → 20x12x12 → conv5x5 → 40x8x8 → maxpool → 40x4x4 → flatten → dense → 500 → dense → 100 → dense → 10
The output has 10 elements, one per category. Images larger than 28x28 cannot be fed to this network.

Replacing the dense (fc) layers with convolutional layers:
1x28x28 → conv5x5 → 20x24x24 → maxpool → 20x12x12 → conv5x5 → 40x8x8 → maxpool → 40x4x4 → conv4x4 → 500x1x1 → conv1x1 → 100x1x1 → conv1x1 → 10x1x1
This network does exactly the same thing as the original LeNet. The output consists of 10 images of size 1x1, one per category. (qx0.png)

[[7]]

What happens if a larger image, say 32x32, is fed in?
1x32x32 → conv5x5 → 20x28x28 → maxpool → 20x14x14 → conv5x5 → 40x10x10 → maxpool → 40x5x5 → conv4x4 → 500x2x2 → conv1x1 → 100x2x2 → conv1x1 → 10x2x2
The output is a 10x2x2 tensor. This is equivalent to applying the original LeNet 4 times, with a stride of 4.

(qx0b.png)

[[ 7  7]
 [ 7 -1]]


What happens with an 80x80 input image?
1x80x80 → conv5x5 → 20x76x76 → maxpool → 20x38x38 → conv5x5 → 40x34x34 → maxpool → 40x17x17 → conv4x4 → 500x14x14 → conv1x1 → 100x14x14 → conv1x1 → 10x14x14
The output is a 10x14x14 tensor (consistent with the 14x14 matrix printed below).

(q2.png)

[[-1  7  7 -1 -1 -1 -1 -1 -1  5  7 -1  2  2]
 [ 7  7  7  7  7 -1 -1 -1 -1 -1  2  2  2 -1]
 [-1  7 -1 -1 -1 -1 -1 -1 -1  2 -1 -1 -1 -1]
 [-1  1  4 -1 -1 -1 -1 -1 -1 -1 -1  4  4  4]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1  5  5  5  7]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  7  7]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  2  2]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1  7 -1  2 -1]
 [ 7  1  1 -1 -1 -1 -1 -1 -1 -1 -1  6 -1 -1]
 [-1  1 -1 -1 -1 -1 -1 -1 -1  4  1  0  0 -1]
 [ 1  1 -1 -1 -1 -1 -1 -1 -1  4 -1 -1 -1  9]]

The network would have to be trained with an extra "non-digit" category.


#detect1.py - training
import keras
from keras.datasets import mnist
from keras.models import Sequential, load_model
from keras.layers import Dropout, Conv2D, MaxPooling2D, Dense, Flatten
from keras import optimizers
import numpy as np
import cv2;

batch_size = 100
num_classes = 10
epochs = 30

# input image dimensions
nl, nc = 28, 28

# the data, split between train and test sets
(ax, ay), (qx, qy) = mnist.load_data()

#print('channels_last - TensorFlow format');
ax = ax.reshape(ax.shape[0], nl, nc, 1)
qx = qx.reshape(qx.shape[0], nl, nc, 1)
input_shape = (nl, nc, 1)

ax = ax.astype('float32')
qx = qx.astype('float32')
ax /= 255 #0 to 1
qx /= 255 #0 to 1
print(ax.shape[0], 'train samples')
print(qx.shape[0], 'test samples')

# convert class vectors to binary class matrices
ay = keras.utils.to_categorical(ay, num_classes)
ay = ay.reshape(ay.shape[0],1,1,num_classes);
qy = keras.utils.to_categorical(qy, num_classes)
qy = qy.reshape(qy.shape[0],1,1,num_classes);

#Run this block if the file detect1.h5 does not exist
model = Sequential(); #Input: 28*28*1
model.add(Conv2D(20, kernel_size=(5,5), activation='relu', input_shape=input_shape)); #Output0: 20*24*24
model.add(MaxPooling2D(pool_size=(2,2))); #Output1: 20*12*12
model.add(Conv2D(40, kernel_size=(5,5), activation='relu')); #Output2: 40*8*8
model.add(MaxPooling2D(pool_size=(2,2))); #Output3: 40*4*4
##Original LeNet
#model.add(Flatten()); #Output4: 640
#model.add(Dense(500, activation='relu')); #Output5: 500
#model.add(Dense(100, activation='relu')); #Output6: 100
#model.add(Dense(num_classes, activation='softmax')); #Output7: 10
##Replacing the dense layers with convolutional ones
model.add(Conv2D(500, kernel_size=(4,4), activation='relu')) #Output5: 500*1*1
model.add(Conv2D(100, kernel_size=(1,1), activation='relu')) #Output6: 100*1*1
model.add(Conv2D(num_classes, kernel_size=(1,1), activation='softmax')) #Output7: 10*1*1
"""
#Run this block if the file detect1.h5 exists
model=load_model('detect1.h5')
"""

#from keras.utils import plot_model
#plot_model(model, to_file='detect1.png', show_shapes=True)
#from keras.utils import print_summary
#print_summary(model)

opt=optimizers.Adam(amsgrad=True)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(ax, ay, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(qx, qy))

score = model.evaluate(qx, qy, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

model.save('detect1.h5')

Epoch 30 - acc: 1.0000 - val_acc: 0.9936


#localize.py
import keras
from keras.datasets import mnist
from keras.models import Sequential, load_model;
from keras.layers import Dropout, Conv2D, MaxPooling2D, Dense, Flatten
from keras import optimizers
import numpy as np
import cv2;
import sys;

if len(sys.argv)!=2:
    print("localize.py ent.png"); sys.exit()

num_classes = 10

qx0=cv2.imread(sys.argv[1],0);
nl, nc = qx0.shape[0], qx0.shape[1]
input_shape = (nl, nc, 1)

model = Sequential(); #Input: 28*28*1 (the shape comments below assume a 28x28 input)
model.add(Conv2D(20, kernel_size=(5,5), activation='relu', input_shape=input_shape)); #Output0: 20*24*24
model.add(MaxPooling2D(pool_size=(2,2))); #Output1: 20*12*12
model.add(Conv2D(40, kernel_size=(5,5), activation='relu')); #Output2: 40*8*8
model.add(MaxPooling2D(pool_size=(2,2))); #Output3: 40*4*4
model.add(Conv2D(500, kernel_size=(4,4), activation='relu')) #Output5: 500*1*1
model.add(Conv2D(100, kernel_size=(1,1), activation='relu')) #Output6: 100*1*1
model.add(Conv2D(num_classes, kernel_size=(1,1), activation='softmax')) #Output7: 10*1*1
model.load_weights("detect1.h5");

qx0 = qx0.astype("float32");
qx0 /= 255 #0 to 1
qx0=qx0.reshape(1,nl,nc,1);
qp = model.predict(qx0);
print(qp.shape)

snl, snc = qp.shape[1], qp.shape[2];
qp = qp.reshape(snl,snc,10);
sai=np.empty([snl,snc],dtype=np.int8);
for l in range(snl):
    for c in range(snc):
        i=np.argmax(qp[l,c,:])
        if qp[l,c,i]>0.999:
            #print(l,c,i,qp[l,c,i]);
            sai[l,c]=i;
        else:
            sai[l,c]=-1;
print(sai);
#print(qp);

(PSI3472-2019 Class 7, exercise 4) Modify the caogato1.py program (class 5, exercise 3, of the cifar-reduzido handout) to obtain a program that accepts color images larger than 32x32. Test this program on the image caogato.png, whose dimensions are 80x80.

[Figure: caogato.png]


Cekpy:

Let us collect some repeatedly used functions in cekpy.py. Put the file below in the same directory as your .py program, or in a directory listed in the PYTHONPATH environment variable (a usage example appears after the listing).

#cekpy.py
import cv2;
import sys;
import os;
import numpy as np;
from matplotlib import pyplot as plt;

def erro(st):
    print(st); sys.exit(0);

def mostra(a):
    #print(a.dtype);
    #print(a.shape);
    if a.dtype=="uint8" and len(a.shape)==3 and a.shape[2]==3: #color
        t=cv2.cvtColor(a,cv2.COLOR_BGR2RGB);
        plt.imshow(t,interpolation="bicubic")
        plt.show()
    elif a.dtype=="uint8" and len(a.shape)==2: #grayscale
        plt.imshow(a,cmap="gray",interpolation="bicubic");
        plt.show()
    elif (a.dtype=="float32" or a.dtype=="float64") and len(a.shape)==3 and a.shape[2]==3: #color float
        t=cv2.cvtColor(a,cv2.COLOR_BGR2RGB);
        plt.imshow(t,interpolation="bicubic")
        plt.show()
    elif (a.dtype=="float32" or a.dtype=="float64") and len(a.shape)==2: #grayscale float
        plt.imshow(a,cmap="gray",interpolation="bicubic");
        plt.show()
    else:
        print("Unknown image type"); sys.exit(0);

def leCsv(nomeDir, nomeArq, nl=0, nc=0):
    #nomeDir = directory containing treino.csv, teste.csv and the images nnna.jpg and nnnb.jpg.
    #Ex: nomeDir = "/home/hae/haebase/fei/feiFrontCor"
    st=os.path.join(nomeDir,nomeArq)
    arq=open(st,"rt")
    lines=arq.readlines();
    arq.close();
    n=len(lines)

    linhas_separadas=[]
    for linha in lines:
        linha=linha.strip('\n');
        linha=linha.split(';');
        linhas_separadas.append(linha);

    t=cv2.imread(os.path.join(nomeDir,linhas_separadas[0][0]),1);
    onl=t.shape[0]; onc=t.shape[1];
    if nl==0 or nc==0:
        nl=onl; nc=onc;

    ay=np.empty((n),dtype='float32');
    ax=np.empty((n,nl,nc,3),dtype='float32');

    for i in range(len(linhas_separadas)):
        linha=linhas_separadas[i];
        t=cv2.imread(os.path.join(nomeDir,linha[0]),1);
        if nl>0 and nc>0:
            t=cv2.resize(t,(nc,nl),interpolation=cv2.INTER_AREA);
        ax[i]=np.float32(t)/255.0; #Between 0 and 1
        ay[i]=np.float32(linha[1]); #0=m or 1=f

    return ax, ay;
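A quick usage example of cekpy (the image name is hypothetical):

import cv2
import cekpy as cek
a = cv2.imread("foto.png", 1)  #read a color image
cek.mostra(a)                  #display it with matplotlib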


Transfer learning

The site below contains several databases of human faces:
http://fei.edu.br/~cet/facedatabase.html

From that site, I took the database with frontal faces of 200 people (400 color images of 360x260 pixels) with neutral (*a.jpg) and smiling (*b.jpg) expressions, manually aligned. Half of the faces are male and half are female. I cropped the borders of these images so that they measure 280x200 pixels. The resulting images are at:

http://www.lps.usp.br/hae/apostila/feiCorCrop.zip
This ZIP also contains 3 CSV (comma-separated values) files: treino.csv, valida.csv and teste.csv, listing the file names together with their classification (0=male, 1=female). Some of the images:

[Figure: 095a.jpg, 095b.jpg, 096a.jpg, 096b.jpg]

The program below uses a "LeNet"-style network (without data augmentation) to classify the images as male or female. The test accuracy is 92%. The 100% training accuracy indicates overfitting, so the validation/test accuracies would increase with data augmentation.

#mf4.py
#Training loss: [0.0018767415732145309, 1.0]
#Validation loss: [0.07837992310523986, 0.91]
#Test loss: [0.043139079213142396, 0.92]

import cekpy as cek;
import cv2;
import numpy as np;
import tensorflow.keras as keras;
import keras.backend as K;
from keras import optimizers, callbacks, regularizers;
from keras.regularizers import l2;
from keras.models import Sequential;
from keras.layers import Dropout, Conv2D, MaxPooling2D, Dense, Flatten;
from inspect import currentframe, getframeinfo
import os

#main
fi = getframeinfo(currentframe());
nomeprog=os.path.splitext(fi.filename)[0];

#Original: 280x200, resized: 112x80
nl=112
nc=80
diretorioBd="/home/hae/haebase/fei/feiCorCrop"
ax, ay = cek.leCsv(diretorioBd,"treino.csv", nl=nl, nc=nc); #200 images
qx, qy = cek.leCsv(diretorioBd,"teste.csv", nl=nl, nc=nc); #100 images
vx, vy = cek.leCsv(diretorioBd,"valida.csv", nl=nl, nc=nc); #100 images

#for a in ax:
#    cek.mostra(a);
input_shape = (nl,nc,3);
batch_size = 10;
epochs = 30;

model = Sequential();
model.add(Conv2D(30, kernel_size=(5,5), activation='relu', input_shape=input_shape))


model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(40, kernel_size=(5,5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(50, kernel_size=(5,5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(60, kernel_size=(5,5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(1000, activation='relu'))
model.add(Dense(1, activation='linear'))

#from keras.utils import plot_model
#plot_model(model, to_file='ep2g.png', show_shapes=True)
#from keras.utils import print_summary
#print_summary(model)

opt=optimizers.Adam();

model.compile(optimizer=opt, loss='mse', metrics=['accuracy'])

model.fit(ax, ay, batch_size=batch_size, epochs=epochs, verbose=True, validation_data=(vx,vy))

score = model.evaluate(ax, ay, verbose=0)
print('Training loss:', score)
score = model.evaluate(vx, vy, verbose=0)
print('Validation loss:', score)
score = model.evaluate(qx, qy, verbose=0)
print('Test loss:', score)

model.save(nomeprog+'.h5')

(PSI3472-2019 Class 8, exercise 1) Add data augmentation to the program above and see what accuracy you can reach.


Let us do transfer learning using VGG16. The validation/test accuracies went to 93%/98% when training only the top dense layer, and to 95%/98% when then training all layers with a small learning rate.

After training the top layer:
Epoch 10/10 - 2s 9ms/step - loss: 0.0061 - acc: 1.0000 - val_loss: 0.1070 - val_acc: 0.9300
Training loss: [0.005465924926102162, 1.0]
Validation loss: [0.10703720465302467, 0.93]
Test loss: [0.0822840192914009, 0.98]

After training all layers:
Epoch 50/50 - 4s 22ms/step - loss: 1.1634e-05 - acc: 1.0000 - val_loss: 0.1056 - val_acc: 0.9500
Training loss: [1.1298586396151222e-05, 1.0]
Validation loss: [0.1056408049726042, 0.95]
Test loss: [0.1299311416657656, 0.98]

#vgg2b.py
#Does transfer learning with VGG16 to improve accuracy.
#At the end, trains again with all layers unfrozen.
#Based on the example:
#https://medium.com/abraia/first-steps-with-transfer-learning-for-custom-image-classification-with-keras-b941601fcad5

import cekpy as cek;
import cv2;
import numpy as np;
import keras
import keras.backend as K;
from keras import optimizers, callbacks, regularizers;
from keras.regularizers import l2;
from keras.models import Sequential, Model;
from keras.layers import Dropout, Conv2D, MaxPooling2D, Dense, Flatten;
from keras.applications.resnet50 import ResNet50
from inspect import currentframe, getframeinfo
import os
from keras.applications.vgg16 import VGG16

#main
#Turn off TensorFlow warnings
import os; os.environ['TF_CPP_MIN_LOG_LEVEL']='3'
#Print the GPU name
import tensorflow as tf; tf.test.gpu_device_name()
#Get the name of the running program
from inspect import currentframe, getframeinfo
fi = getframeinfo(currentframe()); nomeprog=os.path.splitext(fi.filename)[0];

#Original: 280x200, resized: 112x80
num_classes=2
nl=112; nc=80
#nl=140; nc=100
diretorioBd="/home/hae/haebase/fei/feiCorCrop"
ax, ay = cek.leCsv(diretorioBd,"treino.csv", nl=nl, nc=nc); #200 images
qx, qy = cek.leCsv(diretorioBd,"teste.csv", nl=nl, nc=nc); #100 images
vx, vy = cek.leCsv(diretorioBd,"valida.csv", nl=nl, nc=nc); #100 images
ay = keras.utils.to_categorical(ay, num_classes)
qy = keras.utils.to_categorical(qy, num_classes)
vy = keras.utils.to_categorical(vy, num_classes)

#for a in ax:
#    cek.mostra(a);
input_shape = (nl,nc,3);
batch_size = 10;
epochs = 10;

base_model = VGG16(weights='imagenet', include_top=False, input_shape=input_shape)
x = base_model.output
x = Flatten()(x)
x = Dense(100, activation="relu")(x)
predictions = Dense(num_classes, activation="softmax")(x)
model = Model(inputs=base_model.input, outputs=predictions)

from keras.utils import plot_model
plot_model(model, to_file=nomeprog+'.png', show_shapes=True)
from keras.utils import print_summary
print_summary(model)

for layer in base_model.layers:
    layer.trainable = False

otimizador=keras.optimizers.Adam(lr=1e-3)
model.compile(otimizador, loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(ax, ay, batch_size=batch_size, epochs=epochs, verbose=True, validation_data=(vx,vy))

score = model.evaluate(ax, ay, verbose=0)


print('Training loss:', score)
score = model.evaluate(vx, vy, verbose=0)
print('Validation loss:', score)
score = model.evaluate(qx, qy, verbose=0)
print('Test loss:', score)

#Unfreeze all layers:
for layer in model.layers:
    layer.trainable = True

otimizador=keras.optimizers.Adam(lr=1e-6)
model.compile(otimizador, loss='categorical_crossentropy', metrics=['accuracy'])

epochs = 50;
model.fit(ax, ay, batch_size=batch_size, epochs=epochs, verbose=True, validation_data=(vx,vy))

score = model.evaluate(ax, ay, verbose=0)
print('Training loss:', score)
score = model.evaluate(vx, vy, verbose=0)
print('Validation loss:', score)
score = model.evaluate(qx, qy, verbose=0)
print('Test loss:', score)

model.save(nomeprog+".h5")


Using data augmentation, the validation/test accuracies changed to 94%/100%.

Epoch 15/15 - 2s 81ms/step - loss: 0.0564 - acc: 0.9900 - val_loss: 0.1296 - val_acc: 0.9300
Training loss: [0.031960930526256565, 0.995]
Validation loss: [0.12963369432836772, 0.93]
Test loss: [0.05153264611959457, 0.98]
...
Epoch 15/15 - 4s 218ms/step - loss: 0.0109 - acc: 1.0000 - val_loss: 0.1579 - val_acc: 0.9400
Training loss: [0.008660740279592573, 1.0]
Validation loss: [0.15794570177793502, 0.94]
Test loss: [0.019271421525627375, 1.0]

#vgg3b.py
#Uses data augmentation

import cekpy as cek;
import cv2;
import numpy as np;
import keras;
import keras.backend as K;
from keras import optimizers, callbacks, regularizers;
from keras.regularizers import l2;
from keras.models import Sequential, Model;
from keras.layers import Dropout, Conv2D, MaxPooling2D, Dense, Flatten;
from keras.applications.resnet50 import ResNet50
from inspect import currentframe, getframeinfo
import os
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau

#main
#Turn off TensorFlow warnings
import os; os.environ['TF_CPP_MIN_LOG_LEVEL']='3'
#Print the GPU name
import tensorflow as tf; tf.test.gpu_device_name()
#Get the name of the running program
from inspect import currentframe, getframeinfo
fi = getframeinfo(currentframe()); nomeprog=os.path.splitext(fi.filename)[0];

#Original: 280x200, resized: 112x80
num_classes=2
nl=112; nc=80
#nl=140; nc=100
diretorioBd="/home/hae/haebase/fei/feiCorCrop"
ax, ay = cek.leCsv(diretorioBd,"treino.csv", nl=nl, nc=nc); #200 images
qx, qy = cek.leCsv(diretorioBd,"teste.csv", nl=nl, nc=nc); #100 images
vx, vy = cek.leCsv(diretorioBd,"valida.csv", nl=nl, nc=nc); #100 images
ay = keras.utils.to_categorical(ay, num_classes)
qy = keras.utils.to_categorical(qy, num_classes)
vy = keras.utils.to_categorical(vy, num_classes)

#for a in ax:
#    cek.mostra(a);
input_shape = (nl,nc,3);
batch_size = 10;

base_model = VGG16(weights='imagenet', include_top=False, input_shape=input_shape)
x = base_model.output
x = Flatten()(x)
x = Dense(100, activation="relu")(x)
predictions = Dense(num_classes, activation="softmax")(x)
model = Model(inputs=base_model.input, outputs=predictions)

from keras.utils import plot_model
plot_model(model, to_file=nomeprog+'.png', show_shapes=True)
from keras.utils import print_summary
print_summary(model)

for layer in base_model.layers:
    layer.trainable = False

print('Using real-time data augmentation.')
datagen = ImageDataGenerator(
    # randomly shift images horizontally - fraction of width
    width_shift_range=0.1,
    # randomly shift images vertically - fraction of height
    height_shift_range=0.1,
    fill_mode='nearest',
    horizontal_flip=True)

# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
otimizador=keras.optimizers.Adam(lr=1e-3)
model.compile(otimizador, loss='categorical_crossentropy', metrics=['accuracy'])
datagen.fit(ax)
epochs = 15;


model.fit_generator(datagen.flow(ax, ay, batch_size=batch_size),
    epochs=epochs, verbose=True, validation_data=(vx, vy),
    steps_per_epoch=ax.shape[0]//batch_size)

score = model.evaluate(ax, ay, verbose=0)
print('Training loss:', score)
score = model.evaluate(vx, vy, verbose=0)
print('Validation loss:', score)
score = model.evaluate(qx, qy, verbose=0)
print('Test loss:', score)

for layer in model.layers:
    layer.trainable = True

otimizador=keras.optimizers.Adam(lr=1e-6)
model.compile(otimizador, loss='categorical_crossentropy', metrics=['accuracy'])
datagen.fit(ax)
epochs = 15;
model.fit_generator(datagen.flow(ax, ay, batch_size=batch_size),
    epochs=epochs, verbose=True, validation_data=(vx, vy),
    steps_per_epoch=ax.shape[0]//batch_size)

score = model.evaluate(ax, ay, verbose=0)
print('Training loss:', score)
score = model.evaluate(vx, vy, verbose=0)
print('Validation loss:', score)
score = model.evaluate(qx, qy, verbose=0)
print('Test loss:', score)

model.save(nomeprog+".h5")

(PSI3472-2019 Class 8, exercise 2) Write (to files) the images of the people that are being misclassified.


Note: I could not get transfer learning to work with ResNet or Inception; the accuracy stays very low. I do not know why. Perhaps there is a bug in Keras' batch normalization layer?
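A workaround often suggested for this symptom (my addition, untested here) is to keep the BatchNormalization layers frozen even during fine-tuning, so that their moving mean/variance statistics are not disturbed by the small dataset:

from keras.layers import BatchNormalization
for layer in model.layers:
    #freeze only the batch-normalization layers; train everything else
    layer.trainable = not isinstance(layer, BatchNormalization)
model.compile(keras.optimizers.Adam(lr=1e-6), loss='categorical_crossentropy',
    metrics=['accuracy'])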


YOLO object detection (You Only Look Once)

Original YOLO site (written in C):
https://pjreddie.com/darknet/yolo/

How to convert it to Keras:
https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
https://github.com/experiencor/keras-yolo3

Converted weights at:
http://www.lps.usp.br/hae/apostila/keras-yolov3.zip

# detect.py
# load yolov3 model and perform object detection
# based on https://github.com/experiencor/keras-yolo3
import numpy as np
from numpy import expand_dims
from keras.models import load_model
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from matplotlib import pyplot
from matplotlib.patches import Rectangle
import sys

class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, objness = None, classes = None):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        self.objness = objness
        self.classes = classes
        self.label = -1
        self.score = -1

    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.classes)
        return self.label

    def get_score(self):
        if self.score == -1:
            self.score = self.classes[self.get_label()]
        return self.score

def _sigmoid(x):
    return 1. / (1. + np.exp(-x))

def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
    grid_h, grid_w = netout.shape[:2]
    nb_box = 3
    netout = netout.reshape((grid_h, grid_w, nb_box, -1))
    nb_class = netout.shape[-1] - 5
    boxes = []
    netout[..., :2] = _sigmoid(netout[..., :2])
    netout[..., 4:] = _sigmoid(netout[..., 4:])
    netout[..., 5:] = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh

    for i in range(grid_h*grid_w):
        row = i / grid_w
        col = i % grid_w
        for b in range(nb_box):
            # 4th element is objectness score
            objectness = netout[int(row)][int(col)][b][4]
            if (objectness.all() <= obj_thresh): continue
            # first 4 elements are x, y, w, and h
            x, y, w, h = netout[int(row)][int(col)][b][:4]
            x = (col + x) / grid_w  # center position, unit: image width
            y = (row + y) / grid_h  # center position, unit: image height
            w = anchors[2 * b + 0] * np.exp(w) / net_w  # unit: image width
            h = anchors[2 * b + 1] * np.exp(h) / net_h  # unit: image height
            # last elements are class probabilities
            classes = netout[int(row)][col][b][5:]
            box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
            boxes.append(box)
    return boxes

def correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):
    new_w, new_h = net_w, net_h
    for i in range(len(boxes)):
        x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
        y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
        boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
        boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
        boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
        boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)

def _interval_overlap(interval_a, interval_b):
    x1, x2 = interval_a
    x3, x4 = interval_b
    if x3 < x1:
        if x4 < x1:
            return 0
        else:
            return min(x2, x4) - x1
    else:
        if x2 < x3:
            return 0
        else:
            return min(x2, x4) - x3

def bbox_iou(box1, box2):
    intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
    intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
    intersect = intersect_w * intersect_h
    w1, h1 = box1.xmax - box1.xmin, box1.ymax - box1.ymin
    w2, h2 = box2.xmax - box2.xmin, box2.ymax - box2.ymin
    union = w1*h1 + w2*h2 - intersect
    return float(intersect) / union

def do_nms(boxes, nms_thresh):
    if len(boxes) > 0:
        nb_class = len(boxes[0].classes)
    else:
        return
    for c in range(nb_class):
        sorted_indices = np.argsort([-box.classes[c] for box in boxes])
        for i in range(len(sorted_indices)):
            index_i = sorted_indices[i]
            if boxes[index_i].classes[c] == 0: continue
            for j in range(i+1, len(sorted_indices)):
                index_j = sorted_indices[j]
                if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
                    boxes[index_j].classes[c] = 0

# load and prepare an image
def load_image_pixels(filename, shape):
    # load the image to get its shape
    image = load_img(filename)
    width, height = image.size
    # load the image with the required size
    image = load_img(filename, target_size=shape)
    # convert to numpy array
    image = img_to_array(image)
    # scale pixel values to [0, 1]
    image = image.astype('float32')
    image /= 255.0
    # add a dimension so that we have one sample
    image = expand_dims(image, 0)
    return image, width, height

# get all of the results above a threshold
def get_boxes(boxes, labels, thresh):
    v_boxes, v_labels, v_scores = list(), list(), list()
    # enumerate all boxes
    for box in boxes:
        # enumerate all possible labels
        for i in range(len(labels)):
            # check if the threshold for this label is high enough
            if box.classes[i] > thresh:
                v_boxes.append(box)
                v_labels.append(labels[i])
                v_scores.append(box.classes[i]*100)
                # don't break, many labels may trigger for one box
    return v_boxes, v_labels, v_scores

# draw all results
def draw_boxes(filename, v_boxes, v_labels, v_scores):
    # load the image
    data = pyplot.imread(filename)
    # plot the image
    pyplot.imshow(data)
    # get the context for drawing boxes
    ax = pyplot.gca()
    # plot each box
    for i in range(len(v_boxes)):
        box = v_boxes[i]
        # get coordinates
        y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
        # calculate width and height of the box
        width, height = x2 - x1, y2 - y1
        # create the shape
        rect = Rectangle((x1, y1), width, height, fill=False, color='white')
        # draw the box
        ax.add_patch(rect)
        # draw text and score in top left corner
        label = "%s (%.3f)" % (v_labels[i], v_scores[i])
        pyplot.text(x1, y1, label, color='white')
    # show the plot
    pyplot.show()

if len(sys.argv) != 2: print("detect nomeimg.ext"); sys.exit()
# load yolov3 model
model = load_model('yolov3.h5')
# define the expected input shape for the model
input_w, input_h = 416, 416  # 320, 416 or 608
# define our new photo
photo_filename = sys.argv[1]
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))
# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
#print([a.shape for a in yhat])
# define the anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
    # decode the output of the network
    boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)
# correct the sizes of the bounding boxes for the shape of the image
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
# suppress non-maximal boxes
do_nms(boxes, 0.5)
# define the labels - the same as coco.names
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
    "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
    "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
    "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
    "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
    "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
# get the details of the detected objects
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
# summarize what we found
for i in range(len(v_boxes)):
    print(v_labels[i], v_scores[i])
# draw what we found
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)

Note that the program above detects only the 80 object categories listed in labels.

(PSI3472-2019 Class 8, exercise 3) Run the program above on a few images (they can be from the course site or from the internet).
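For reference, a typical invocation, assuming the converted weights yolov3.h5 (from the zip above) are in the current directory; the image name is just an example:

python3 detect.py foto.jpg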

To do:
Exercise 3 - test YOLO.
Plus either exercise 1 or exercise 2.


Examples of detections: [detection images not reproduced in this transcript]


Simplified explanation of YOLO (there are details I am not explaining):

1) Say we want to detect 3 classes of objects:
1 = pedestrian
2 = car
3 = motorcycle

The image is divided into grid cells; in the example below, into a 3x3 grid.

2) For each grid cell, a prediction is made that returns 8 numbers (this can be done with a single forward pass over the whole image):
y = [ pc, bx, by, bh, bw, c1, c2, c3 ]

pc = 0 if there is no object; pc = 1 if there is an object.
bx, by, bh and bw are the x, y, height and width of the object's bounding box. The bounding box may extend beyond the grid cell.
c1, c2 and c3 refer to the 3 object classes.

Grid cells where there is no car:
y = [ 0, ?, ?, ?, ?, ?, ?, ? ]

The two grid cells where there is a car:
y = [ 1, bx, by, bh, bw, 0, 1, 0 ]

Non-maximum suppression is then applied, so that each car is left with a single detection (see the sketch below).
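A minimal numeric sketch of this target encoding, for one image with two cars on the 3x3 grid above (all coordinate values are made up for illustration):

import numpy as np

# y = [pc, bx, by, bh, bw, c1, c2, c3] for each of the 3x3 grid cells
y = np.zeros((3, 3, 8))
# the two cells that contain a car's center (one-hot class: 2 = car)
y[0, 1] = [1, 0.5, 0.6, 0.8, 1.4, 0, 1, 0]   # car 1
y[2, 2] = [1, 0.3, 0.4, 0.7, 1.2, 0, 1, 0]   # car 2
# every other cell keeps pc = 0, so its remaining 7 values are ignored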
