
MACHINE LEARNING IN THE ERA OF BIG DATA (original title: APRENDIZADO DE MÁQUINA NA ERA DO BIG DATA)

Eduardo Bezerra (CEFET/RJ)

Overview

Introduction to Machine Learning

Deep Learning

Final Remarks

Machine Learning

What Is Machine Learning?

Machine Learning (to Play Checkers)

Arthur Samuel (1959) coined the term Machine Learning: “Field of study that gives computers the ability to learn without being explicitly programmed.”

Techniques:

search tree

alpha-beta pruning

scoring functions

Minimax search

TD-learning

“it will learn to play a better game of checkers than can be played by the person who wrote the program.”

What Is Machine Learning?

Tom Mitchell (1998): “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

What Is Machine Learning?

Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam.

Components:

T

E

P

Machine Learning Systems

http://www.nosimpler.me/machine-learning/

Features

Terminology

Training sample (also called training instance or training example)

Training set

Model

Learning algorithm

Types of Machine Learning

Supervised learning

Unsupervised learning

Supervised Learning

The learner (machine) is given the correct answers.

Two subtypes (tasks):

Classification: predict a discrete value.

Regression: predict a continuous value.
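The difference between the two tasks can be sketched with a toy training set. This is a minimal illustration, not material from the talk: the apartment sizes, prices, and labels are made up, least squares stands in for "a regression learning algorithm", and 1-nearest neighbor stands in for "a classification learning algorithm".

```python
# Toy training set: each training sample pairs a feature value with a known answer.
# All numbers below are made up for illustration.
sizes_m2 = [50.0, 80.0, 100.0, 120.0]            # feature: apartment size
prices = [150.0, 240.0, 300.0, 360.0]            # continuous target (regression)
labels = ["small", "small", "large", "large"]    # discrete target (classification)


def fit_line(xs, ys):
    """Learning algorithm for regression: ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b  # the learned model is the pair (a, b)


def predict_price(model, x):
    """Apply the learned line to a new input."""
    a, b = model
    return a * x + b


def predict_label(xs, ys, x):
    """Classification with 1-nearest neighbor: copy the label of the closest sample."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]


price_model = fit_line(sizes_m2, prices)
predicted_price = predict_price(price_model, 95.0)        # a continuous value
predicted_label = predict_label(sizes_m2, labels, 95.0)   # a discrete value
```

In both cases the correct answers (prices or labels) are part of the training set, which is what makes the setting supervised.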

Regression

https://www.quora.com/What-is-the-main-difference-between-classification-problems-and-regression-problems-in-machine-learning

Classification

https://www.quora.com/What-is-the-main-difference-between-classification-problems-and-regression-problems-in-machine-learning

Applications - Classification: Spam Detection

Input: an email message

Output: spam/ham

Setup: obtain a large collection of example messages, each labeled “spam” or “ham”

Note: someone has to label this data!

Goal: predict the appropriate label for new messages

Features: the attributes used to make the decision (ham / spam)

The words themselves: FREE!

Textual patterns: $dd, CAPS

Non-textual patterns: SenderInContacts

Example messages:

Dear Sir. First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …

TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT. 99 MILLION EMAIL ADDRESSES FOR ONLY $99

Ok, Iknow this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened.
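The three kinds of features named on the spam slide (words, textual patterns, non-textual patterns) could be computed along the following lines. This is only a sketch: the function name `extract_features`, the feature names, and the example addresses are hypothetical, and a real filter would pass such features to a trained classifier.

```python
import re


def extract_features(message, sender, contacts):
    """Map one email to the kinds of features listed on the slide."""
    words = re.findall(r"[A-Za-z']+", message)
    caps_words = [w for w in words if len(w) > 1 and w.isupper()]
    return {
        # The words themselves: does the message contain "FREE"?
        "contains_free": "free" in (w.lower() for w in words),
        # Textual pattern $dd: a dollar sign followed by digits.
        "has_dollar_amount": re.search(r"\$\d+", message) is not None,
        # Textual pattern CAPS: fraction of words written in all capitals.
        "caps_ratio": len(caps_words) / len(words) if words else 0.0,
        # Non-textual pattern: is the sender in the user's contacts?
        "sender_in_contacts": sender in contacts,
    }


# Hypothetical contact list and senders, for illustration only.
contacts = {"alice@example.com"}
spam = extract_features(
    "99 MILLION EMAIL ADDRESSES FOR ONLY $99",
    "bulk@example.net",
    contacts,
)
ham = extract_features(
    "Had an old Dell Dimension XPS sitting in the corner",
    "alice@example.com",
    contacts,
)
```

Each message becomes a small dictionary of attribute values; the label ("spam" or "ham") still has to come from a person who marked the training examples.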

Applications - Classification: Digit Recognition

Input: images / pixel matrices

Output: a digit 0-9

Setup: obtain a large collection of images (examples), each labeled with a digit

Note: someone has to label this data!

Goal: predict the appropriate label for new images

Features: the attributes used to make the decision

Pixels: (6,8)=ON

Shape patterns: NumComponents, AspectRatio, NumLoops …

[Figure: sample digit images labeled 0, 1, 2, 1, and one unlabeled image marked ??]
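As a rough illustration of the pixel and shape features mentioned for digit recognition, the sketch below computes three of them on a tiny binary image. The helper names and the 5x5 image are made up, the image is assumed to contain at least one ON pixel, and NumLoops is left out to keep the example short.

```python
def pixel_on(image, row, col):
    """Pixel feature, in the spirit of (6,8)=ON on the slide."""
    return image[row][col] == 1


def aspect_ratio(image):
    """Width / height of the bounding box around the ON pixels."""
    rows = [r for r, line in enumerate(image) if any(line)]
    cols = [c for c in range(len(image[0])) if any(line[c] for line in image)]
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return width / height


def num_components(image):
    """Count 4-connected groups of ON pixels with an iterative flood fill."""
    seen = set()
    count = 0
    for r, line in enumerate(image):
        for c, value in enumerate(line):
            if value != 1 or (r, c) in seen:
                continue
            count += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < len(image) and 0 <= nx < len(image[0])
                    if inside and image[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


# A hand-drawn 5x5 "1": a single vertical stroke.
ONE = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

features = {
    "pixel_(0,2)_on": pixel_on(ONE, 0, 2),
    "aspect_ratio": aspect_ratio(ONE),
    "num_components": num_components(ONE),
}
```

A tall, narrow bounding box and a single connected component are the sort of cues that separate a "1" from, say, a "0" or an "8".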

Applications - Classification: Character Recognition

Applications - Classification: Machine Translation

https://intlcontact.com/category/services/translation-service/

Applications - Classification: Sentiment Analysis

[Figure: a customer form with the fields Name, Address, City, State (UF), ZIP code (CEP), Model (BMX 330, BMX 550, RBX 12, Sirax 220, Street E3), and Problem description, annotated "Natural Language Processing"]

Classification: Other Applications

Classification: given input objects, predict labels.

Examples:

OCR (input: images, classes: characters)

Medical diagnosis (input: symptoms, classes: diseases)

Automated essay grading (input: documents, classes: grades)

Fraud detection (input: account activity, classes: fraud / legitimate)

News routing

… many more

Classification is a commercially important technology!

Unsupervised Learning

Applications: Market Segmentation (CRM)

Applications: Social Network Analysis

Applications: Outlier Detection

Deep Learning

Neural Nets Renaissance (2000s)

[Timeline figure: milestones dated 2006, 2007, and 2009]

Deep Learning Explosion (2010s)

Applications: Object Detection (in Images), 2012

Credits: Matthew Zeiler (Clarifai)

Applications: Speech Recognition (2012)

Applications: Language Translation (2014)

Applications: Face Recognition (2014)

DeepFace: “Our method reaches an accuracy of 97.35% […], reducing the error of the current state of the art by more than 27%, closely approaching human-level performance.”

https://research.fb.com/publications/deepface-closing-the-gap-to-human-level-performance-in-face-verification/

Applications: Semantic Segmentation (2014)

Applications: Image Generation/Superresolution (2014)

Generative Adversarial Nets

Applications: Deep Reinforcement Learning (2015)

Deep Q-Learning

Applications: Games (2015)

Deep Reinforcement Learning

Applications: Games (2016)

Deep Reinforcement Learning

Applications - Computer Vision + Text Processing: Automatic Labelling (2016)

Long-term Recurrent Convolutional Networks for Visual Recognition and Description, 2016.

Final Remarks

Machine Learning: Success Factors

Big Data

1990s: MNIST ~ 70k

2010s: ImageNet ~ 10^6

Hardware improvements

Crowdsourcing

Geoffrey Hinton: “What was wrong in the 80’s is that we didn’t have enough data and we didn’t have enough computer power”

ML & Big Data

Nowadays it is relatively easy to obtain training datasets on the order of 10^6 examples:

e-commerce portals

Kaggle (https://www.kaggle.com)

IoT

...

ML & Big Data

The more data, the better?

“[…] what we're seeing consistently is that the bigger you can run these models, the better they perform. If you train one of these algorithms on one computer, you know, it will do pretty well. If you train them on 10, it will do even better. If you train on 100, even better. And we found that when we trained it on 16,000 CPU cores, […], that was the best model we were able to train.”

Andrew Ng, http://www.npr.org/2012/06/26/155792609/a-massive-google-network-learns-to-identify

Is Winter Coming Again?!

Credits: Monty Barlow

Unsupervised Learning

Current models are hungry for labeled data.

Today’s DL is supervised learning.

“The Revolution Will Not Be Supervised.” (Yann LeCun)

Deep RL Takes Too Long to Train

RL systems require a gazillion trials!

Natural Language Understanding

Headlines:

Enraged Cow Injures Farmer With Ax

Hospitals Are Sued by 7 Foot Doctors

Ban on Nude Dancing on Governor’s Desk

Iraqi Head Seeks Arms

Local HS Dropouts Cut in Half

Juvenile Court to Try Shooting Defendant

Stolen Painting Found by Tree

Kids Make Nutritious Snacks

Why are these funny?

Source: CS188

Common Sense Knowledge

"If a mother has a son, then the son is younger than the mother and remains younger for his entire life."

"If President Trump is in Washington, then his left foot is also in Washington."

To Learn More: Online Courses

Machine Learning

https://www.coursera.org/learn/machine-learning

Deep Learning Specialization

https://www.coursera.org/specializations/deep-learning

Convolutional Neural Networks for Visual Recognition

http://cs231n.stanford.edu/

To Learn More: Books

PPCIC - CEFET/RJ

Graduate Program in Computer Science (Programa de Pós-Graduação em Ciência da Computação)

http://eic.cefet-rj.br/ppcic

THANK YOU!

Eduardo Bezerra

BACKUP SLIDES

MapReduce

Some ML problems can be too large to solve on a single machine.

A general approach to distributing the processing is called MapReduce.

Applicable to some ML algorithms...

[Photos: Sanjay Ghemawat and Jeff Dean]

MapReduce

Many ML algorithms can be expressed as the computation of sums of functions over the training set.

E.g., for logistic regression, we need:
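The expression that followed on this slide is not preserved in the transcript. One sum of this kind, offered here as an assumption rather than as the slide's own formula, is the batch gradient of the logistic-regression loss, whose j-th component is the sum over training samples of (h(x) - y) * x_j, with h(x) = sigmoid(theta . x). Because it is a plain sum, each machine can compute it over its own chunk of the data (map) and the partial sums can then be added (reduce). The sketch below simulates this on one machine; the data, `theta`, and the function names are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partial_gradient(theta, chunk):
    """Map step: sum (h(x) - y) * x over one chunk of the training set."""
    grad = [0.0] * len(theta)
    for x, y in chunk:
        error = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
        for j, xj in enumerate(x):
            grad[j] += error * xj
    return grad


def combine(partials):
    """Reduce step: add the partial sums coming from each machine."""
    return [sum(values) for values in zip(*partials)]


# Made-up training set; each x starts with 1.0 as the bias feature.
training_set = [
    ([1.0, 0.5], 0), ([1.0, 1.5], 0), ([1.0, 2.5], 1), ([1.0, 3.5], 1),
    ([1.0, 1.0], 0), ([1.0, 3.0], 1), ([1.0, 0.2], 0), ([1.0, 2.8], 1),
]
theta = [0.1, -0.2]

# Split the data into 4 chunks, one per (simulated) machine.
chunks = [training_set[i::4] for i in range(4)]
distributed = combine(partial_gradient(theta, chunk) for chunk in chunks)
single_machine = partial_gradient(theta, training_set)
```

Up to floating-point rounding, `distributed` and `single_machine` hold the same gradient, which is why the split across machines does not change the result of a gradient-descent step.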

MapReduce

[Figure: the training set is split across Machine 1, Machine 2, Machine 3, and Machine 4, and their results are combined. Clip art: http://openclipart.org/detail/17924/computer-by-aj]

MapReduce: Multi-core Machines & GPUs

[Figure: the training set is split across Core 1, Core 2, Core 3, and Core 4, and their results are combined. Clip art: http://openclipart.org/detail/100267/cpu-(central-processing-unit)-by-ivak-100267]

Artificial Neural Nets (ANNs)

Artificial Neuron

A model inspired by the real one (the biological neuron).

Artificial Neuron: input

Artificial Neuron: parameters

Artificial Neuron: pre-activation

Artificial Neuron: activation function

"The composition of linear transformations is also a linear transformation"

Nonlinearities are necessary so that the network can learn complex representations of the data.
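The neuron slides (input, parameters, pre-activation, activation function) can be put together in a few lines. This is a sketch with hypothetical weights and biases, using the common formulation in which the pre-activation is the bias plus the weighted sum of the inputs. The second half illustrates the quoted remark: two stacked neurons without a nonlinearity collapse into a single linear (strictly speaking, affine) neuron.

```python
import math


def pre_activation(weights, bias, inputs):
    """a(x) = b + sum_i w_i * x_i"""
    return bias + sum(w * x for w, x in zip(weights, inputs))


def neuron(weights, bias, inputs, activation):
    """Output of one artificial neuron: activation(pre-activation)."""
    return activation(pre_activation(weights, bias, inputs))


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def identity(a):
    return a


x = [1.0, 2.0]

# One sigmoid neuron with hypothetical parameters; its pre-activation is 0.0.
out = neuron([0.5, -0.25], 0.0, x, sigmoid)

# Two stacked neurons with identity activation: h = 2*x1 + 3*x2 + 1, y = 4*h - 2.
h = neuron([2.0, 3.0], 1.0, x, identity)
y_stacked = neuron([4.0], -2.0, [h], identity)

# The same mapping as a single linear neuron: y = 8*x1 + 12*x2 + 2.
y_single = neuron([8.0, 12.0], 2.0, x, identity)
```

Since `y_stacked` equals `y_single` for every input, adding layers buys nothing unless a nonlinear activation sits between them.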

Artificial Neural Net

Feedforward Neural Network
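A forward pass through a small feedforward network might look like the sketch below. The layer sizes and the weights are hand-picked for illustration (they are not learned and do not come from the talk); they were chosen so that the network computes XOR, a function that no single linear neuron can represent.

```python
def relu(a):
    return max(0.0, a)


def layer(weights, biases, inputs, activation):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [
        activation(b + sum(w * x for w, x in zip(row, inputs)))
        for row, b in zip(weights, biases)
    ]


def forward(x):
    """Input -> hidden layer (2 ReLU neurons) -> output layer (1 linear neuron)."""
    hidden = layer([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], x, relu)
    output = layer([[1.0, -2.0]], [0.0], hidden, lambda a: a)
    return output[0]


# Evaluate the network on all four binary inputs.
xor_table = {(a, b): forward([float(a), float(b)]) for a in (0, 1) for b in (0, 1)}
```

Information flows in one direction only, from the inputs through the hidden layer to the output, which is what "feedforward" refers to.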