Recommender Systems: How Do They Work and Where Are They Applied?

Recommender Systems

Marcel Pinheiro [email protected]

@marcelcaraciolo

http://www.orygens.com

Thursday, January 26, 2012







Who is Marcel?

Marcel Pinheiro Caraciolo - @marcelcaraciolo

Master's degree in Computer Science from CIN/UFPE, in the area of data mining

Director of Research and Development at Orygens

Member and moderator of the Pernambuco Python Users Group (PUG-PE)

My areas of interest: mobile computing and intelligent computing

My blogs: http://www.mobideia.com (about mobility, since 2006) and http://aimotion.blogspot.com (about AI, since 2009)

Still a young apprentice in the Pythonic arts... (since 2007)

Born in Sergipe, but a Recife local at heart.



WEB




VI PUG-PE Meeting

Web 1.0: a source of information

Web 2.0: a continuous flow of information



Web 3.0

Users, web sites, web applications, web services, semantic web


Use collective information effectively in order to improve an application


Intelligence from Mining Data

Users influence one another:

- A user influences others through reviews, ratings, recommendations and blogs

- A user is influenced by others through reviews, ratings, recommendations and blogs


Collective intelligence in your application:

- Search
- Aggregated information: lists
- Clustering and predictive models
- Recommendations
- Reviews
- Voting
- Blogs
- Natural Language Processing
- Ratings
- Saving
- Bookmarking
- Wikis
- User-generated content
- Tagging, tag cloud
- Harness external content




Before...

Friday, October 1, 2010


Nowadays

we are overloaded with information

much of it useless

sometimes we look for this...

and we find this!


Google?

Social media?

Meeee...?


Recommender Systems

“A lot of times, people don’t know what they want until you show it to them.”

Steve Jobs

“We are leaving the Information age, and entering into the Recommendation age.”

Chris Anderson, from the book The Long Tail


Social Recommendations

What should I read?

Friends/Family: "I think you should read these books."

Ref: Flickr photostream: jefield

Ref: Flickr-BlueAlgae


Recommendations through Interaction

What should I read?

Input: rate a few books

Output: "Books you might like are…"


Systems designed to suggest things that match my interests!


Why Recommendation?


Netflix: 2/3 of the movies rented come from recommendations

Google News: 38% of the most-clicked news stories come from recommendations

Amazon: 38% of sales come from recommendations

Source: Celma & Lamere, ISMIR 2007


We are overloaded with information

- Thousands of news articles and blog posts each day

- Millions of movies, books and music tracks online

- In Hanoi, … TV channels, thousands of programs each day

- In New York, several thousands of ad messages sent to us per day

Thousands of new articles and posts every day

Millions of songs, movies and books

Thousands of offers and promotions


What can be recommended?

Advertising messages

Tags

Investment options

Restaurants

Music

Movies

Books

TV shows

Clothes

Products

Articles

Future girlfriends

Social network contacts

E-learning courses

Videos

Professionals

Papers

Code modules


So how does recommendation work?


What do recommender systems actually do?

1. Predict how much you may like a certain product or service

2. Suggest a list of N items ranked according to your interests

3. Suggest a ranked list of N users for a product/service

4. Explain to you why these items were recommended

5. Adjust the prediction and the recommendation based on your feedback and that of others


Content-Based Filtering

[Diagram: Marcel likes a movie, and movies similar to it are recommended to him. Items: Gone with the Wind, Die Hard, Armageddon, Toy Story. Users: Marcel.]
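To make the idea concrete, here is a minimal content-based sketch. It assumes each movie is described by a hypothetical binary genre vector (the genre flags are illustrative and not from the slide); the user profile is the average vector of the movies the user liked, and unseen movies are ranked by cosine similarity to that profile.

```python
import math

# Hypothetical item descriptions: flags for (action, drama, romance, animation, sci-fi).
ITEMS = {
    "Gone with the Wind": [0, 1, 1, 0, 0],
    "Die Hard": [1, 1, 0, 0, 0],
    "Armageddon": [1, 1, 0, 0, 1],
    "Toy Story": [0, 0, 0, 1, 0],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend_by_content(liked, items=ITEMS):
    """Rank unseen items by similarity to the average profile of the liked items."""
    vectors = [items[title] for title in liked]
    profile = [sum(column) / len(vectors) for column in zip(*vectors)]
    scores = {title: cosine(profile, vec) for title, vec in items.items() if title not in liked}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(recommend_by_content(["Die Hard"]))
```

With only `Die Hard` liked, `Armageddon` ranks first because it shares the most genre flags, and `Toy Story` scores zero. The same mechanism is what makes purely content-based systems prone to over-specialization.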


Problems with content-based filtering

1. Restricted content analysis

- Items and users are poorly described. It is even worse for audio or images

2. Over-specialization

- Someone who has never tried sushi will not be recommended the best sushi restaurant in town

3. Portfolio effect

- Just because I watched one Xuxa movie as a child, does it have to recommend all of hers?


Collaborative Filtering

[Diagram: Marcel is similar to other users (Rafael, Amanda), and movies they like are recommended to him. Items: Gone with the Wind, Thor, Armageddon, Toy Story. Users: Marcel, Rafael, Amanda.]
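A minimal user-based collaborative filtering sketch, using the Crab `sample_movies` ratings that appear later in this deck. The similarity `1 / (1 + Euclidean distance)` over co-rated items is our assumption, chosen because it reproduces the `similarity[1]` values Crab prints for user 1. The prediction is a plain similarity-weighted average over every other user who rated the item, with no neighborhood cutoff, so it will not necessarily match the scores of Crab's `UserBasedRecommender`.

```python
import math

# Ratings from the Crab sample_movies dataset (user_id -> {item_id: rating}).
RATINGS = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}

def user_similarity(u, v, ratings=RATINGS):
    """1 / (1 + Euclidean distance) over the items both users rated; 0.0 if none shared."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dist = math.sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)

def predict(user, item, ratings=RATINGS):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        sim = user_similarity(user, other, ratings)
        num += sim * prefs[item]
        den += sim
    return num / den if den else None

print(round(user_similarity(1, 6), 4))  # 0.6667, the value Crab reports for user 6
print(predict(5, 1))                    # estimated rating of user 5 (Toby) for item 1
```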


Problems with collaborative filtering

1. Scalability

- Amazon with 5M users, 50K items, 1.4B ratings

2. Sparse data

- I have only ever rated a single book on Amazon!

3. Cold start

- New users and items have no history

4. Popularity

- Everyone reads 'Harry Potter'

5. Hacking

- Someone who reads 'Harry Potter' reads the Kama Sutra
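The Amazon figures above also show why sparsity matters. A quick back-of-the-envelope check, using only the numbers on the slide, of how full the user-item rating matrix would be:

```python
# Figures quoted on the slide for Amazon: 5M users, 50K items, 1.4B ratings.
users, items, ratings = 5_000_000, 50_000, 1_400_000_000

possible = users * items      # every (user, item) cell of the rating matrix
density = ratings / possible  # fraction of cells that actually hold a rating
print(f"{possible:.2e} cells, density {density:.2%}, sparsity {1 - density:.2%}")
```

Even with 1.4 billion ratings, only about 0.56% of the cells are filled, so most pairs of users share very few rated items.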


Hybrid Filtering

[Diagram: users (Marcel, Rafael, Luciana), items (Gone with the Wind, Die Hard, Armageddon, Toy Story), ontologies, and symbolic data.]

Combination of multiple methods
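One common way to combine methods is a weighted hybrid, where the normalized scores of each recommender are blended with a fixed weight. The sketch below assumes both recommenders return scores on a 0 to 1 scale; the `alpha` parameter and the sample scores are hypothetical, and other hybridization strategies (switching, cascading, feature combination) are equally valid.

```python
def weighted_hybrid(content_scores, cf_scores, alpha=0.5):
    """Blend two score dicts: alpha * content + (1 - alpha) * collaborative.

    Items missing from one recommender contribute 0.0 for that component.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    items = set(content_scores) | set(cf_scores)
    blended = {
        item: alpha * content_scores.get(item, 0.0) + (1 - alpha) * cf_scores.get(item, 0.0)
        for item in items
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical normalized scores from two separate recommenders.
content = {"Armageddon": 0.82, "Gone with the Wind": 0.50, "Toy Story": 0.0}
collab = {"Toy Story": 0.90, "Armageddon": 0.40}
print(weighted_hybrid(content, collab, alpha=0.5))
```

With `alpha=0.5`, Armageddon (0.61) comes first, followed by Toy Story (0.45) and Gone with the Wind (0.25); moving `alpha` toward 0 lets the collaborative scores dominate.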


How are they presented?

Highlights

More about this artist...

Listen to songs by similar artists...

Someone similar to you also liked this

Since you listened to this one, you might want this one...

These two items go together...

The most popular in your group...

New releases


How are they evaluated?

How do we know whether a recommendation is good?

The data is usually split into training and test sets (80/20)

Criteria used:

- Prediction error: RMSE

- ROC curve*, rank utility, F-measure

* http://code.google.com/p/pyplotmining/
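The 80/20 protocol and the RMSE criterion can be sketched in a few lines of plain Python. The helper names, the fixed random seed and the toy ratings are assumptions for illustration; RMSE is the square root of the mean squared difference between held-out and predicted ratings.

```python
import math
import random

def train_test_split(triples, test_fraction=0.2, seed=42):
    """Shuffle (user, item, rating) triples and hold out a fraction as the test set."""
    data = list(triples)
    random.Random(seed).shuffle(data)
    cut = round(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Toy data: 10 users x 5 items.
triples = [(u, i, float(r)) for u in range(10) for i, r in enumerate([3, 4, 5, 2, 1])]
train, test = train_test_split(triples)
print(len(train), len(test))  # 40 10
print(rmse([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]))
```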



Mobile Recommenders


Why mobile?

More than 5 billion apps downloaded

http://vimeo.com/29323612

More than 1 billion devices

A highlight of the mobile segment: http://foursquare.com





Mobile Recommender Systems

Temporal and spatial information must be taken into account

How do we determine the context the user is in?

And how can ratings be captured on a limited screen?


Architecture

Recommendations computed on the mobile device (not feasible today)

- Everything is processed on the back end (server) and sent to the phone over the web

repackage the heterogeneous data and services, and republish them as web services. The successful design of this module is key to realizing cross-platform service and data sharing.

The functional layer has three components: Multi-Mode Location Information Index, Context-based Collaborative Filtering Algorithm, and Location-based Personalized Recommendation and Navigation. We discuss each functional component in detail below.

Fig 1. Architecture of the Mobile Information Pushing System

3 Location-based Data and Service Middleware based on SOA

Service-Oriented Architecture (SOA) is considered the next generation of Web services infrastructure. Its central idea is to design software applications from the perspective of integrated services, and to consider how to reuse existing services. SOA encourages the use of alternative technologies and methods (such as message mechanisms). It prefers combining services over writing new code when building the application framework.

After appropriate design and development, a new application based on this kind of message mechanism can be built simply by adjusting the original service model, rather than being forced into large-scale code development for new applications. It can thus respond quickly to changing market conditions.

So in this system, we implement a special Data and Service Combination service, similar to middleware, based on Service-Oriented Architecture. This method solves two technical issues: the integration and conversion of data in multiple formats, and the combination of a wide range of services.

Existing network information service platforms have already accumulated a lot of useful information, such as the public-rating website "Da Zhong Dian Ping" (a well-known and successful public facilities rating, comments and recommendation website with millions of users) [12]. However, their text-based geographical information or static guide maps cannot be used directly for mobile location-based navigation. This is also very inconvenient for users, especially those unfamiliar with the area they are visiting. To solve this problem, we analyze a restaurant query scenario based on the current location and propose a possible query process.

Consider a typical query as an example. Users want to know the location and description of restaurants within 500 meters of their current location. For this query, the user first obtains his coordinates through the mobile terminal. From these coordinates, the system computes the regional information within 500 meters of the current location (for example, all street names in the target area), and then searches the existing network information service platform for all restaurants matching that street information.

Guided by personal preferences, the user then reviews the returned restaurant list and selects the place he wishes to go. Based on this new query, the system obtains the coordinates of the selected restaurant and visualizes the corresponding navigation information through the mobile navigation software.
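The 500-meter filter in this scenario can be approximated with a great-circle (haversine) distance. Note that this swaps in a different technique: the paper derives the street names in the target area and matches restaurants by street, whereas this sketch compares coordinates directly. All coordinates and names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def restaurants_within(user_pos, restaurants, radius_m=500):
    """Return (name, distance) pairs for restaurants inside the radius, nearest first."""
    hits = []
    for name, (lat, lon) in restaurants.items():
        d = haversine_m(user_pos[0], user_pos[1], lat, lon)
        if d <= radius_m:
            hits.append((name, d))
    return sorted(hits, key=lambda pair: pair[1])

# Hypothetical coordinates, for illustration only.
user = (-8.0476, -34.8770)
places = {"A": (-8.0480, -34.8775), "B": (-8.0500, -34.8790), "C": (-8.0700, -34.9000)}
print(restaurants_within(user, places))
```

Here A (about 70 m away) and B (about 350 m away) fall inside the radius, while C, a few kilometers away, is filtered out.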

Figure 2 abstracts the above scenario as the query process of a portfolio of services. The whole scenario is composed of a number of service components. According to every Service

[Fig. 1 components: Location-based DB (Traffic-info, E-Map); Value-added DB (Comments, Tags, Ratings); Location-based Services (GPS Navigation, Location-based info, Booking); Entity-query; Value-added Services in Web 2.0 (User Tagging, Information Publish, Recommendation); Mobile Information Pushing Platform; Location-based Data and Service Middleware; Context-based Collaborative Filtering; Multi-Mode Location Information Index; Location-based personalized recommendation and Navigation]

WSEAS TRANSACTIONS on COMPUTERS Fan Yang, Zhi-Mei Wang

ISSN: 1109-2750 727 Issue 4, Volume 8, April 2009

involved in more platforms and components, including the Internet, GIS, positioning equipment and telecommunications technology and so on.

From the data perspective, LBS needs to obtain data from different sources, such as remote sensors, positioning systems, electronic maps, traffic and transportation databases and so on.

Therefore, from the system architecture perspective, LBS is strongly heterogeneous. At the same time, the user's location is constantly changing. This poses new challenges for the data-processing capability of LBS services on the server side [4-6].

For this new type of location-based information retrieval, users want to obtain more real-time and targeted content services, not just information indexed from a static database [7-8]. Recently, the rise of a large number of Web 2.0 applications (blogs, community forums, web albums, tagging, etc.) indicates that users have pressing requirements for direct, rapid, useful and personalized information recommendation and sharing services [9-13].

Visualizing this information in a user-friendly way on client mobile terminals is undoubtedly a very important research topic with a very wide market prospect.

This paper designs and implements a location-based mobile restaurant recommendation and navigation system. To improve server-side response speed for real-time queries, we propose a memory pool model, an extended Accept command, no-data client polling and an interrupt mechanism, which aim to greatly optimize the server-side control procedures. On the client side, we combine the latest Web 2.0 application data with the location-based data and propose collaborative assessment and recommendation mechanisms, which can provide users with real-time location-based restaurant recommendations and personalized navigation.

Users can also manually provide personalized tags and recommendations to build their own social networks, which helps them take into account the collaborative comments of similar community users and obtain a more precise content pushing service.

Section 2 gives a brief description of the system's overall architecture and components. The server-side operating mechanism, working threads, listening thread mechanism and statement optimization are discussed in Section 3. Section 4 introduces the client-side functions, including the users' comment and recommendation mechanisms. A case study is presented in Section 5. Finally, Section 6 concludes the paper and outlines future work.

2 System Workflow and Architecture

Figure 1 gives the workflow of our system. Users send their queries from their mobile phone. The client obtains the current location and sends it, together with the user's query, to the server. The server-side application analyzes the relevant data and provides matched restaurant recommendations and navigation.

The application data of our system can be divided into two parts: the location-based data (such as traffic and road condition data, GPS maps, and entity information) and the value-added data provided by users (such as ratings, comments, blogs and tags).

Fig.1. System Workflow

[Figure components: User; Client; Server; Restaurant Query; Prescribed Location-based Info.; Matched Entity & Route Info.; Users' Collaborative Recommendation & Entity Feature Info.; Location-based DB (GPS-info, E-Map, Entity-info); Value-added DB (Comments, Tags, Ratings); Personalized Location-based Restaurant Recommendation & Navigation Services]

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS Zhi-Mei Wang, Fan Yang

ISSN: 1790-0832 810 Issue 5, Volume 6, May 2009


Available Information

Location, tags, context

Available Information

Implicit ratings


One of the most popular mobile location systems

Check-ins: tell people where you are!

Place recommendations


Conversational mobile virtual assistant

Already uses information from social networks

Restaurant recommendations


Google HotPot

Review repository

Place recommendations


My contributions


Offering Products and Services Using Product Reviews from Social Networks in Mobile Decision Aid Systems

Marcel Caraciolo* and Germano Vasconcelos†

Informatics Center, Federal University of Pernambuco
Website: http://www.cin.ufpe.br/
Email: [email protected]

† [email protected]

Abstract: Recommendation engines provide information filtering functions and decision aids that have great potential application in the mobile context. One aspect that has not yet been extensively exploited in current recommenders is improving the explanation of the recommendation. For instance, exploiting the service and product descriptions together with the users' opinions about the recommended products would give the user a better explanation. In this paper we present the foundations for a mobile product/service recommender system that incorporates both structured (supplier-driven) product descriptions and subjective product information extracted from user reviews. We believe this type of recommender system could be extremely useful in the mobile context, where people must make decisions in a rather short time, with limited product information and limited device capabilities. Our research focuses on exploring methodologies and data mining techniques for improving the user acceptance of product/service recommendations and on how to explain these recommendations in the mobile context. To achieve this, we propose a new approach in which both product/service descriptions and user reviews are incorporated into the recommendation process. We think that this approach, by exploiting the hidden knowledge inside the reviews and descriptions when they are combined, will give the user more confidence in the recommendation and a better understanding of the product. The products considered by this mobile recommender system are restaurants and retail stores.

I. INTRODUCTION

Recently, the rise of a large number of Web 2.0 applications (blogs, community forums, web albums, etc.) has shown that users have pressing requirements for direct, rapid, useful and personalized information recommendation and sharing services [1], [2]. When choosing a product to purchase, many consumers look for different ways to obtain more precise information for judging the quality of products and services such as electronics, restaurants, merchants, etc. Current approaches supporting consumers in their buying decisions are, among others, provided through web-based product recommendation systems [3], [4]. They provide personal reviews submitted by other users, supported by ratings and by comments describing their experience as continuous text. Online reviews are not new on the web and are extensively used to give a more nuanced view of a product in order to make an informed decision [5].

Nonetheless, providing users with relevant recommendation information is a difficult task. Besides technical components such as the user model representation and the information filtering techniques that generate the recommendations, the information must be visualized in a user-friendly way. This is a requirement especially for supporting the user in the purchase decision process and convincing him of the usefulness of the given recommendation.

In the mobile context this can be considered a challenging task. Product recommendation systems on the Internet do not meet the needs of customers in physical stores, and consequently they do not meet the needs of mobile users. Several factors limit interaction with this type of system: data exchange costs, environmental disturbances (light, noise, etc.), and even parallel activities (driving, travelling, etc.) [6]. There are also device restrictions, such as limited computational capabilities and small displays. How this information can be visualized in a user-friendly way on client mobile terminals is undoubtedly a very important research topic.

This work presents our research focused on methodologies and techniques for improving the user acceptance of product recommendations and for explaining these recommendations in the mobile context. To achieve this, we present a new approach in which product descriptions and user reviews are incorporated. We believe this approach, by revealing the hidden knowledge inside the reviews, will give the user more confidence in the recommendations and a better understanding of the product. The products considered by this mobile recommender system are hotels and restaurants.

The main contribution of our work is the incorporation of reviews from Web 2.0 applications (tagging, comments, ratings, etc.) in a structured way into the recommendation process. We believe that reviews are a great source of recommendation information, which can improve the product information and the product experience and influence the user's buying decision [8]. To exploit this kind of information, we

My Master's research


How reviews from web services sources can be aggregated in the mobile recommendation process?

source, the recommendation architecture that we propose will aggregate the results of such filtering techniques.

We aim to integrate the previously mentioned hybrid product recommendation approach into a mobile application, so that users can benefit from useful and logical recommendations. Moreover, we aim to provide a suitable explanation for each recommendation, since current approaches only deliver product recommendations with an overall score, without pointing out why the recommendation is appropriate [13]. Besides the basic information provided by the suppliers, the system will deliver the explanation by providing relevant reviews from similar users; we believe this will increase confidence in the buying decision process and the product acceptance rate. In the mobile context, this approach could help users in this process, and showing user opinions could contribute to achieving this task.

[Figure components: Restaurant Repository; Reviews Repository; Content Based Filtering; Collaborative Based Filtering; Meta Recommender; Recommendations; annotation: "(Select restaurants from reviews written by users with similar tastes)"]

Fig. 1. Meta Recommender Architecture

Since one of the goals of this work is to incorporate different data sources of user opinions and descriptions, we have adopted a meta-recommendation architecture. By using a meta-recommender architecture, the system provides personalized control over the generated recommendation list, formed by the combination of rich data [16]. The influence of each specific data source could be explicitly controlled by evaluating the user's past interaction with the recommender to decide how to balance the different knowledge sources. For instance, if the product or service to be recommended has a rich structured description (e.g. a restaurant), the system tends to use a more content-based filtering approach. Otherwise, if the product is poorly described (few attributes), the system relies more on collaborative filtering techniques, that is, on the reviews from similar users.

Figure 1 shows an overview of our meta-recommender approach. By combining content-based filtering and collaborative filtering into a hybrid recommender system, it uses the service/product repositories, which catalogue the services to be recommended, and the review repository, which contains the user opinions about those services. All this data can be extracted from data source containers on the web, such as the location-based social network Foursquare [17], as displayed in Figure 2, and Google's location recommendation engine, Google HotPot [18].

Fig. 2. User Reviews from Foursquare Social Network
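The balancing rule described above (lean on content-based filtering when the item is richly described, on collaborative filtering otherwise) could be sketched as follows. This is only an illustration: the paper says the balance would be decided by evaluating past user interaction, while here the weight is simply the fraction of filled-in attributes in a hypothetical restaurant schema.

```python
def content_weight(description, expected_attributes):
    """Fraction of expected attributes that are present and non-empty (0.0 to 1.0)."""
    if not expected_attributes:
        return 0.0
    filled = sum(1 for attr in expected_attributes if description.get(attr))
    return filled / len(expected_attributes)

def meta_score(content_score, cf_score, description, expected_attributes):
    """Lean on the content-based score for richly described items, on CF otherwise."""
    w = content_weight(description, expected_attributes)
    return w * content_score + (1 - w) * cf_score

ATTRS = ["cuisine", "price_range", "opening_hours", "address"]  # hypothetical schema
rich = {"cuisine": "chinese", "price_range": "$$", "opening_hours": "11-23", "address": "Rua X"}
poor = {"cuisine": "chinese"}
print(meta_score(0.9, 0.3, rich, ATTRS))  # fully described: content score dominates
print(meta_score(0.9, 0.3, poor, ATTRS))  # sparse description: reviews/CF carry more weight
```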

The content-based filtering approach will be used to filter the product/service repository, while the collaborative approach will derive the product review recommendations. In addition, we will use text mining techniques to classify the polarity of each user review as positive or negative. This summarized information would contribute to the computation of the product recommendation score. The final product recommendation score is computed by integrating the results of both recommenders. We are currently considering different options for this integration; one in particular is the symbolic data analysis (SDA) approach [19], in which each product description and its user ratings/reviews are modeled as a set of modal symbolic descriptions that summarize the information provided by the corresponding data sources. It is a novel approach in hybrid recommender systems which, in our domain, can encapsulate in entities the levels of influence of both user reviews and product descriptions.
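The paper does not specify which text mining technique distinguishes positive from negative reviews. As a stand-in, here is a deliberately tiny lexicon-based polarity classifier, plus the share of positive reviews per place, the kind of summary that could feed the product score. The word lists are hypothetical, and a real system would need a much richer model.

```python
import re

# Tiny illustrative lexicons; a real system would use a curated or learned model.
POSITIVE = {"good", "great", "excellent", "delicious", "friendly", "tasty"}
NEGATIVE = {"bad", "awful", "slow", "rude", "cold", "expensive"}

def polarity(review):
    """Return +1, -1 or 0 by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def positive_share(reviews):
    """Fraction of a place's reviews classified as positive (0.0 when there are none)."""
    labels = [polarity(r) for r in reviews]
    return labels.count(1) / len(labels) if labels else 0.0

reviews = ["Great dumplings and friendly staff", "Slow service, cold food", "Tasty and cheap"]
print([polarity(r) for r in reviews], positive_share(reviews))
```

The three sample reviews are labeled 1, -1 and 1, so two thirds of them count as positive.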

B. Symbolic Recommendation Approach

Symbolic Data Analysis (SDA) is a research field that provides suitable tools for managing aggregated data described by multi-valued variables, where data table entries are sets of categories, ordered lists of categories, intervals or weight histograms [19]. It also provides approaches for information filtering algorithms such as content-based, collaborative and hybrid ones. The main idea is to represent the user profile (in our domain, the product to be recommended) through symbolic data structures, and to compute user and item correlations through dissimilarity functions adapted from the SDA domain. The advantage of using SDA-based information filtering methods for recommendation is that the user description synthesizes the entire body of information taken from the item descriptions belonging to the user profile. Items are described by histogram-valued symbolic data, so they can be compared through a dissimilarity function. Modeling both the user reviews and the product descriptions as histogram-valued symbolic data would meet our requirements, since our recommendations would be balanced by both structures. Bezerra and Carvalho proposed approaches whose results were very promising [19].
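To illustrate what a histogram-valued description looks like, the sketch below summarizes each place by the relative frequency of its review polarity categories and compares two places. The total-variation distance used here is a simple stand-in, not the SDA dissimilarity functions of Bezerra and Carvalho [19]; the categories and reviews are hypothetical.

```python
from collections import Counter

def histogram(values, categories):
    """Histogram-valued description: relative frequency of each category (sums to 1)."""
    counts = Counter(values)
    total = sum(counts[c] for c in categories)
    return {c: (counts[c] / total if total else 0.0) for c in categories}

def dissimilarity(h1, h2):
    """Total-variation distance between two histograms over the same categories (0 to 1)."""
    return 0.5 * sum(abs(h1[c] - h2[c]) for c in h1)

CATS = ["negative", "neutral", "positive"]
place_a = histogram(["positive", "positive", "neutral", "negative"], CATS)
place_b = histogram(["positive", "positive", "positive", "neutral"], CATS)
print(place_a, round(dissimilarity(place_a, place_b), 3))
```

Place A is described as 25% negative, 25% neutral and 50% positive; place B has no negative reviews and 75% positive ones, giving a dissimilarity of 0.25.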

III. SYSTEM DESIGN

The application data of our mobile recommender system can be divided into two parts: the product description (such as location, description and attributes) and the reviews or ratings provided by users (such as ratings, comments, tags, etc.). Figure 3 shows the system's architecture and its components.

[Figure components: (U) User; (Q) Queries; Mobile Web Interface; (C) Connected; Recommendation Engine; Product Repository; Review Repository; Results; (P) Preferences, Location, etc.; (L) List of Scored Products; (R) Ranked Products]

Fig. 3. Mobile Recommender System Architecture

In our mobile product/service recommender, the user can filter products or services and get a list of recommendations. The user can also enter his preferences or give feedback on an offered product recommendation. Other functionalities are the retrieval of the next five best recommendations, and the search for reviews satisfying given constraints (text length, date, review attitude) or containing certain keywords. Consider a typical usage scenario of this recommender: a restaurant query. For instance, a user wants to find good restaurants serving Chinese food around his current location (the system could also integrate location-based services). Refining the query, the user also searches for places with highly positive reviews. Using the coordinates to calculate the distance from his current location, the system would provide a list of restaurants ranked by highly positive reviews, together with summarized reviews written by the most similar users who reviewed those restaurants. Guided by personal preferences, the user then reviews the recommended restaurant list and selects the places he wishes to go. After visiting the selected places, the user could enter his feedback, in the form of wishes or critiques, on the recommended service. This information would be added to the reviews repository for those places, where it would be processed and summarized by our recommender, extracting useful information such as the polarity and adding new keywords to the product feature vector.
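The review search and "next five best" functionalities described above map naturally onto simple filters. This sketch assumes a hypothetical `Review` record whose `attitude` field already holds a polarity label (+1, 0 or -1); the constraint names mirror the text (length, date, attitude, keywords) but are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Review:
    place: str
    text: str
    written_on: date
    attitude: int  # hypothetical polarity label: +1 positive, 0 neutral, -1 negative

def search_reviews(reviews, min_length=0, since=None, attitude=None, keywords=()):
    """Return the reviews that satisfy every constraint that was given."""
    matches = []
    for review in reviews:
        lowered = review.text.lower()
        if len(review.text) < min_length:
            continue
        if since is not None and review.written_on < since:
            continue
        if attitude is not None and review.attitude != attitude:
            continue
        if not all(word.lower() in lowered for word in keywords):
            continue
        matches.append(review)
    return matches

def next_best(ranked, offset=0, n=5):
    """Return the next n recommendations of an already ranked list, starting at offset."""
    return ranked[offset:offset + n]

reviews = [
    Review("Wok House", "Great dumplings, friendly staff", date(2011, 11, 3), 1),
    Review("Wok House", "Too slow", date(2010, 5, 1), -1),
    Review("Golden Panda", "The dumplings were cold", date(2012, 1, 10), -1),
]
print([r.place for r in search_reviews(reviews, since=date(2011, 1, 1), keywords=("dumplings",))])
print(next_best(["r1", "r2", "r3", "r4", "r5", "r6", "r7"], offset=5))
```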

IV. METHODOLOGY AND EXPECTED RESULTS

A. Methodology

Our research focuses on methodologies and techniques for improving the user acceptance of product and service recommendations and for explaining these suggestions in the mobile context. To achieve this, we have used a new approach in which both product descriptions and user reviews are incorporated into the mobile recommendation process. We believe this approach will give the user more confidence in the recommendations and a better understanding of the products, especially supporting him in the buying decision process. To accomplish this, we will give an overview of the context in which mobile recommender systems are situated and of the main research concerns on the topic. We will also analyze and investigate approaches for filtering algorithms, so that we can base our design and implementation choices on previous failures or successes and reuse and adapt successful solutions. We will also propose our meta-recommender system, providing a detailed explanation of its architecture, implementation, main processes, and crucial features. Finally, to validate it, we will use standard recommender system measures, comparing our suggestions to mobile users on a real dataset extracted from the web, and discuss the results.

B. Expected Results

We believe that our mobile product recommender, by incorporating user reviews, will increase the user's trust in the recommended products, since he can read opinions, both positive and negative, from a group of similar users. Moreover, by viewing other reviews, the user may feel more encouraged to share his own experiences and to obtain recognition from other users. Reviews also provide a better understanding of the product, since there will be more information for the user to decide whether the product is (or is not) suitable for him. Finally, presenting a list of recommendations alongside explanations may provide "local hidden knowledge". Local hidden knowledge can be described as knowledge that you gain only after purchasing a product or visiting a place. It cannot be found using


Sentiment Analysis for Extracting the Polarity

Text Mining A Lot!

Meta-Recommender Engines

Content-Based Filtering

kNN (k-Nearest Neighbors)

Hybrid Meta Recommender

Symbolic Data Analysis (SDA)

Architectural Proposal for Mobile Recommender

Evaluation in Experimental DataSets


Crab: A Python Framework for Building Recommendation Engines

Marcel Caraciolo @marcelcaraciolo

Bruno Melo @brunomelo

Ricardo Caspirro @ricardocaspirro


What is Crab ?

A python framework for building recommendation engines

A Scikit module for collaborative, content and hybrid filtering

Mahout Alternative for Python Developers :D

Open-Source under the BSD license

https://github.com/muricoca/crab




The current Crab


>>> # load the dataset



>>> from crab.datasets import load_sample_movies




>>> data = load_sample_movies()





>>> data



{'DESCR': 'sample_movies data set was collected by the book called \nProgramming the Collective Intelligence by Toby Segaran \n\nNotes\n----- \nThis data set consists of\n\t* n ratings with (1-5) from n users to n movies.', 'data': {1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0}, 2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0}, 3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0}, 4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0}, 5: {2: 4.5, 3: 1.0, 4: 4.0}, 6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5}, 7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}}, 'item_ids': {1: 'Lady in the Water', 2: 'Snakes on a Planet', 3: 'You, Me and Dupree', 4: 'Superman Returns', 5: 'The Night Listener', 6: 'Just My Luck'}, 'user_ids': {1: 'Jack Matthews', 2: 'Mick LaSalle', 3: 'Claudia Puig', 4: 'Lisa Rose', 5: 'Toby', 6: 'Gene Seymour', 7: 'Michael Phillips'}}




>>> from crab.models import MatrixPreferenceDataModel




>>> m = MatrixPreferenceDataModel(data.data)



>>> print(m)
MatrixPreferenceDataModel (7 by 6)
           1         2         3         4         5  ...
1   3.000000  4.000000  3.500000  5.000000  3.000000
2   3.000000  4.000000  2.000000  3.000000  3.000000
3        ---  3.500000  2.500000  4.000000  4.500000
4   2.500000  3.500000  2.500000  3.500000  3.000000
5        ---  4.500000  1.000000  4.000000       ---
6   3.000000  3.500000  3.500000  5.000000  3.000000
7   2.500000  3.000000       ---  3.500000  4.000000



>>> #import pairwise distance




>>> from crab.metrics.pairwise import euclidean_distances





>>> # import similarity

>>> from crab.similarities import UserSimilarity






>>> similarity = UserSimilarity(m, euclidean_distances)







>>> similarity[1]



[(1, 1.0), (6, 0.66666666666666663), (4, 0.34054242658316669), (3, 0.32037724101704074), (7, 0.32037724101704074), (2, 0.2857142857142857), (5, 0.2674788903885893)]






>>> from crab.recommenders.knn import UserBasedRecommender




>>> recsys = UserBasedRecommender(model=m, similarity=similarity, capper=True, with_preference=True)





>>> recsys.recommend(5)
array([[ 5. , 3.45712869],
       [ 1. , 2.78857832],
       [ 6. , 2.38193068]])
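A user-based recommender typically predicts a rating for an unseen item as the similarity-weighted average of the ratings other users gave that item. The sketch below assumes every other user who rated the item acts as a neighbor and reuses the 1 / (1 + d) similarity; the function names are hypothetical. The slides do not show how Crab selects neighbors, so this sketch reproduces the 3.457 predicted for item 5 and the same ranking (5, 1, 6) for user 5, while its scores for items 1 and 6 differ slightly from Crab's output.

```python
import math

# Ratings from the sample_movies data set above (user_id -> {item_id: rating}).
RATINGS = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def euclidean_similarity(prefs_a, prefs_b):
    """1 / (1 + Euclidean distance over co-rated items); 0.0 if none are shared."""
    common = set(prefs_a) & set(prefs_b)
    if not common:
        return 0.0
    distance = math.sqrt(sum((prefs_a[i] - prefs_b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def predict(ratings, user_id, item_id):
    """Similarity-weighted average of the ratings other users gave item_id."""
    weighted_sum = similarity_sum = 0.0
    for other_id, prefs in ratings.items():
        if other_id == user_id or item_id not in prefs:
            continue
        similarity = euclidean_similarity(ratings[user_id], prefs)
        weighted_sum += similarity * prefs[item_id]
        similarity_sum += similarity
    return weighted_sum / similarity_sum if similarity_sum > 0 else None


def recommend(ratings, user_id, how_many=3):
    """Score every item the user has not rated and return the best ones."""
    all_items = {item for prefs in ratings.values() for item in prefs}
    unseen = all_items - set(ratings[user_id])
    scored = [(item, predict(ratings, user_id, item)) for item in unseen]
    scored = [(item, score) for item, score in scored if score is not None]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:how_many]


print(recommend(RATINGS, 5))
```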



>>> recsys.recommended_because(user_id=5, item_id=1)
array([[ 2. , 3. ],
       [ 1. , 3. ],
       [ 6. , 3. ],
       [ 7. , 2.5],
       [ 4. , 2.5]])







The current Crab

Collaborative Filtering algorithms: User-Based, Item-Based and Slope One

Evaluation of the Recommender Algorithms: Precision, Recall, F1-Score, RMSE

Precision-Recall Charts


Evaluating your recommender



>>> from crab.metrics.classes import CfEvaluator




>>> evaluator = CfEvaluator()










>>> evaluator.evaluate(recommender=recsys, metric='rmse')
{'rmse': 0.69467177857026907}
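RMSE is the square root of the mean squared difference between predicted and actual ratings. MAE, which also appears in the split evaluation, averages the absolute differences instead. The following is a minimal sketch of both formulas; the function names are hypothetical, and how CfEvaluator chooses the held-out ratings it compares is not shown in the slides.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def mae(predicted, actual):
    """Mean absolute error between two equal-length rating sequences."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


predicted = [3.5, 2.0, 4.0, 3.0]
actual = [3.0, 2.5, 4.0, 4.0]
print(rmse(predicted, actual), mae(predicted, actual))
```

RMSE squares each error before averaging, so it penalizes the single 1-point miss here more heavily than MAE does.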






>>> evaluator.evaluate_on_split(recommender=recsys, at=2)
({'error': [{'mae': 0.345, 'nmae': 0.4567, 'rmse': 0.568},
            {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788},
            {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788}],
  'ir': [{'f1score': 0.456, 'precision': 0.78557, 'recall': 0.55677},
         {'f1score': 0.64567, 'precision': 0.67865, 'recall': 0.785955},
         {'f1score': 0.45070, 'precision': 0.74744, 'recall': 0.858585}]},
 {'final_score': {'avg': {'f1score': 0.495955, 'mae': 0.429292, 'nmae': 0.373739,
                          'precision': 0.63932929, 'recall': 0.729939393, 'rmse': 0.3466868},
                  'stdev': {'f1score': 0.09938383, 'mae': 0.0593933, 'nmae': 0.03393939,
                            'precision': 0.0192929, 'recall': 0.031293939, 'rmse': 0.234949494}}})
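The 'ir' entries treat recommendation as a retrieval problem. Precision@k is the fraction of the top-k recommended items that are relevant, for example items the user rated highly in held-out data. Recall@k is the fraction of relevant items that appear in the top k, and F1 is the harmonic mean of the two. The following is a minimal sketch under the assumption that the `at` argument above plays the role of k; the function name is hypothetical.

```python
def precision_recall_f1(recommended, relevant, k):
    """Precision@k, recall@k and F1 for one user's ranked recommendation list."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(recommended)[:k]
    relevant = set(relevant)
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Harmonic mean; defined as 0 when both precision and recall are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1score": f1}


# Items 5, 1, 6 recommended in that order; suppose the user actually liked 5 and 6.
print(precision_recall_f1([5, 1, 6], relevant={5, 6}, k=2))
```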


Distributing the recommendation computations

Use Hadoop and MapReduce intensively: investigating the Yelp mrjob framework (https://github.com/pfig/mrjob)

Develop the novel state-of-the-art techniques used in the Netflix Prize: Matrix Factorization, Singular Value Decomposition (SVD), Boltzmann machines

The most commonly used is the Slope One technique: simple algebra with predictors of the form y = x + b (the linear model y = a*x + b with the slope a fixed at one)
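Slope One predicts with f(x) = x + b. For each pair of items it learns b as the average rating difference between them, over the users who rated both. It then predicts the target item from the user's own ratings plus those differences. The following is a minimal sketch of the weighted variant, where each difference counts in proportion to how many users support it. It is written independently of Crab, and the toy users and items are hypothetical.

```python
from collections import defaultdict


def train_slope_one(ratings):
    """Average difference dev[i][j] = mean(r_i - r_j) and its support freq[i][j]."""
    diff_sum = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for prefs in ratings.values():
        for i, r_i in prefs.items():
            for j, r_j in prefs.items():
                if i != j:
                    diff_sum[i][j] += r_i - r_j
                    freq[i][j] += 1
    dev = {i: {j: diff_sum[i][j] / freq[i][j] for j in diff_sum[i]} for i in diff_sum}
    return dev, freq


def predict_slope_one(dev, freq, user_prefs, target):
    """Weighted Slope One: mean of (r_j + dev[target][j]) weighted by support."""
    numerator = denominator = 0.0
    for j, r_j in user_prefs.items():
        if j == target or j not in dev.get(target, {}):
            continue
        support = freq[target][j]
        numerator += (r_j + dev[target][j]) * support
        denominator += support
    return numerator / denominator if denominator else None


ratings = {
    "user_a": {"item_x": 5, "item_y": 3, "item_z": 2},
    "user_b": {"item_x": 3, "item_y": 4},
    "user_c": {"item_y": 2, "item_z": 5},
}
dev, freq = train_slope_one(ratings)
print(predict_slope_one(dev, freq, ratings["user_c"], "item_x"))
```

Here item_x is on average 0.5 above item_y (two users) and 3 above item_z (one user), so user_c's prediction is ((2 + 0.5) * 2 + (5 + 3) * 1) / 3.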



Cache/Parallelism with joblib

from joblib import Memory
import numpy as np

memory = Memory(cachedir='', verbose=0)

class UserSimilarity(BaseSimilarity):
    ...

    @memory.cache
    def get_similarity(self, source_id, target_id):
        source_preferences = self.model.preferences_from_user(source_id)
        target_preferences = self.model.preferences_from_user(target_id)

        # NaN when either user has no preferences to compare.
        if source_preferences.shape[1] == 0 or target_preferences.shape[1] == 0:
            return np.array([[np.nan]])
        return self.distance(source_preferences, target_preferences)

    ...

    def get_similarities(self, source_id):
        return [(other_id, self.get_similarity(source_id, other_id))
                for other_id, v in self.model]

http://packages.python.org/joblib/index.html
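joblib's Memory caches a function's results on disk, keyed by its arguments. The same memoization idea can be sketched with the standard library's `functools.lru_cache`, which keeps results in memory only. This swaps in a different tool than the slide uses. The `cached_similarity` function and the `computations` counter are hypothetical and only show that a repeated `get_similarities` call skips the distance computation.

```python
import math
from functools import lru_cache

# A few users from the sample_movies data set (user_id -> {item_id: rating}).
RATINGS = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
}

computations = {"count": 0}  # hypothetical counter, only to make cache hits visible


@lru_cache(maxsize=None)
def cached_similarity(source_id, target_id):
    """1 / (1 + Euclidean distance over co-rated items), memoized per id pair."""
    computations["count"] += 1
    a, b = RATINGS[source_id], RATINGS[target_id]
    common = set(a) & set(b)
    if not common:
        return float("nan")  # mirrors the np.nan returned by get_similarity above
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def get_similarities(source_id):
    return [(other_id, cached_similarity(source_id, other_id)) for other_id in RATINGS]


first = get_similarities(1)
second = get_similarities(1)  # answered entirely from the cache
print(computations["count"])
```

A module-level function is used because decorating a method with `lru_cache` also caches `self`, which keeps every instance alive for as long as the cache exists.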








>>> #Without memory.cache
>>> timeit similarity.get_similarities('marcel_caraciolo')
100 loops, best of 3: 978 ms per loop

>>> #With memory.cache
>>> timeit similarity.get_similarities('marcel_caraciolo')
100 loops, best of 3: 434 ms per loop





Distributed Computing with mrjob

https://github.com/Yelp/mrjob



It supports Amazon’s Elastic MapReduce (EMR) service, your own Hadoop cluster, or local mode (for testing)

"""The classic MapReduce job: count the frequency of words."""
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")


class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def reducer(self, word, counts):
        yield (word, sum(counts))


if __name__ == '__main__':
    MRWordFreqCount.run()





Elsayed et al: Pairwise Document Similarity in Large Collections with MapReduce




http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf
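Elsayed et al. split pairwise similarity into two MapReduce jobs. The first builds an inverted index mapping each term to the documents that contain it, with weights. The second emits, from every posting list, a partial product for each pair of documents that share the term, and a reducer sums those partials per pair. For recommendations, items can play the role of terms and users the role of documents. The sketch below only simulates the map, shuffle and reduce phases in plain Python; no mrjob or Hadoop is involved. It also uses the raw rating products, an unnormalized dot product, as the similarity, which is an assumption made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    """Tiny in-process simulation: map each record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(kv for key in sorted(groups) for kv in reducer(key, groups[key]))


# Job 1: build the inverted index item -> [(user, rating), ...].
def index_mapper(record):
    user_id, prefs = record
    for item_id, rating in prefs.items():
        yield item_id, (user_id, rating)


def index_reducer(item_id, postings):
    yield item_id, sorted(postings)


# Job 2: every pair of users in a posting list gets a partial product.
def pair_mapper(record):
    _item_id, postings = record
    for (u, r_u), (v, r_v) in combinations(postings, 2):
        yield (u, v), r_u * r_v


def pair_reducer(pair, partials):
    yield pair, sum(partials)


ratings = {1: {1: 3.0, 2: 4.0}, 2: {1: 3.0, 2: 2.0, 3: 1.0}, 3: {3: 5.0}}
index = run_mapreduce(ratings.items(), index_mapper, index_reducer)
dots = run_mapreduce(index.items(), pair_mapper, pair_reducer)
print(dots)
```

Pairs of users with no item in common, such as users 1 and 3 here, are never emitted. This is what makes the approach practical on sparse data.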


Future studies with Sparse Matrices

Real datasets come with lots of empty values

Apontador Reviews Dataset

http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html

Solutions:

scipy.sparse package

Sharding operations

Matrix Factorization techniques (SVD)
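With `scipy.sparse`, only the observed ratings are stored, which matters when most user-item cells are empty. The following is a minimal sketch that converts the nested-dict format of the sample data set into a CSR matrix, with users as rows and items as columns. Shifting the 1-based ids to 0-based indices is an assumption of this sketch.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Ratings from the sample_movies data set above (user_id -> {item_id: rating}).
RATINGS = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}

rows, cols, values = [], [], []
for user_id, prefs in RATINGS.items():
    for item_id, rating in prefs.items():
        rows.append(user_id - 1)   # 1-based ids -> 0-based row index
        cols.append(item_id - 1)   # 1-based ids -> 0-based column index
        values.append(rating)

n_users = max(RATINGS)
n_items = max(item for prefs in RATINGS.values() for item in prefs)
matrix = csr_matrix((values, (rows, cols)), shape=(n_users, n_items))

# Mean rating per user, dividing by the number of stored ratings in each row.
ratings_per_user = np.diff(matrix.indptr)
user_means = np.asarray(matrix.sum(axis=1)).ravel() / ratings_per_user
print(matrix.nnz, matrix.shape, np.round(user_means, 3))
```

Missing cells read back as 0, so code that works on the sparse matrix must count only the stored entries, as `np.diff(matrix.indptr)` does, rather than divide by the number of columns.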








Crab implements a Matrix Factorization with Expectation Maximization algorithm (scikits.crab.svd package)
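The slides do not show how scikits.crab.svd works. One common way to combine a truncated SVD with Expectation Maximization on a rating matrix with missing entries is to alternate two steps and repeat. First, fill the missing cells with the current estimate. Second, recompute a rank-k SVD of the filled matrix and take its reconstruction. The sketch below is that generic procedure in NumPy, not Crab's implementation. The rank, the iteration count and the mean initialization are arbitrary assumptions.

```python
import numpy as np


def svd_em_impute(ratings, mask, rank=2, n_iter=50):
    """Fill missing ratings (mask == False) with an iterated rank-k SVD reconstruction."""
    ratings = np.asarray(ratings, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Initial guess: every missing cell gets the mean of the observed ratings.
    filled = np.where(mask, ratings, ratings[mask].mean())
    approx = filled
    for _ in range(n_iter):
        # Fit step: best rank-k approximation of the currently filled matrix.
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        # Fill step: keep the observed ratings, re-estimate only the missing ones.
        filled = np.where(mask, ratings, approx)
    return filled, approx


R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [1.0, np.nan, 5.0, 4.0],
])
mask = ~np.isnan(R)
filled, approx = svd_em_impute(R, mask, rank=2)
print(np.round(filled, 2))
```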




Benchmarks

Time elapsed (recommend 5 items):

Dataset          Old Crab (pure Python w/ dicts)    New Crab (Python w/ SciPy and NumPy)
MovieLens 100k   15.32 s                            9.56 s

http://www.grouplens.org/node/73




Why migrate?

Old Crab ran only on pure Python

Recommendations demand heavy mathematical calculations and lots of processing

Compatible with the NumPy and SciPy libraries

High-standard, popular libraries optimized for scientific computing in Python

Scikits projects are amazing! Active communities, scientific conferences and actively maintained projects (e.g. scikit-learn)

Make the Crab framework visible to the community, so that scientific researchers and machine learning developers around the globe who code in Python can join and help us in this project

Be Fast and Furious


How are we working?

Sprints, Online Discussions and Issues

https://github.com/muricoca/crab/wiki/UpcomingEvents




How are we working?

Our Project’s Home Page

http://muricoca.github.com/crab




Future Releases

Planned Release 0.1: Collaborative Filtering algorithms working, sample datasets to load and test

Planned Release 0.11: Evaluation of Recommendation Algorithms and Database Models support

Planned Release 0.12: Recommendation as Services with REST APIs

....


Join us!

1. Read our Wiki Page: https://github.com/muricoca/crab/wiki/Developer-Resources

2. Check out our current sprints and open issues: https://github.com/muricoca/crab/issues

3. Forks and pull requests are mandatory

4. Join us at irc.freenode.net #muricoca or on our discussion list, which is still in the works :(


https://github.com/muricoca/crab/issues?milestone=1&state=open


Recommendation in social networks

Figure 2: Meta Recommender Components Interaction

be highly beneficial given that students do not meet physically. It may result in their becoming more socially connected, thereby enhancing their social learning environment and student experience.

3.2 The Methodology

To design our recommender engine for an adaptive learning environment, we investigated user behavior in an educational social network, in our scenario the AtePassar social network, and incorporated our own ideas into the system design. We concluded that the knowledge shared between students in the learning context is quite diverse: likes, text messages, social graphs, etc. For this reason, we adopted a meta recommender system architecture [14].

A meta recommender approach gives users customized control over the generation of a single recommendation list, built from an aggregation of rich data. This personalized control can be implemented by analyzing the nature of each data source. In our approach, for instance, if the item to be recommended is a new friend, for whom only the relationships (list of friends) are available as attributes because of privacy concerns, the system relies more on collaborative filtering. On the other hand, if the item to be recommended has a rich structured description, such as a course, the system relies more on the content-based approach. The final recommendation score is computed by aggregating the results of both recommenders. Figure 2 illustrates an overview of our approach.

New users of the social network can suffer from the cold start problem, a common problem in recommender systems when users have no historical records in the system. For these users we provide popular recommendations, that is, items that were accepted by other users registered in the network. We hypothesize that recommending popular items to new users encourages them to interact more within the social network while the system learns from their initial interests.
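The cold-start strategy described above falls back to popular items when a user has no history. The following is a minimal sketch, assuming that popularity means the number of times users accepted an item. The acceptance log, the `personalized` callback and the function names are hypothetical.

```python
from collections import Counter


def popular_items(accepted_events, exclude=(), how_many=3):
    """Rank items by how many times users accepted them (ties broken by name)."""
    counts = Counter(item for _user, item in accepted_events)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _count in ranked if item not in exclude][:how_many]


def recommend(user_id, user_history, accepted_events, personalized, how_many=3):
    """Fall back to popular items when the user has no history (cold start)."""
    if not user_history.get(user_id):
        return popular_items(accepted_events, how_many=how_many)
    return personalized(user_id)[:how_many]


events = [
    ("user_a", "group:law"), ("user_b", "group:law"), ("user_c", "group:law"),
    ("user_b", "video:math"), ("user_c", "video:math"),
    ("user_a", "friend:user_d"),
]
history = {"user_a": {"group:law", "friend:user_d"}}
print(recommend("new_user", history, events, personalized=lambda user: []))
```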

At this moment, we are still studying different approaches to computing the final recommendation score. The current approach feeds the final score of each recommender into a weighted average function, where the weights can be derived from implicit or explicit user preferences for a certain recommender.

Figure 3: AtePassar Recommender System Interface
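The weighted average used to combine the two recommenders can be sketched as follows. Each recommender returns a score per item, and the weights, hypothetical values here, would come from implicit or explicit user preferences. An item that only one recommender scored is averaged over that recommender alone, which is an assumption of this sketch.

```python
def combine_scores(scores_by_recommender, weights):
    """Weighted average of per-item scores across recommenders, best first."""
    totals, weight_sums = {}, {}
    for name, scores in scores_by_recommender.items():
        weight = weights.get(name, 0.0)
        if weight <= 0:
            continue  # a recommender with no weight does not contribute
        for item, score in scores.items():
            totals[item] = totals.get(item, 0.0) + weight * score
            weight_sums[item] = weight_sums.get(item, 0.0) + weight
    combined = {item: totals[item] / weight_sums[item] for item in totals}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


scores = {
    "collaborative": {"friend:42": 0.9, "course:7": 0.4},
    "content": {"course:7": 0.8, "course:9": 0.6},
}
weights = {"collaborative": 0.3, "content": 0.7}
print(combine_scores(scores, weights))
```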

3.3 The AtePassar-based Mining and Recommender System

We have developed a recommendation framework called Crab, which aims to provide a rich set of components from which you can build a personalized recommender system from a set of algorithms. It is written in Python, a popular programming language designed for scalability, flexibility and performance, and it makes use of optimized scientific packages to provide efficient and easy-to-use solutions in several contexts [15].

We have integrated this engine, with an easy-to-use interface for students, into the popular Brazilian social network AtePassar, an educational virtual learning environment with more than 70,000 registered students preparing for public examinations in order to get a civil service job. Figure 3 presents a screenshot of our recommender system at AtePassar. Each recommendation comes with an explanation, allowing the student to better understand why the system made the suggestion. In addition, the user can accept or refuse a recommendation, and the feedback obtained in this process can be applied directly to future recommendations.

3.4 The Current Results

The current recommender system recommends friends, study groups and products (e.g. video classes) to the active users at AtePassar. The recommender engine has been running since January 2011 and has recommended more than 100,000 items to over 60,000 users. We are currently developing new features that recommend other components of the social network, such as study plans, disciplines and questions.

Integrated this engine with the popular Brazilian social network AtéPassar

More than 70,000 registered students studying for public examinations

Recommends study groups, friends, video classes, questions and public exams (concursos)

More than 70,000 items available for recommendation

Written in Python using the open-source framework Crab

Framework available for building recommender systems (my contribution)

Running since January 2011

In March 2011, a questionnaire was conducted.

23%

77%

Liked Not Liked


collect discounts

WWW.FAVORITOZ.COM


Social Recommendations

1. The user logs in via Facebook.
2. The user visits a LikeStore partner e-commerce site.
3. The user already receives personalized recommendations on the landing page.
4. The user receives recommendations in the shopping cart.
5. The user receives recommendations on the product page.

Similar Products

Customers who bought this also bought

Friends who liked/bought this
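"Customers who bought this also bought" lists are often built from item co-occurrence counts. For a given product, count how many shopping baskets also contained each other product, then rank by that count. The following is a minimal sketch of this generic technique with hypothetical purchase data. The slides do not say how LikeStore computes these lists.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_purchase_counts(baskets):
    """counts[a][b] = number of baskets containing both a and b."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def also_bought(counts, product, how_many=3):
    """Products most often bought together with `product` (ties broken by name)."""
    ranked = sorted(counts[product].items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _count in ranked[:how_many]]


baskets = [
    ["tv", "hdmi", "soundbar"],
    ["tv", "hdmi"],
    ["phone", "case"],
    ["tv", "soundbar"],
    ["phone", "hdmi"],
]
counts = co_purchase_counts(baskets)
print(also_bought(counts, "tv"))
```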


Building the Social Genome


Does anyone still doubt it?

http://www.shopycat.com/

Tips










Tips for Recommendation Architecture


Recommended Reading

Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009

Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007

Sites such as TechCrunch and ReadWriteWeb


Recommended Conferences

–ACM RecSys

–ICWSM: Weblogs and Social Media

–WebKDD: Web Knowledge Discovery and Data Mining

–WWW: The original WWW conference

–SIGIR: Information Retrieval

–ACM KDD: Knowledge Discovery and Data Mining

–ICML: Machine Learning


Thank you!

Source: Hunch.com

Where will you be in all of this?

HUNCH sold to eBay for $80M










Optimizations with Cython

http://cython.org/

Cython is a Python extension that lets developers annotate functions so they can be compiled to C.

http://aimotion.blogspot.com/2011/09/high-performance-computation-with_17.html




# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

# for notes on compiler flags see:
# http://docs.python.org/install/index.html
setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("spearman_correlation_cython", ["spearman_correlation_cython.pyx"])]
)





Cache/Parallelism with joblib

http://packages.python.org/joblib/index.html

Investigate how to use multiprocessing and parallel packages with similarities computation

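from joblib import Parallel, delayed
...

def get_similarities(self, source_id):
    other_ids = [other_id for other_id, v in self.model]
    # delayed() wraps each call so that Parallel can dispatch it to one of 3 workers.
    similarities = Parallel(n_jobs=3)(
        delayed(self.get_similarity)(source_id, other_id) for other_id in other_ids)
    return list(zip(other_ids, similarities))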



