Doing Magic with ElasticSearch
PEDROFRANCESCHI @pedroh96
[email protected] github.com/pedrofranceschi
Doing Magic with ElasticSearch
October 2010
Filters
Full text search
Sort
Highlight
Facets
Pagination
You will need to search data.
You will need to understand data.
(My)SQL is not the solution.
(… nor is NoSQL)
What is ElasticSearch?
ElasticSearch
• “Open Source Distributed Real Time Search & Analytics”
• RESTful API to index/search JSON documents ("NoSQL")
• It is NOT a database
• Apache Lucene
• Just works (and scales)
• Full text search, aggregations, scripting, etc, etc, etc.
MySQL       ElasticSearch
Database    Index
Table       Type
Row         Document
Column      Field
Schema      Mapping
Partition   Shard
Names?
How to use ElasticSearch?
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "pedroh96",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'
Endpoint, Index, Type, Document ID, Document
{
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1",
    "_version" : 1,
    "created" : true
}
PUT data
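The same indexing call can be sketched from Ruby with the standard library. This is only an illustration: `build_index_request` is a hypothetical helper name, and the request is built but not sent, so no ElasticSearch node is needed to run it.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Build (but do not send) the PUT request that indexes one document.
# URL layout, as on the slide: /<index>/<type>/<document id>
# `build_index_request` is a hypothetical helper, not part of any gem.
def build_index_request(base_url, index, type, id, document)
  uri = URI.join(base_url, "/#{index}/#{type}/#{id}")
  request = Net::HTTP::Put.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(document)
  request
end

tweet = {
  user: 'pedroh96',
  post_date: '2009-11-15T14:12:12',
  message: 'trying out Elasticsearch'
}

request = build_index_request('http://localhost:9200', 'twitter', 'tweet', 1, tweet)
puts "#{request.method} #{request.path}"
puts request.body

# To actually send it (requires a node listening on localhost:9200):
# Net::HTTP.start(request.uri.host, request.uri.port) { |http| http.request(request) }
```

Keeping request construction separate from sending makes the URL layout (index, type, document ID) easy to inspect.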
$ curl -XGET 'http://localhost:9200/twitter/tweet/1'
Endpoint, Index, Type, Document ID
{
    "_id": "1",
    "_index": "twitter",
    "_source": {
        "message": "trying out Elasticsearch",
        "post_date": "2009-11-15T14:12:12",
        "user": "pedroh96"
    },
    "_type": "tweet",
    "_version": 1,
    "found": true
}
Document
GET data
$ curl -XGET 'http://localhost:9200/twitter/_search' -d '{ query: . . . }'
Endpoint, Index
GET data
Search query
Search operator
ActiveRecords
class Tweet < ActiveRecord::Base
end
ActiveRecords
require 'elasticsearch/model'

class Tweet < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
end
Tweet.import
Tweet.search("pedroh96")
Why use ElasticSearch?
DISCLAIMER
Post.where(:author => "pedroh96")
vs
Post.search(query: { match: { author: "pedroh96" }})
Just Another Query Language?
1) Full text search
ActiveRecords
$ rails g scaffold Post title:string source:string
Post.find(5)
GET /posts/5
:-)
ActiveRecords
Post.where(:title => "Amazon to Buy Video Site Twitch for More Than $1B")
“Amazon to Buy Video Site Twitch for More Than $1B”
ActiveRecords
:-)
Post.where(["title LIKE ?", "%Amazon%"])
“amazon”
???
ActiveRecords
Post.where(["title LIKE ? AND source = ?", "%Amazon%", "online.wsj.com"])
“amazon source:online.wsj.com”
??????
ActiveRecords
Post.search("amazon")
“amazon”
:-)
ElasticSearch
“amazon source:online.wsj.com”
ElasticSearch
search = Post.search("amazon source:online.wsj.com")
:-)
“amazon source:online.wsj.com”
ElasticSearch
search = Post.search(
  query: {
    match: {
      _all: "amazon source:online.wsj.com"
    }
  }
)
Full-text search
“amazon source:online.wsj.com”
ElasticSearch
search = Post.search(
  query: {
    multi_match: {
      query: "amazon source:online.wsj.com",
      fields: ['title^10', 'source']
    }
  }
)
Full-text search
Title boost
“amazon source:online.wsj.com”
ElasticSearch
search = Post.search(
  query: {
    multi_match: {
      query: "amazon source:online.wsj.com",
      fields: ['title^10', 'source']
    }
  },
  highlight: {
    fields: {
      title: {}
    }
  }
)
Title highlight
Full-text search
Title boost
ElasticSearch
> search.results[0].highlight.title
=> ["Twitch officially acquired by <em>Amazon</em>"]
Title highlight
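The three pieces shown so far (full-text search, title boost, title highlight) are just keys in one hash. A minimal sketch of assembling that hash follows; `build_post_search` and its keyword arguments are hypothetical names used only for illustration.

```ruby
# Builds the search body from the slides: a multi_match query over boosted
# fields plus highlighting. `build_post_search` is a hypothetical helper.
def build_post_search(text, boosts: { title: 10, source: 1 }, highlight: [:title])
  # A boost of 1 is the default, so it is left out ("source" instead of "source^1").
  fields = boosts.map { |field, boost| boost == 1 ? field.to_s : "#{field}^#{boost}" }
  {
    query: { multi_match: { query: text, fields: fields } },
    highlight: { fields: highlight.each_with_object({}) { |f, h| h[f] = {} } }
  }
end

body = build_post_search('amazon source:online.wsj.com')
p body[:query][:multi_match][:fields]
p body[:highlight]

# With the elasticsearch-model gem loaded in a Rails app, the hash would be
# passed along as: Post.search(body)
```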
2) Aggregations (faceting)
Geo distance aggregation
ActiveRecords
$ rails g scaffold Coordinate latitude:decimal longitude:decimal
ActiveRecords
class Coordinate < ActiveRecord::Base
end
ActiveRecords
class Coordinate < ActiveRecord::Base
  def distance_to(coordinate)
    # From http://en.wikipedia.org/wiki/Haversine_formula
    rad_per_deg = Math::PI/180 # PI / 180
    rkm = 6371 # Earth radius in kilometers
    rm = rkm * 1000 # Radius in meters

    dlon_rad = (coordinate.longitude.to_f - self.longitude.to_f) * rad_per_deg # Delta, converted to rad
    dlat_rad = (coordinate.latitude.to_f - self.latitude.to_f) * rad_per_deg

    lat1_rad = coordinate.latitude.to_f * rad_per_deg
    lat2_rad = self.latitude.to_f * rad_per_deg
    lon1_rad = coordinate.longitude.to_f * rad_per_deg
    lon2_rad = self.longitude.to_f * rad_per_deg

    a = Math.sin(dlat_rad/2)**2 + Math.cos(lat1_rad) * Math.cos(lat2_rad) * Math.sin(dlon_rad/2)**2
    c = 2 * Math::atan2(Math::sqrt(a), Math::sqrt(1-a))

    rm * c # Delta in meters
  end
end
> c1 = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)
> c2 = Coordinate.new(:latitude => -23.5538488, :longitude => -46.6530035)
> c1.distance_to(c2)
=> 66.07749735875552
ActiveRecords
origin = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)

buckets = [
  { :to => 100, :coordinates => [] },
  { :from => 100, :to => 300, :coordinates => [] },
  { :from => 300, :coordinates => [] }
]

Coordinate.all.each do |coordinate|
  distance = origin.distance_to(coordinate)

  buckets.each do |bucket|
    # A missing :from means 0; a missing :to means no upper bound
    if distance >= (bucket[:from] || 0) && (bucket[:to].nil? || distance < bucket[:to])
      bucket[:coordinates] << coordinate
    end
  end
end
??????
ElasticSearch
query = {
  aggregations: {
    rings_around_rubyconf: {
      geo_distance: {
        field: "location",
        origin: "-23.5532636, -46.6528908",
        ranges: [
          { to: 100 },
          { from: 100, to: 300 },
          { from: 300 }
        ]
      }
    }
  }
}
search = Coordinate.search(query)
:-)
Aggregation name
Field with the location
Buckets to aggregate
Origin coordinates
Aggregation type
(Extended) stats aggregation
ActiveRecords
$ rails g scaffold Grade subject:string grade:decimal
ElasticSearch
query = {
  aggregations: {
    grades_stats: {
      extended_stats: {
        field: "grade"
      }
    }
  }
}

search = Grade.search(query)
Aggregation name
Field name
Aggregation type
ElasticSearch
> search.response.aggregations.grades_stats
=> #<Hashie::Mash avg=8.03 count=3 max=10.0 min=4.6 std_deviation=2.43 sum=24.1 sum_of_squares=211.41 variance=5.93>
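To see where these numbers come from, the statistics can be recomputed in plain Ruby. This is a sketch under one assumption: the three individual grades (4.6, 10.0, 9.5) are not listed on the slide and are inferred here from its min, max and sum. The variance is the population variance (dividing by count), which is what reproduces the figures above.

```ruby
# Recompute the extended_stats figures in plain Ruby.
# Assumption: the individual grades are inferred from min, max and sum on the slide.
grades = [4.6, 10.0, 9.5]

def extended_stats(values)
  count = values.size
  sum = values.sum
  avg = sum / count
  sum_of_squares = values.sum { |v| v * v }
  variance = sum_of_squares / count - avg**2 # population variance
  {
    count: count, min: values.min, max: values.max, sum: sum, avg: avg,
    sum_of_squares: sum_of_squares, variance: variance,
    std_deviation: Math.sqrt(variance)
  }
end

stats = extended_stats(grades)
stats.each { |name, value| puts "#{name}: #{value.round(2)}" }
```

Doing this in the application means loading every row first; the aggregation returns the same figures without transferring the documents.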
(Extended) stats aggregation +
Scripting
ElasticSearch
query = {
  aggregations: {
    grades_stats: {
      extended_stats: {
        field: "grade"
      }
    }
  }
}
ElasticSearch
query = {
  aggregations: {
    grades_stats: {
      extended_stats: {
        field: "grade",
        script: "_value < 7.0 ? _value * correction : _value",
        params: {
          correction: 1.2
        }
      }
    }
  }
}

search = Grade.search(query)
Aggregation name
Field name
JavaScript to compute the new grade
Aggregation type
ElasticSearch
> search.response.aggregations.grades_stats
=> #<Hashie::Mash avg=8.34 count=3 max=10.0 min=5.52 std_deviation=2.00 sum=25.02 sum_of_squares=220.72 variance=4.01>
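The effect of the script can be checked by hand. In this sketch the same rule is applied in Ruby to the grades assumed earlier (4.6, 10.0, 9.5, inferred from the slide's min, max and sum, not stated on it): only the grade below 7.0 is multiplied by the correction factor.

```ruby
# Apply the slide's script rule in plain Ruby:
#   _value < 7.0 ? _value * correction : _value
# Assumption: the grades are inferred from the earlier stats, not given on the slide.
correction = 1.2
grades = [4.6, 10.0, 9.5]

adjusted = grades.map { |value| value < 7.0 ? value * correction : value }
sum = adjusted.sum
avg = sum / adjusted.size

puts "adjusted: #{adjusted.map { |v| v.round(2) }.inspect}"
puts "min: #{adjusted.min.round(2)}, sum: #{sum.round(2)}, avg: #{avg.round(2)}"
```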
Term aggregation
ElasticSearch
query = {
  aggregations: {
    subjects: {
      terms: {
        field: "subject"
      }
    }
  }
}

search = Grade.search(query)
Aggregation name
Field name
Aggregation type
ElasticSearch
> search.response.aggregations.subjects
=> #<Hashie::Mash buckets=[
  #<Hashie::Mash doc_count=2 key="math">,
  #<Hashie::Mash doc_count=1 key="grammar">,
  #<Hashie::Mash doc_count=1 key="physics">
]>
Combined aggregations (term + stats)
ElasticSearch
query = {
  aggregations: {
    subjects: {
      terms: {
        field: "subject"
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
query = {
  aggregations: {
    subjects: {
      terms: {
        field: "subject"
      },
      aggregations: {
        grade_stats: {
          stats: {
            field: "grade"
          }
        }
      }
    }
  }
}

search = Grade.search(query)
Parent aggregation name
Field for the parent aggregation
Field for the child aggregation
Child aggregation name
ElasticSearch
> search.response.aggregations.subjects
=> #<Hashie::Mash buckets=[
  #<Hashie::Mash doc_count=2 grade_stats=#<Hashie::Mash avg=9.0 count=2 max=10.0 min=8.0 sum=18.0> key="math">,
  #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=4.6 count=1 max=4.6 min=4.6 sum=4.6> key="grammar">,
  #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=9.5 count=1 max=9.5 min=9.5 sum=9.5> key="physics">
]>
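Conceptually, a terms aggregation with a nested stats aggregation is a group-by followed by per-group statistics. The plain-Ruby sketch below mimics that on in-memory data; the `Grade` struct is a stand-in for the model, and the four sample rows are read off the bucket output above (math 8.0 and 10.0, grammar 4.6, physics 9.5).

```ruby
# In-memory imitation of "terms on subject" + "stats on grade".
# `Grade` here is a plain Struct standing in for the ActiveRecord model.
Grade = Struct.new(:subject, :grade)

grades = [
  Grade.new('math', 8.0),
  Grade.new('math', 10.0),
  Grade.new('grammar', 4.6),
  Grade.new('physics', 9.5)
]

buckets = grades.group_by(&:subject).map do |subject, docs|
  values = docs.map(&:grade)
  {
    key: subject,
    doc_count: docs.size,
    grade_stats: {
      count: values.size, min: values.min, max: values.max,
      sum: values.sum, avg: values.sum / values.size
    }
  }
end

# Largest buckets first, ties broken by key, matching the order on the slide.
buckets = buckets.sort_by { |bucket| [-bucket[:doc_count], bucket[:key]] }

buckets.each { |bucket| puts bucket.inspect }
```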
Top Hits
More like this
Histogram
Scripted metrics
Geo bounds
Stemmer (synonyms)
IPv4 ranges
. . .
3) Scoring
ActiveRecords
$ rails g scaffold Post title:string source:string likes:integer
“amazon”
ElasticSearch
search = Post.search(
  query: {
    match: {
      _all: "amazon"
    }
  }
)
Full-text search
search.results.results[0]._score => 0.8174651
“amazon”
ElasticSearch
search = Post.search(
  query: {
    custom_score: {
      query: {
        match: {
          _all: "amazon"
        }
      },
      script: "_score * doc['likes'].value"
    }
  }
)
Full-text search
Likes influence the score
search.results.results[0]._score => 31.8811388
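In later Elasticsearch releases the `custom_score` query was replaced by `function_score`. A hedged sketch of the equivalent query body follows; `likes_boosted_query` is a hypothetical helper name, and the value of 39 likes is only an inference from the two scores on the slides (31.8811388 / 0.8174651), not something the slides state.

```ruby
# Equivalent of the custom_score query using function_score + script_score.
# `likes_boosted_query` is a hypothetical helper name.
def likes_boosted_query(text)
  {
    query: {
      function_score: {
        query: { match: { _all: text } },
        script_score: { script: "_score * doc['likes'].value" }
      }
    }
  }
end

body = likes_boosted_query('amazon')
p body

# What the script computes for the top hit, assuming it had 39 likes
# (inferred from the two scores on the slides, not given there):
base_score = 0.8174651
likes = 39
boosted = base_score * likes
puts boosted.round(4)
```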
GET http://localhost:9200/post/_search?explain
"_explanation": {
  "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
  "value": 0.076713204,
  "details": [
    {
      "description": "fieldWeight in 0, product of:",
      "value": 0.076713204,
      "details": [
        {
          "description": "tf(freq=1.0), with freq of:",
          "value": 1,
          "details": [
            {
              "description": "termFreq=1.0",
              "value": 1
            }
          ]
        },
        {
          "description": "idf(docFreq=1, maxDocs=1)",
          "value": 0.30685282
        },
        {
          "description": "fieldNorm(doc=0)",
          "value": 0.25
        }
      ]
    }
  ]
}
Score explained
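The explanation can be reproduced by hand. The sketch below assumes Lucene's classic TF-IDF similarity, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)); the field norm is taken directly from the output above rather than derived.

```ruby
# Reproduce the explained score, assuming Lucene's classic TF-IDF similarity.
freq       = 1.0  # termFreq from the explanation
doc_freq   = 1    # docFreq
max_docs   = 1    # maxDocs
field_norm = 0.25 # fieldNorm(doc=0), taken as given

tf  = Math.sqrt(freq)                               # tf(freq=1.0)
idf = 1 + Math.log(max_docs.to_f / (doc_freq + 1))  # idf(docFreq=1, maxDocs=1)
score = tf * idf * field_norm                       # fieldWeight = tf * idf * fieldNorm

puts "tf=#{tf} idf=#{idf.round(8)} score=#{score.round(9)}"
```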
4) Indexing responses
$ rails g scaffold Post title:string source:string likes:integer
class PostsController < ApplicationController

  # ...

  def show
    @post = Post.find(params[:id])

    render json: @post
  end

  # ...

end
SELECT * FROM Posts WHERE id = params[:id]
class PostsController < ApplicationController

  # ...

  def show
    @post = Post.search(query: { match: { id: params[:id] }})

    render json: @post
  end

  # ...

end
GET http://localhost:9200/posts/posts/params[:id]
ActiveRecords
require 'elasticsearch/model'

class Post < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  belongs_to :author

  def as_indexed_json(options={})
    self.as_json(
      include: { author: { only: [:name, :bio] } }
    )
  end
end
Includes a parent in the indexed JSON
Exposing ElasticSearch
http://localhost:9200/pagarme/_search
https://api.pagar.me/1/search
Pagar.me infrastructure
Router (api.pagar.me)
Test environment (customer sandbox): API server (Node.js), ElasticSearch, MySQL (transactions and relational data)
Production environment: API server (Node.js), ElasticSearch, MySQL (transactions and relational data)
MongoDB (customer data and non-relational data)
Exposing ElasticSearch
• ElasticSearch endpoint -> endpoint accessed by the customer…
• … but be careful: data must be scoped to the customer's account (of course)
• Advantage: access to the same ElasticSearch features (aggregations, statistics, scores, etc.)
• Security: disable ElasticSearch scripts
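One way to picture the two safeguards above is a thin layer that wraps every customer query before forwarding it. The sketch below is illustrative only and is not Pagar.me's implementation: `scope_to_account`, `contains_script?`, `ScriptNotAllowed` and the `account_id` field are all hypothetical names, the `bool` query with a `filter` clause assumes Elasticsearch 2.0 or later, and the key check is not exhaustive, so scripting should still be disabled on the cluster itself.

```ruby
# Hypothetical guard layer: reject scripts and force an account filter.
class ScriptNotAllowed < StandardError; end

# Recursively look for keys such as "script", "script_score" or
# "scripted_metric" anywhere in the client-supplied body.
def contains_script?(node)
  case node
  when Hash
    node.any? { |key, value| key.to_s.start_with?('script') || contains_script?(value) }
  when Array
    node.any? { |value| contains_script?(value) }
  else
    false
  end
end

# Wrap the customer's query so it can only match documents of its own account.
# Assumes symbol keys and a hypothetical `account_id` field on every document.
def scope_to_account(client_body, account_id)
  raise ScriptNotAllowed, 'scripts are disabled' if contains_script?(client_body)

  user_query = client_body[:query] || { match_all: {} }
  client_body.merge(
    query: {
      bool: {
        must: [user_query],
        filter: [{ term: { account_id: account_id } }]
      }
    }
  )
end

scoped = scope_to_account({ query: { match: { description: 'paid' } }, size: 10 }, 42)
p scoped
```

Because the customer's query ends up inside `must` and the account filter sits beside it, aggregations and scoring still work on the scoped result set.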
GET /search
• A single endpoint for all GETs
• All data indexed and ready to use (no joins)
• Complex queries built on the front end (Angular.js)
• Front-end development does not depend on the back end
Overall…
1) There is a tool for every task.
2) A hammer is always the right tool.
3) Every tool is also a hammer.
MySQL
!=
NoSQL
!=
ElasticSearch