- 1. Hpricot: extracting data from web pages, by Jonas Alves
2. Jonas Alves
- Rubyist since 2008, at WebGoal since 2009, @jonas_alves
http://github.com/jonasfa http://br.linkedin.com/in/alvesjonas
3. Scenario?
4. Scenario
- 10+ people collecting data manually
5. Errors compromise the quality of the service
6. Lots of work == overtime == $$
7. Proposal: automate
8. Tools
Java: HTMLParser
Ruby: Hpricot
9. Comparison
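- Hpricot (Ruby)
    require 'hpricot'
    require 'open-uri'  # lets open() read a URL

    doc = Hpricot(open('http://www.ruby-lang.org/en/about/'))
    # doc/'selector' searches with a CSS selector; inner_text returns each match's text
    puts((doc/'#content h3').collect { |h3| h3.inner_text })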
10. Comparison
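- HTMLParser (Java)
    import org.htmlparser.NodeFilter;
    import org.htmlparser.beans.FilterBean;
    import org.htmlparser.filters.CssSelectorNodeFilter;
    import org.htmlparser.util.SimpleNodeIterator;

    CssSelectorNodeFilter cssSelector = new CssSelectorNodeFilter("#content h3");
    FilterBean bean = new FilterBean();
    bean.setFilters(new NodeFilter[] {cssSelector});
    bean.setURL("http://www.ruby-lang.org/en/about/");
    // getNodes() returns only the nodes that pass the filters
    SimpleNodeIterator iterator = bean.getNodes().elements();
    while (iterator.hasMoreNodes()) {
        System.out.println(iterator.nextNode().toPlainTextString());
    }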
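Here is a minimal sketch of what the `#content h3` selector in both examples picks out. It assumes a small, well-formed XHTML page in place of the live ruby-lang.org URL and reproduces the extraction with REXML from Ruby's standard library. This swaps Hpricot for a different parser. REXML understands XPath rather than CSS, so `#content h3` is rewritten as `//*[@id='content']//h3`. The `PAGE` content and the `h3_titles` helper are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical, well-formed XHTML page standing in for the fetched one.
PAGE = <<~HTML
  <html>
    <body>
      <div id="content">
        <h3>First heading</h3>
        <p>Some text</p>
        <h3>Second <em>heading</em></h3>
      </div>
      <div id="sidebar">
        <h3>Outside #content</h3>
      </div>
    </body>
  </html>
HTML

# Hypothetical stand-in for Hpricot's (doc/'#content h3').collect { |h3| h3.inner_text }:
# the CSS selector "#content h3" becomes an XPath, and all descendant
# text nodes of each h3 are joined, similar to what inner_text returns.
def h3_titles(html)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, "//*[@id='content']//h3").map do |h3|
    REXML::XPath.match(h3, 'descendant::text()').map(&:value).join.strip
  end
end

puts h3_titles(PAGE)
```

The code returns only the two headings inside `#content` and skips the sidebar heading. REXML is a strict XML parser and will typically reject the malformed markup common on real pages. Hpricot was designed to cope with that kind of markup.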
11. Let's code!
12. GitHub: http://github.com/jonasfa/hpricot_gurusp
13. References
14. http://github.com/hpricot/hpricot
15. http://wiki.github.com/hpricot/hpricot/
16. Acknowledgments
17. Anderson Leite, Caelum, and the organizers
18. WebGoal