PhD Thesis

Post on 18-Dec-2014

My PhD thesis presentation

Organization is Sharing: From eScience to Personal Information Management

Rodrigo Dias Arruda Senra

Advisor: Profa Dra. Claudia Bauzer Medeiros

PhD Thesis Defense in Computer Science

Universidade Estadual de Campinas

Instituto de Computação

Campinas 2012-12-10

Outline

• Motivation

• Objectives

• Contributions
  • SciFrame
  • Database Descriptors
  • Organographs

• Results

Motivation

Study the relation Heterogeneity ↔ Organization ↔ Sharing

NDVI Profile Generation (WebMAPS)

• Sources: Geometries (IBGE), Spectral Images (NASA), Crops (Min. Agr.)

• Transfer: HTTP, FTP

• Storage: PostGIS, Filesystem, Postgres

• Publication (over HTTP): HTML, Microformats, 2D Plots

Objectives

• describe and compare eScience systems

• match application needs with DBMS capabilities

• manage digital content hierarchies

SciFrame

The Scientific Digital Data Processing Framework (SciFrame) is a conceptual framework that describes systems or processes involving digital data manipulation.

Interfacing

• Acquisition: Discovery, Extraction, Transference

• Publication

Data Management

• Storage

• Manipulation: Create, Retrieve, Update, Delete, Index

Information Management

• Description

• Transformation: Fusing, Filtering

WebMAPS

Interfacing

• Acquisition / Discovery: Geometries (IBGE), Raster (NASA), Crops (Min. Agr.)

• Acquisition / Extraction: ad hoc extractor scripts (Paparazzi)

• Acquisition / Transference: FTP and HTTP

• Publication: HTML, Microformats, 2D Plots

Data Management

• Storage: Geometries (PostGIS), Raster (Files), Crops (Postgres)

• Manipulation: Geometries (CRDI), Raster (CRD), Crops (CRUDI)

Information Management

• Description: Geometries (SHP, WKT), Raster (HDF, GeoTIFF)

• Transformation / Fusing: NDVI Time Series

• Transformation / Filtering: Cloud and noise removal (HANTS)
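The WebMAPS description above fills one slot per SciFrame aspect, which suggests a simple record structure. The following is a minimal sketch, not the thesis implementation: the class name `SciFrameDescription` and its methods are hypothetical, and only the layer and aspect names come from the slides.

```python
from dataclasses import dataclass, field

# The three SciFrame layers and their aspects, as listed on the slides.
SCIFRAME_ASPECTS = {
    "Interfacing": ["Discovery", "Extraction", "Transference", "Publication"],
    "Data Management": ["Storage", "Manipulation"],
    "Information Management": ["Description", "Transformation"],
}


@dataclass
class SciFrameDescription:
    """Hypothetical container: one free-text entry per (layer, aspect)."""
    system: str
    entries: dict = field(default_factory=dict)

    def describe(self, layer, aspect, text):
        # Reject anything that is not a known SciFrame aspect.
        if aspect not in SCIFRAME_ASPECTS.get(layer, []):
            raise ValueError(f"unknown SciFrame aspect: {layer}/{aspect}")
        self.entries[(layer, aspect)] = text

    def missing(self):
        """Aspects not yet described, in framework order."""
        return [(layer, aspect)
                for layer, aspects in SCIFRAME_ASPECTS.items()
                for aspect in aspects
                if (layer, aspect) not in self.entries]


webmaps = SciFrameDescription("WebMAPS")
webmaps.describe("Interfacing", "Transference", "FTP and HTTP")
webmaps.describe("Data Management", "Storage",
                 "Geometries (PostGIS), Raster (Files), Crops (Postgres)")
webmaps.describe("Information Management", "Transformation",
                 "Fusing: NDVI time series; Filtering: HANTS")
print(webmaps.missing())
```

Keeping the aspect names in one table makes two system descriptions directly comparable, which is the stated purpose of the framework.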

Research Problems

Interfacing

• Acquisition / Discovery: data scattered, many providers, search engines?

• Acquisition / Extraction: feasibility, preserve provenance, lack of semantics

• Acquisition / Transference: availability, voluminous data, bandwidth, protocol

• Publication: lack of intention, access control, traceability

Data Management

• Storage: scalability, distribution, consistency, preservation

• Manipulation: multimedia, impedance mismatch

Information Management

• Description: implicit vs. explicit, semantic web, social, trust, privacy

• Transformation: information lost (conceptual > logical > physical), multi-modality, handling uncertain and incomplete data

Technologies

Interfacing

• Acquisition / Discovery: DAS Registry, BIOCatalogue, SciScope

• Acquisition / Extraction: Scrapers, Wrappers, PiggyBank, Operator

• Acquisition / Transference: Streaming, P2P, OpenDAP

• Publication: SOA vs. ROA, Microformats vs. RDFa

Data Management

• Storage: Scientific Datasets, XML, Cloud Computing

• Manipulation: SQL extensions, ORMs, LINQ

Information Management

• Description: In Loco Semantics

• Transformation: Array Algebra (RASDAMAN), Topological Operators (GIS), Proximity Search and Report Language (ISIS)

Database Descriptors

✓ enforce loose coupling between Apps and DBMS

✓ DBMS product/vendor independence

✓ seamless cross-database migration

✓ capability verification, validation and negotiation

✓ support Apps and DBMS in the cloud!

Descriptors

• Feature descriptor (DBMS): specifies what a DBMS provides

• Desiderata descriptor (App): specifies what a client application needs

Architecture

[Diagram. Components: App, Negotiator, Descriptor Registries holding descriptor X and descriptor Y, and DMS X, DMS Y and DMS Z on the Web. The final build adds a binding.]

DBD Structure

* http://dublincore.org/documents/dces/

App DBMS

Feature Descriptor

@prefix :     <http://www.lis.ic.unicamp.br/purl/DBD/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:Cmbm a foaf:Person ;
    foaf:name "Claudia Bauzer Medeiros" .

:DBD1 dc:identifier "DBD1" ;
    dc:type "Feature DBD" ;
    dc:format "text/turtle" ;
    dc:title "Sample Feature Descriptor" ;
    dc:description "Hypothetical Feature DBD in RDF/Turtle" ;
    dc:creator :Cmbm ;
    dc:date "2009-12-18" ;
    dc:language "EN" ;
    :isolation :READ_COMMITED ;
    :versioning "unsupported" ;
    :storage "RDF Triples" ;
    :DML [ a rdf:Bag ;
           rdf:_1 "RDQL" ;
           rdf:_2 "SPARQL" ] .

Desiderata Descriptor

@prefix :     <http://www.lis.ic.unicamp.br/purl/DBD/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:Rodsenra a foaf:Person ;
    foaf:name "Rodrigo Dias Arruda Senra" .

:DBD2 dc:identifier "DBD2" ;
    dc:type "Desiderata DBD" ;
    dc:format "text/turtle" ;
    dc:title "Sample Desiderata Descriptor" ;
    dc:description "Desiderata DBD for hypothetical App" ;
    dc:creator :Rodsenra ;
    dc:date "2010-01-05" ;
    dc:language "EN" ;
    :isolation :READ_COMMITED ;
    :concurrency "Two phase lock" ;
    :storage "RDF Triples" ;
    :DML "SPARQL" .
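To make the role of the Negotiator concrete, here is a minimal sketch of how a desiderata descriptor could be checked against a feature descriptor. The matching rule (every requested value must be among the offered values) and the function names `as_set` and `negotiate` are assumptions for illustration; the slides do not specify the negotiation algorithm. The dictionaries transcribe only the dimension values of the two sample descriptors.

```python
def as_set(value):
    """Normalize a descriptor value to a set of alternatives."""
    return set(value) if isinstance(value, (list, set, tuple)) else {value}


def negotiate(desiderata, feature):
    """Return the dimensions the feature descriptor fails to satisfy.

    Assumed rule: a dimension is satisfied when every value the application
    asks for is among the values the DBMS declares for that dimension.
    """
    unmet = {}
    for dimension, wanted in desiderata.items():
        offered = as_set(feature.get(dimension, []))
        lacking = as_set(wanted) - offered
        if lacking:
            unmet[dimension] = sorted(lacking)
    return unmet


# Dimensions transcribed from the two sample descriptors (metadata omitted).
dbd1_feature = {
    "isolation": "READ_COMMITED",
    "versioning": "unsupported",
    "storage": "RDF Triples",
    "DML": ["RDQL", "SPARQL"],
}
dbd2_desiderata = {
    "isolation": "READ_COMMITED",
    "concurrency": "Two phase lock",
    "storage": "RDF Triples",
    "DML": "SPARQL",
}

unmet = negotiate(dbd2_desiderata, dbd1_feature)
print(unmet)  # {'concurrency': ['Two phase lock']}
```

Under this rule DBD1 does not fully satisfy DBD2, because DBD1 declares nothing for the concurrency dimension; an empty result would mean the binding can proceed.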

Understanding Hierarchies...

SciFrame DBDs

Organographs

Which of the following sets best accommodates the object above?

Red? Triangles? Metric-related?

Problems

1. Single category versus multi-faceted content

2. Manually defined categories

3. Criteria are not explicit

4. Static membership relation

5. Organization is not reusable

Organograph

An organograph is an artifact to make explicit how to organize information in the context of a particular task.

$H_{out} = f_{org}(H_{in})$

A hierarchy is a graph $H(V, E)$ with aggregator vertices $v_{agg}$, content vertices $v_{cnt}$, edges between aggregators $e_{agg}$, and edges from aggregators to content $e_{cnt}$. Both $H_{in}$ and $H_{out}$ are identified by a URL.

$f_{org}$ comprises:

• navigation (crawler/iterator)

• feature extraction

• $F_{Hil}(v_{agg}, v_{agg})$: hierarchical structuring

• $F_{Cat}(v_{agg}, v_{cnt})$: categorization
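The definition Hout = forg(Hin) can be sketched as a small data structure plus a function that rebuilds the edges. This is an illustrative sketch under stated assumptions, not Organicer code: the `Hierarchy` class, the callables `extract`, `f_cat` and `f_hil`, and the file names are all hypothetical. Here `f_cat` returns the aggregator for a content vertex and `f_hil` returns the parent of an aggregator, which is one possible reading of FCat(vagg, vcnt) and FHil(vagg, vagg).

```python
from dataclasses import dataclass, field


@dataclass
class Hierarchy:
    """H(V, E): aggregator and content vertices plus two kinds of edges."""
    v_agg: set = field(default_factory=set)  # aggregators (e.g. folders)
    v_cnt: set = field(default_factory=set)  # content items (e.g. files)
    e_agg: set = field(default_factory=set)  # (parent aggregator, child aggregator)
    e_cnt: set = field(default_factory=set)  # (aggregator, content)


def f_org(h_in, extract, f_cat, f_hil):
    """Build H_out from H_in: same content, new aggregators and edges."""
    h_out = Hierarchy(v_cnt=set(h_in.v_cnt))
    # Navigation and feature extraction over the content vertices.
    features = {c: extract(c) for c in sorted(h_in.v_cnt)}
    # FCat: attach each content vertex to an aggregator.
    for content, feats in features.items():
        agg = f_cat(feats)
        h_out.v_agg.add(agg)
        h_out.e_cnt.add((agg, content))
    # FHil: place the aggregators created above under a parent aggregator.
    for child in sorted(h_out.v_agg):
        parent = f_hil(child)
        if parent is not None:
            h_out.v_agg.add(parent)
            h_out.e_agg.add((parent, child))
    return h_out


h_in = Hierarchy(
    v_agg={"inbox"},
    v_cnt={"a.pdf", "b.txt", "c.pdf"},
    e_cnt={("inbox", "a.pdf"), ("inbox", "b.txt"), ("inbox", "c.pdf")},
)
h_out = f_org(
    h_in,
    extract=lambda name: name.rsplit(".", 1)[-1],  # feature: file extension
    f_cat=lambda ext: ext,                         # one aggregator per extension
    f_hil=lambda agg: "by-type",                   # single root aggregator
)
print(sorted(h_out.e_cnt))
print(sorted(h_out.e_agg))
```

Because the criteria live in the three callables, the same `f_org` can be reapplied to another input hierarchy, which is the reuse argument made on the slides.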

Organograph Composition

Task!

The task drives how $f_{org}$ is composed from the following building blocks:

• Expert Roles: NLP, Author, ML, Content, Domain

• Information Extraction: patterns, dictionaries, rules, probabilities, templates/wrappers

• Similarity: matching, Dice, Jaccard, overlap, cosine

• Ontologies: FOAF, DBpedia, Schema.org, Freebase, MusicBrainz, Geonames

• Classifiers: Naive Bayes, SVM, Nearest Neighbors, LDA, LSI

• Iterators / Data Container: Filesystem, Gmail, Evernote, Delicious, Dropbox (DBDs!)

• Visualization Strategies / UX: Fuse, Dokan, Infoviz, D3
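The similarity measures listed for organograph composition (matching, Dice, Jaccard, overlap, cosine) have standard set-based forms. The sketch below uses those textbook definitions over sets of terms; the slides do not say which variants Organicer uses, so this choice and the sample term sets are assumptions.

```python
import math


def matching(a, b):
    """Simple matching: number of shared terms."""
    return len(a & b)


def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


def jaccard(a, b):
    return len(a & b) / len(a | b) if a or b else 0.0


def overlap(a, b):
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0


def cosine(a, b):
    """Cosine similarity for binary (set) term vectors."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


# Hypothetical term sets for two documents.
doc_a = {"data", "sharing", "escience"}
doc_b = {"data", "sharing", "hierarchy", "organization"}

for measure in (matching, dice, jaccard, overlap, cosine):
    print(measure.__name__, round(measure(doc_a, doc_b), 3))
```

All five share the same numerator, the size of the intersection, and differ only in how they normalize it, so swapping one for another changes the scale of the scores rather than the kind of evidence used.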

Methodology

collection → organize → evaluate → reorganize → share

Evaluating Hierarchies

• too much content

• too many aggregators

• duplicated or misplaced

• too deep
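The four symptoms above can be turned into simple measurements over a hierarchy. The following is a minimal sketch, assuming a nested-dictionary representation with `name`, `items` and `children` keys; that representation, the function name `evaluate` and the sample tree are hypothetical, and no thresholds are implied by the slides.

```python
from collections import Counter


def evaluate(tree):
    """Walk a nested {'name', 'items', 'children'} tree and collect metrics."""
    stats = {"aggregators": 0, "max_depth": 0, "max_items": 0}
    seen = Counter()

    def walk(node, depth):
        stats["aggregators"] += 1                                  # too many aggregators
        stats["max_depth"] = max(stats["max_depth"], depth)        # too deep
        items = node.get("items", [])
        stats["max_items"] = max(stats["max_items"], len(items))   # too much content
        seen.update(items)
        for child in node.get("children", []):
            walk(child, depth + 1)

    walk(tree, 0)
    # Items filed under more than one aggregator: duplicated or misplaced.
    stats["duplicated"] = sorted(i for i, n in seen.items() if n > 1)
    return stats


tree = {
    "name": "papers",
    "children": [
        {"name": "2011", "items": ["p1", "p3"],
         "children": [{"name": "drafts", "items": ["p3"]}]},
        {"name": "2008", "items": ["p2"]},
    ],
}
stats = evaluate(tree)
print(stats)
```

What counts as "too much" or "too deep" depends on the task, so the sketch only reports the numbers and leaves the judgment to the caller.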

Reorganizing Hierarchies

[Diagram: the same three papers (paper 1, paper 2, paper 3) organized in two ways. Left: Author (Alice, Bob), then Publication Date (2011, 2008, 2011). Right: Publication Date (2008, 2011), then Author (Alice, Bob, Alice).]

Task is important!
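The two organizations of the same papers differ only in the order of the facets. A minimal sketch of that idea follows; the function `organize` is hypothetical, and the assignment of authors and years to paper 1, paper 2 and paper 3 is an assumption, since the transcript does not preserve which paper sits under which node.

```python
def organize(papers, facets):
    """Nest papers by the given facet order; leaves are lists of titles."""
    tree = {}
    for paper in papers:
        node = tree
        for facet in facets[:-1]:
            node = node.setdefault(paper[facet], {})
        node.setdefault(paper[facets[-1]], []).append(paper["title"])
    return tree


# Assumed metadata, for illustration only.
papers = [
    {"title": "paper 1", "author": "Alice", "year": 2011},
    {"title": "paper 2", "author": "Alice", "year": 2008},
    {"title": "paper 3", "author": "Bob", "year": 2011},
]

by_author = organize(papers, ["author", "year"])
by_year = organize(papers, ["year", "author"])
print(by_author)
print(by_year)
```

Neither result is better in itself: a task such as "what did Alice publish?" favors the first, while "what appeared in 2011?" favors the second.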

Reuse Organization

[Diagram labels: $H_{acm}$, $V_{cnt}^{mine}$]

[Diagram: $H_{in}$ → Pre-processing → Feature Extraction → Internal Indexes → Transformation Workflow (Organograph Execution: FCat(), FHil()) → $H_{out}$ → Visualization]

@organograph
def forg_ccs98(self, input):
    self.id = new_uuid()  # 'ff7d8e21-4226-11e2-b2f1-109add6b426c'
    self.description = 'docs by ACM CCS98'
    ccs98 = acm_extract('http://www.acm.org/about/class/1998/ccs98.xml')
    # Train on (feature, category) pairs taken from the CCS98 category titles
    trainset = []
    for category, words in nlp_clean_titles(ccs98.Vcnt.paths):
        for w in words:
            trainset.append((make_feature(w), category))
    classifier = NaiveBayes(trainset)
    self.Ecnt = classifier.classify(input)  # FCat
    self.Eagg = ccs98.Eagg.Level[:1]        # FHil

input = collection('file:///some/local/dir/docs')
output = forg_ccs98(input)
publish(output, 'rodsenra@dropbox:/output')
organicer.render(output, organicer.views.HYPERBOLIC_TREE)
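The organograph above trains a Naive Bayes classifier on (feature, category) pairs and uses it as FCat. The sketch below shows that step in isolation with a tiny multinomial Naive Bayes with add-one smoothing. It is a stand-in, not the `NaiveBayes` class used by Organicer: the class name `TinyNaiveBayes` is hypothetical, features are plain words, the training words are made up, and the prior is estimated from token counts per category.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over words, with add-one smoothing."""

    def __init__(self, trainset):
        # trainset: iterable of (word, category) pairs, as in the slide code.
        self.word_counts = defaultdict(Counter)
        self.cat_counts = Counter()
        self.vocab = set()
        for word, category in trainset:
            self.word_counts[category][word] += 1
            self.cat_counts[category] += 1
            self.vocab.add(word)

    def classify(self, words):
        """Return the most probable category for a bag of words."""
        total = sum(self.cat_counts.values())
        best, best_score = None, -math.inf
        for category in sorted(self.cat_counts):
            n = self.cat_counts[category]
            score = math.log(n / total)  # prior from token counts
            for w in words:
                count = self.word_counts[category][w]
                score += math.log((count + 1) / (n + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


# Made-up training words under two CCS98 top-level category names.
trainset = [
    ("database", "Information Systems"),
    ("query", "Information Systems"),
    ("storage", "Information Systems"),
    ("compiler", "Software"),
    ("parser", "Software"),
    ("debugging", "Software"),
]
classifier = TinyNaiveBayes(trainset)
print(classifier.classify(["query", "database", "tuning"]))
print(classifier.classify(["parser"]))
```

Each classified document becomes a content vertex attached to the predicted category, which is how the classifier output populates Ecnt in the organograph.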

forg_ccs98

Interfacing

• Acquisition / Discovery: ACM CCS98, $H_{in}$

• Acquisition / Extraction: pdf2txt, pdfbox, pypdf; NLTK (tokenizer)

• Acquisition / Transference: HTTP, WebDAV, NFS, SMB

• Publication: $H_{out}$ as HTML+CSS, JS (Infoviz, D3); Dropbox

Data Management

• Storage: NoSQL DB (Mongo, Neo4J)

• Manipulation: Indexes (CRDI)

Information Management

• Description: SKOS, GraphML, JSON

• Transformation / Mining: NaiveBayes

• Transformation / Filtering: Vcnt (unconverted PDFs); Vagg (empty or ambiguous)

Related Work

Related Work (SciFrame)

• CLRC scientific metadata model: B. Matthews and S. Sufi. The CLRC Scientific Metadata Model, version 1. DL TR 02001, CLRC, 2001.

• myGrid Information Model: Sharman, Nick, et al. "The myGrid information model." UK e-Science programme All Hands Conference. 2004.

Related Work (DBDs)

Madnick and Wang. Evolution Towards Strategic Applications of Databases Through Composite Information Systems. Journal of Management Information Systems 5(2):5-22, 1988.

"In order to separate data from the application processing, it is necessary to employ a process descriptor and a database descriptor.

The process descriptor describes the name, the input/output data requirement, and other resource requirements of the processing components.

The database descriptor contains information about the data (e.g., data model, schema, access rights) in the database, similar to data dictionaries.

These two descriptors can be used by the execution environment to coordinate the interaction between the processing component and the database."

Related Work (Organographs)

• Topic Modeling: LSA, LDA, Hierarchical Bayesian (Blei 201; Blei, Ng, & Jordan, 2003; Griffiths & Steyvers, 2002; 2003; 2004; Hofmann, 1999; 2001)

• Personal Information Management: CALO, UMEA, X-COSIM, Haystack, UpLib, Iris (Zimmermann 2005; Arndt 2007; Lansdale 1988; Kaptelinin 2003; Janssen & Popat 2003; Karger et al 2003)

• Semantic Desktop: Nepomuk, SEMSOC (Giannakidou et al 2008; Groza et al 2007)

• Personal Digital Libraries: Zotero, Mendeley, Papers

Results

Contributions

• SciFrame

• Database Descriptors (DBDs)

• Organographs

• Software tools & algorithms: WebMAPS, Paparazzi & Organicer

Publications

Submitted to JODS. Evaluating, Reorganizing and Sharing Digital Information Hierarchies. Rodrigo D. A. Senra, Claudia B. Medeiros. Journal on Data Semantics (submitted on 2012-10-25)

2011. Organographs - Multi-faceted Hierarchical Categorization of Web Documents. Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of the 7th International Conference on Web Information Systems and Technologies - WEBIST: 583-588

2010. Database Descriptors: Laying the Path to Commodity Web Data Services. Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of Engineering of Computer-Based Systems (ECBS): 386-392

2009. SciFrame: a conceptual framework to describe data sharing in eScience. Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of the III Brazilian eScience workshop (XXIV SBBD)

2009. A standards-based framework to foster geospatial data and process interoperability. Gilberto Z. Pastorello Jr., Rodrigo D. A. Senra, Claudia B. Medeiros. Journal of the Brazilian Computer Society 15(1): 13-25

2008. Bridging the gap between geospatial resource providers and model developers. Gilberto Z. Pastorello Jr., Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of the 16th International Conference on Advances in Geographic Information Systems - ACM SIGSPATIAL

2007. O projeto WebMAPS: desafios e resultados. Carla G. N. Macário, Claudia B. Medeiros, Rodrigo D. A. Senra. Proceedings of 9th Brazilian Symposium on Geoinformatics - GeoInfo: 239-250

SciFrame

WebMaps

DBDs

Organographs

Extensions

SciFrame

• Theoretical: formalize design pattern; enhance the operations vocabulary

• Practical: online catalog of eScience systems; describe as ontology (RDF)

Database Descriptors

• Theoretical: analyse negotiation frameworks; expand DBDs expressivity; explore ranking algorithms

• Practical: catalog of concrete DBDs; adapt Organicer to use DBDs; experiment with dynamic negotiation

Organographs

• Theoretical: model with Category Theory; explore DSLs to describe forg

• Practical: support non-textual media (e.g., images); expand component palette

Acknowledgments

• Laboratório de Sistemas de Informação (IC-Unicamp): http://www.lis.ic.unicamp.br

• Brazilian Institute for Web Science Research: http://webscience.org.br

• Fapesp - CNPQ - CAPES

Rodrigo Dias Arruda Senra

http://rodrigo.senra.nom.br

rsenra@acm.org

Thank you for your attention.

Support Material

[Diagram: processing pipeline with the libraries used at each stage]

• Source Hierarchy

• Hierarchy Navigation (Iterator): os.walk, pydelicious, evernote

• Pre-processing: BeautifulSoup, pyPdf

• Extraction: NLTK

• Facet Index: pymongo

• Transformation Workflow: networkx, gensim, numpy, scikit-learn

• Resulting Hierarchy

• Visualization: matplotlib, ObsPy, InfoViz.js, D3.js

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dbd="http://www.lis.ic.unicamp.br/purl/DBD/">
  <rdf:Description rdf:about="http://www.lis.ic.unicamp.br/purl/DBD/DBD1">
    <!-- metadata -->
    <dc:creator>Claudia Bauzer Medeiros</dc:creator>
    <dc:description>Hypothetical DBD for an RDF DBMS</dc:description>
    <dc:identifier>DBD1</dc:identifier>
    <dc:format>application/rdf+xml</dc:format>
    <dc:type>
      <rdf:Description>
        <dbd:Type>Feature DBD</dbd:Type>
      </rdf:Description>
    </dc:type>
    <dc:title>Descriptor of an RDF DBMS</dc:title>
    <dc:date>2009-12-18</dc:date>
    <dc:language>EN</dc:language>
    <!-- dimensions and values -->
    <dbd:concurrency>Two phase lock</dbd:concurrency>
    <dbd:versioning>unsupported</dbd:versioning>
    <dbd:storage>RDF triples</dbd:storage>
    <dbd:DML>
      <rdf:Bag>
        <rdf:li>RDQL</rdf:li>
        <rdf:li>SPARQL</rdf:li>
      </rdf:Bag>
    </dbd:DML>
  </rdf:Description>
</rdf:RDF>

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dbd="http://www.lis.ic.unicamp.br/purl/DBD/">
  <rdf:Description rdf:about="http://www.lis.ic.unicamp.br/purl/DBD/DBD2">
    <!-- metadata -->
    <dc:creator>Rodrigo Dias Arruda Senra</dc:creator>
    <dc:description>Desiderata DBD for a hypothetical application</dc:description>
    <dc:identifier>DBD2</dc:identifier>
    <dc:format>application/rdf+xml</dc:format>
    <dc:type>
      <rdf:Description>
        <dbd:Type>Desiderata DBD</dbd:Type>
      </rdf:Description>
    </dc:type>
    <dc:title>Desiderata descriptor of a hypothetical application</dc:title>
    <dc:date>2010-01-05</dc:date>
    <dc:language>EN</dc:language>
    <!-- dimensions and values -->
    <dbd:concurrency>Two phase lock</dbd:concurrency>
    <dbd:storage>RDF triple store</dbd:storage>
    <dbd:DML>RDQL</dbd:DML>
  </rdf:Description>
</rdf:RDF>

NDVI Profiles

Data Management

• Manipulation: Create, Retrieve, Update, Delete, Index

• Storage

Information Management

• Transformations: Browsing, Iterating, Searching, Augmenting, Mining, Description, Annotation, Schematization, Summarizing, Structuring, Sorting, Merging, Decreasing, Filtering, Fusing

Example

Input Collection

Task: info extraction

Task: transformation

Task: visualization

WebMAPS: DataFlow

Mail

FTP

MODIS Reprojection Tool

Images

Region clipping

Geometry (IBGE)

NDVI

Related Work

Approaches to App-to-DMS binding

• embedded

• n-tier client/server (including web services)

• mediators

Descriptors are orthogonal to all of these!

Information Integration [1]

• Process: Understanding, Standardization, Specification, Execution

• Mechanism: Materialization, Federation, Indexing

[1] Laura Haas. Beauty and the Beast: The Theory and Practice of Information Integration.

Sensor Data Extraction

from osgeo import gdal
from osgeo.gdalconst import GA_ReadOnly

dataset = gdal.Open(raster_file, GA_ReadOnly)

# Get the coefficients of the affine functions that map coordinates
gt = dataset.GetGeoTransform()

# Get the data band of interest
band = dataset.GetRasterBand(1)

# Identify how the data are encoded.
# For the TIF file the data are unsigned bytes ('Byte')
data_type = gdal.GetDataTypeName(band.DataType)

# Get the image dimensions
width, height = band.XSize, band.YSize

# Convert the MBR from lat/long coordinates to line/column
# Xgeo = GT(0) + Xpixel*GT(1) + Yline*GT(2)
# Ygeo = GT(3) + Xpixel*GT(4) + Yline*GT(5)
ul_pixel, lr_pixel = g2p(gt, *ul_geo), g2p(gt, *lr_geo)
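The slide calls `g2p(gt, *ul_geo)` without showing it. A possible implementation, inverting the two affine equations quoted in the comments, is sketched below. This is an assumption about what `g2p` does: the (line, column) return order is inferred from how `raster2array` indexes `ul_pixel`, and the geotransform values in the example are made up.

```python
import math


def g2p(gt, x_geo, y_geo):
    """Map geographic (x, y) to a (line, column) pixel position.

    Inverts the affine equations
        Xgeo = gt[0] + Xpixel*gt[1] + Yline*gt[2]
        Ygeo = gt[3] + Xpixel*gt[4] + Yline*gt[5]
    """
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x_geo - gt[0], y_geo - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    line = (-gt[4] * dx + gt[1] * dy) / det
    return math.floor(line), math.floor(col)


# Hypothetical north-up geotransform: origin at (-48.0, -22.0), 0.25 degree pixels.
gt = (-48.0, 0.25, 0.0, -22.0, 0.0, -0.25)
print(g2p(gt, -46.5, -23.0))  # (4, 6)
```

For north-up images the rotation terms gt[2] and gt[4] are zero and the inversion reduces to two independent divisions; the general form is kept so rotated rasters are handled too.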

WebMAPS

Case Study: WebMaps

Data Extraction

import struct
import numpy

def raster2array(ul_pixel, lr_pixel, dtype='B'):
    """Return a numpy array with the region of interest extracted from the
    raster file, delimited by ul_pixel and lr_pixel (line, column)."""
    col_size = lr_pixel[1] - ul_pixel[1] + 1
    row_size = lr_pixel[0] - ul_pixel[0] + 1
    scanline = band.ReadRaster(ul_pixel[1], ul_pixel[0], col_size, row_size)
    num_pixels = col_size * row_size
    roi = numpy.array(struct.unpack(dtype * num_pixels, scanline))
    roi.shape = (row_size, col_size)
    return roi

# Read data from the raster file into a numpy array,
# defining a region-of-interest matrix
roi = raster2array(ul_pixel, lr_pixel)

Geometry Extraction

from osgeo import ogr

shp = ogr.Open(filepath)

# Layer for the state of São Paulo
layer = shp.GetLayerByName('35mu500gc')

# Feature for the municipality of Campinas
feature = layer.GetFeature(501)

# Extract the control points of the perimeter
geometry = feature.GetGeometryRef()
poly = geometry.GetGeometryRef(0)
centroid = geometry.Centroid()
centroid_geo = centroid.GetX(), centroid.GetY()

# Define the Minimum Bounding Rectangle (MBR)
lg_left, lg_right, lt_bot, lt_up = poly.GetEnvelope()
ul_geo, lr_geo = (lg_left, lt_up), (lg_right, lt_bot)

Spatial Operations

Organicer
