IEEE 2010 Second Brazilian Workshop on Social Simulation (BWSS), São Paulo, Brazil
Simulating a Multi-Agent Electricity Market
Paulo Trigo
LabMAg; DEETC, ISEL, Instituto Superior de Engenharia de Lisboa, Portugal
e-mail: [email protected]
Helder Coelho
LabMAg; DI, FCUL, Faculdade de Ciências da Universidade de Lisboa, Portugal
e-mail: [email protected]
Abstract—This paper proposes a multi-agent based simulation (MABS) framework to construct an artificial electric power market populated with learning agents. The proposed framework facilitates the integration of two MABS constructs: i) the design of the environmental physical market properties, and ii) the simulation models of the decision-making and reactive agents. The framework is materialized in an experimental setup involving distinct power generator companies which operate in the market and search for the trading strategies that best exploit their generating units’ resources. The experimental results show a coherent market behavior that emerges from the overall simulated environment.
I. INTRODUCTION
The start-up of nation-wide electric markets, along with their recent expansion to inter-country markets, aims at providing a competitive electricity service to consumers. The new market-based power industry calls for human decision-making in order to settle the energy assets’ trading strategies. The interactions and influences among the market participants are usually described by game-theoretic approaches, which are based on the determination of equilibrium points against which to compare the actual market performance [?], [?]. However, those approaches find it difficult to incorporate the ability of market participants to repeatedly probe markets and adapt their strategies. Usually, the problem of finding the equilibria strategies is relaxed (simplified) both in terms of: i) the human agents’ bidding policies, and ii) the technical and economical operation of the power system.
As an alternative to the equilibrium approaches, the multi-agent based simulation (MABS) comes forth as being particularly well suited to analyze dynamic and adaptive systems with complex interactions among constituents [?], [?]. Tools are designed to simulate the interactions of agents (individuals) and to study the macro-scale effects of those interactions. Agents exhibit bounded rationality, i.e., they make decisions based on local information (partial knowledge) of the system and of other agents. Agents may also learn and adapt their strategies during a simulation, thus possibly converging toward equilibrium. However, the purpose of MABS is not to explicitly search for equilibrium points, but rather to reveal, and help understand, the complex and aggregate system behaviors that emerge from the interactions of heterogeneous individuals. Despite its broad applicability, most frameworks describe MABS either from a specific architecture perspective [?] or describe agents at an abstract specification level [?]. Frameworks are either too specific or too generic, and there is no modeling formalism to fully describe the MABS constituents along with their inter-relations.
In this paper we propose a MABS modeling framework that provides constructs for the (human) designer to specify a dynamic environment, its resources, its observable properties and its inhabitant decision-making agents. The decision-making model is utility-based and the agents search for policies that maximize their long-term rewards. We used the proposed framework to capture the behavior of the electricity market and to design its simulation model. The model incorporates the operation of several generator company (GenCo) operators, each with distinct power generating units (GenUnit), and a market operator (Pool) which computes the hourly market price (driven by the electricity demand). The market is simulated over time and each decision-making agent (GenCo) follows a reinforcement learning method [?], [?] to find its own trading strategies.
Section II proposes the MABS framework and Sect. III proposes the agent design and its decision-making process; Sect. IV and Sect. V instantiate and evaluate the overall proposal, and Sect. VI presents conclusions and future goals.
II. THE PROPOSED MABS FRAMEWORK
We describe the structural MABS constituents by means of two concepts: i) the environmental entity, which owns a distinct existence in the real environment, e.g. a resource or the physical body of an agent, and ii) the environmental property, which is a measurable aspect of the real environment, e.g. the price of a bid or the demand for electricity. Hence, we define the environmental entity set,
978-0-7695-4471-7/10 $26.00 © 2010 IEEE
DOI 10.1109/BWSS.2010.14
ET = { e1, . . . , en }, and the environmental property set, EY = { p1, . . . , pn }. The whole environment is the union of its entities and properties: E = ET ∪ EY.
The environmental entities, ET, are often clustered in different classes, or types, thus partitioning ET into a set, PET, of disjoint subsets, PiET, each containing entities that belong to the same class. Formally, PET = { P1ET, . . . , PkET } defines a full partition of ET, such that PiET ⊆ ET, ET = ∪i=1...k PiET and PiET ∩ PjET = ∅ for all i ≠ j. The partitioning may be used to distinguish between decision-making agents and available resources, e.g. a company that decides the bidding strategy to pursue or a plant that provides the demanded power.
The environmental properties, EY, can also be clustered, in a similar way to the environmental entities, thus grouping properties that are related. The partitioning may be used to express distinct categories, e.g. economical, electrical, ecological or social aspects. Another, more technical, usage is to separate constant parameters from dynamic state variables.
The factored state space representation: The state of the simulated environment is implicitly defined by the state of all its environmental entities and properties. We follow a factored representation that describes the state space as a set, V, of discrete state variables [?]. Each state variable, vi ∈ V, takes on values in its domain D( vi ) and the global (i.e., over E) state space, S ⊆ ×vi∈V D( vi ), is a subset of the Cartesian product of the state variable domains. A state s ∈ S is an assignment of values to the set of state variables V. We define fC, C ⊆ V, as a projection such that if s is an assignment to V, fC( s ) is the assignment of s to C; we define a context c as an assignment to the subset C ⊆ V; the initial state variables of each entity and property are defined, respectively, by the functions initET : ET → C and initEY : EY → C.
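As an illustration, the factored representation and the projection fC can be sketched as follows; the state variables mShare and demand (and their domains) are hypothetical examples, not fixed by the framework:

```python
from itertools import product

# Discrete state variables V and their domains D(v); the variables
# mShare and demand are illustrative placeholders.
domains = {
    "mShare": ["low", "medium", "high"],
    "demand": ["offpeak", "peak"],
}

# Global state space S as (a subset of) the Cartesian product of domains.
S = [dict(zip(domains, values)) for values in product(*domains.values())]

def f_C(s, C):
    """Projection f_C: restrict the assignment s to the subset C of V."""
    return {v: s[v] for v in C}

s = {"mShare": "low", "demand": "peak"}
print(f_C(s, {"mShare"}))  # {'mShare': 'low'}
print(len(S))              # 6
```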
From environmental entities to resources and agents: Embodiment is central in describing the relation between the entities and the environment [?]. Each environmental entity can be seen as a body, possibly with the capability to influence the environmental properties. Based on this idea of embodiment, two higher-level concepts (decoupled from the environment, E, characterization) are introduced: i) the agent, owning reasoning and decision-making capabilities, and ii) the resource, without any reasoning capability. Thus, given a set of agents, Υ, we define an association function embody : Υ → ET, which connects an agent to its physical entity. In a similar way, given a set of resources, Φ, we define the mapping function identity : Φ → ET. We consider that |ET| = |Υ| + |Φ|; thus each entity is mapped either to an agent or to a resource; there is no third category.
III. THE PROPOSED AGENTS’ DESIGN
The difference between a resource and an agent is that a resource is nothing more than its physical counterpart (i.e., the identity function is essentially some form of equality), whereas the agent adds, to its physical counterpart, the capability to act (i.e., the embody function augments an entity by enabling the percept, act, communicate and reason processes). The construction of multiple agents with diverse characteristics (e.g. aggressive or cautious bidders) calls for a single and systematic way of specifying the whole set of agents. The following structure, adopted from [?], frames the design of each agent j ∈ Υ:
j ≡ ⟨ G, M, mem( wself, wother, Mmem ), Lcap, Hcap, Ξ ⟩    (1)
where G is the set of goals the agent aims to achieve (e.g. maximize profit); M is the theoretic model implemented by the agent, e.g. reactive, deliberative, hybrid, with(out) social capability; mem( wself, wother, Mmem ) is the working memory, with the self, wself, and other agents’, wother, representations and with the structures that support the agent model, Mmem; Lcap is the set of primary actions or protocols available to the agent, e.g. increase or decrease bid prices; Hcap is a set of higher-level actions or protocols available to the agent, e.g. decision-making policies; Ξ is the set of channels for inter-agent communication and for agent-environment (percept and act) interaction.
The agent has three main sources of goals, G: i) the agent designer, who usually predefines goals, ii) the other agents, with whom a hierarchical relationship may exist, and iii) the human user, when such a direct interaction exists. The agent behavior implementation is superimposed by the theoretical model, M, guidelines, which range from a purely reactive and socially deaf agent to a deliberative and socially engaged one. The working memory contains the agent’s own environment representation (e.g. current state, historic evolution, aggregated knowledge) and the agent’s view of the other agents’ perspective on the environment; the support of the model, M, is also contained in mem. The primary capability set, Lcap, contains the actions that are directly executable in the environment, usually taking one time cycle to terminate; Lcap also contains the inter-agent communicative acts. The higher-level capabilities, Hcap, are designed over Lcap and therefore their execution usually extends over several time cycles (hence, the strategic trend of Hcap). The set of communication channels separates into two disjoint subsets: i) Ξag, for the inter-agent communication, and ii) Ξenv, for the agent-environment (percept and act) interaction; such that Ξ = Ξag ∪ Ξenv and Ξag ∩ Ξenv = ∅. Ξag = ∅ indicates absence of inter-agent communication; Ξenv = ∅ indicates absence of agent-environment interaction.
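The agent structure of Eq. 1 can be sketched as a plain container; the field contents in the usage example (goal labels, channel names) are illustrative assumptions, and only the disjointness of Ξag and Ξenv is prescribed above:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """Container mirroring Eq. 1: <G, M, mem(wself, wother, Mmem), Lcap, Hcap, Xi>."""
    G: set         # goals, e.g. {"maximize-profit"}
    M: str         # theoretic model, e.g. "MDP"
    w_self: dict   # working memory: own representation
    w_other: dict  # working memory: other agents' representation
    M_mem: dict    # structures that support the model M
    L_cap: set     # primary actions and communicative acts
    H_cap: set     # higher-level (strategic) actions
    Xi_ag: set     # inter-agent communication channels
    Xi_env: set    # agent-environment (percept/act) channels

    def __post_init__(self):
        # The text requires Xi_ag and Xi_env to be disjoint subsets of Xi.
        assert self.Xi_ag.isdisjoint(self.Xi_env)

genco = Agent(G={"maximize-profit"}, M="MDP", w_self={"mShare": None},
              w_other={}, M_mem={}, L_cap={"sttg1", "inform"},
              H_cap=set(), Xi_ag={"pool-channel"}, Xi_env={"market"})
```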
The agent-environment and inter-agent interface: An agent, as a situated entity, is equipped with two functions: percept and act. A third function is required for socializing with others: communicate (e.g. to negotiate or teamwork). The agent j perception depends on the previous functions, embody( j ) = e and initET( e ) = C, where e ∈ ET. From the agent’s perspective, the environment is partially observable when C ⊂ V and totally observable when C = V.
The perceptj maps the projected state, c = fC( s ), where s ∈ S, into the agent’s mem internal representation. Each agent j is assigned a set, Aj ⊆ ( Lcap ∪ Hcap ), of predefined actions. The actj submits the execution of action a ∈ Aj, which affects the environment state; we similarly define the communicatej function.
The agent model: From percept to act there are decisions to make, and the agent’s model, M, cf. Eq. 1, is the place to implement the reasoning capability. There are two main approaches: i) the qualitative, mental-state based reasoning, such as the belief-desire-intention (BDI) architecture [?], which is founded on logic theories, and ii) the quantitative, decision-theoretic evaluation of causal effects, such as the Markov decision process (MDP) [?] support for sequential decision-making in stochastic environments. There are also hybrid approaches that combine the qualitative and quantitative formulations [?], [?], [?].
The qualitative, mental-state approaches capture the relation between high-level components (e.g. beliefs, desires, intentions) and tend to follow heuristic (or rule-based) decision-making strategies, thus being better suited to tackle large-scale problems and worse suited to deal with stochastic environments.
The quantitative, decision-theoretic approaches deal with low-level components (e.g., primitive actions and immediate rewards) and search for long-term policies that maximize some utility function, thus being worse suited to tackle large-scale problems and better suited to deal with stochastic environments.
The electric power market is a stochastic environment and, in this paper, we formulate medium-scale problems that fit a decision-theoretic agent model. Therefore, we propose an MDP-based agent model.
The decision-making process: The Markov decision process (MDP) is a model of a stochastic sequential decision problem. In an MDP, a decision occurs at each decision-epoch and a period is the time interval between consecutive decision-epochs. The MDP formulation assumes a fixed period length and each decision-epoch always refers to the beginning of a period. Formally, a finite factored MDP is a 5-tuple M ≡ ⟨ S, A, Ψ, P, R ⟩, where S is a finite set of states with a factored representation (cf. Sect. II), A is a finite set of actions, Ψ ⊆ S × A is the set of admissible state-action pairs, R( s, a ) is the expected reward when action a is executed at state s, and P( s′ | s, a ) is the probability of being at state s′ after executing a at state s. At a decision-epoch, the decision to execute action a ∈ A in state s ∈ S changes the system to state s′ ∈ S with probability P( s′ | s, a ) and provides the decision maker with an expected reward R( s, a ).
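A toy instance of the 5-tuple may clarify the notation; all states, actions, probabilities and rewards below are illustrative placeholders, not values from the paper's experiments:

```python
import random

S = ["low", "high"]            # states, e.g. perceived market-share levels
A = ["raise", "lower"]         # actions, e.g. bid-price moves
Psi = {(s, a) for s in S for a in A}   # admissible state-action pairs
P = {("low", "lower"):  {"low": 0.3, "high": 0.7},
     ("low", "raise"):  {"low": 0.9, "high": 0.1},
     ("high", "lower"): {"low": 0.1, "high": 0.9},
     ("high", "raise"): {"low": 0.6, "high": 0.4}}
R = {("low", "lower"): 1.0, ("low", "raise"): 2.0,
     ("high", "lower"): 3.0, ("high", "raise"): 5.0}

def step(s, a, rng=random):
    """One decision-epoch: sample s' ~ P(. | s, a), return (s', R(s, a))."""
    assert (s, a) in Psi
    s_next = rng.choices(S, weights=[P[(s, a)][s2] for s2 in S])[0]
    return s_next, R[(s, a)]

s_next, reward = step("low", "raise")
print(reward)  # 2.0
```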
The reinforcement learning method: The solution of an MDP problem is a policy, π : S → A. The aim is to find an optimal policy, π⋆, that maximizes the expected total reward, or to find a near-optimal policy that comes within some bound of the optimum. When the MDP is fully known (i.e., P is defined ‘a priori’), methods like value iteration, policy iteration or linear programming can be used to find π⋆. When the MDP is partially known (i.e., P is undefined), the temporal-difference (TD) method is a reinforcement learning (RL) approach that resorts to experience (sampled sequences of states, actions and rewards from simulated interaction) to search for optimal policies [?]. A well-known TD method is Q-learning [?] (others are TD(λ) and SARSA [?]), which requires (for correct convergence) the exploration of the state-action space.
IV. EXPERIMENTAL SETUP
We followed the MABS modeling proposal (cf. Sect. II and Sect. III) to describe the electric market simulation model. Our experiments have two main purposes: i) to illustrate the applicability of the proposal, and ii) to analyze the resulting behavior, e.g. the learnt bidding policies, in light of the market’s specific dynamics.
The simulated environment. The experimental assumption is that energy can only be traded through a spot market (no bilateral agreements), which is operated via a Pool institutional power entity. Each generator company, GenCo, submits (to the Pool) how much energy each of its generating units, GenUnitGenCo, is willing to produce and at what price. Thus: i) the power supply system comprises a set, EGenCo, of generator companies, ii) each generator company, GenCo, contains its own set, EGenUnitGenCo, of generating units, iii) each generating unit, GenUnitGenCo, of a GenCo has constant marginal costs, and iv) the market operator, Pool, trades all the GenCos’ submitted energy.
The bidding procedure conforms to the so-called “block bids” approach [?], where a block represents a quantity of energy bid at a certain price; also, GenCos are not allowed to bid higher than a predefined price ceiling. Thus, the essential measurable aspects of the market supply are the energy price, quantity and production cost. The consumer side of the market is mainly described by the quantity of demanded energy; we assume that there is no price elasticity of demand (i.e., no demand-side market bidding). Therefore, we have: ET = { Pool } ∪ EGenCo ∪ ( ∪g∈EGenCo EGenUnitg ) and EY = { quantity, price, productionCost }.
The quantity refers both to the supply and demand sides of the market; the price refers both to the supplied bid values and to the market value settled (by the Pool). The EGenCo set contains the decision-making agents. The Pool is a reactive agent that always applies the same predefined auction rules in order to determine the market price and hence the block bids that clear the market. Each EGenUnitGenCo represents the GenCo’s set of available resources.
The resources’ specification. Each generating unit, GenUnitGenCo, defines its marginal costs and constructs the block bids according to the strategy indicated by its generator company, GenCo. We considered (in line with [?]) three types of generating units: i) one base-load coal plant, CO, ii) one combined-cycle plant, CC, to cover intermediate load, and iii) one gas-turbine, GT, peaking unit. Table I shows the essential properties of each plant type and Table II shows the heat rate curves used to define the bidding blocks. The quantity of a marginal-cost bidding block is the capacity increment, e.g. for CO, the quantity of the 11.9 marginal-cost bidding block is 350 − 250 = 100 MW (cf. Table II, CO, top lines 2 and 1).
Table I
PROPERTIES OF GENERATING UNITS; THE UNIT TYPES ARE COAL (CO), COMBINED CYCLE (CC) AND GAS TURBINE (GT); O&M INDICATES “OPERATION AND MAINTENANCE” COST.

Property     | Unit    | CO         | CC       | GT
Fuel         | —       | Coal (BIT) | Nat. Gas | Nat. Gas
Capacity     | MW      | 500        | 250      | 125
Fuel price   | €/MMBtu | 1.5        | 5        | 5
Variable O&M | €/MWh   | 1.75       | 2.8      | 8
Table II
EACH GENERATING UNIT’S CAPACITY BLOCK (MW) AND HEAT RATE (BTU/KWH) AND THE CORRESPONDING MARGINAL COST (€/MWH).

CO generating unit
Cap. | Heat rate | Marg. cost
250  | 12000     | —
350  | 10500     | 11.9
400  | 10080     | 12.5
450  | 9770      | 12.7
500  | 9550      | 13.1

CC generating unit
Cap. | Heat rate | Marg. cost
100  | 9000      | —
150  | 7800      | 29.8
200  | 7200      | 29.8
225  | 7010      | 30.3
250  | 6880      | 31.4

GT generating unit
Cap. | Heat rate | Marg. cost
50   | 14000     | —
100  | 10600     | 44.0
110  | 10330     | 46.2
120  | 10150     | 48.9
125  | 10100     | 52.5
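The marginal-cost column of Table II can be reproduced from Tables I and II: the incremental heat rate of a capacity block, times the fuel price, plus the variable O&M cost. A short sketch, assuming the heat rates of Table II are average rates at each capacity level:

```python
def marginal_cost(cap_lo, hr_lo, cap_hi, hr_hi, fuel_price, var_om):
    """Marginal cost (EUR/MWh) of the capacity block cap_lo -> cap_hi.

    Heat rates (hr_*) are in Btu/kWh, fuel_price in EUR/MMBtu and
    var_om in EUR/MWh; the incremental heat rate is the extra fuel
    (Btu) per extra energy (kWh) between the two capacity levels."""
    inc_heat_rate = (cap_hi * hr_hi - cap_lo * hr_lo) / (cap_hi - cap_lo)
    return inc_heat_rate * fuel_price / 1000.0 + var_om

# CO unit, first bidding block (250 -> 350 MW), cf. Tables I and II:
print(round(marginal_cost(250, 12000, 350, 10500, 1.5, 1.75), 1))  # 11.9
# GT unit, first bidding block (50 -> 100 MW):
print(round(marginal_cost(50, 14000, 100, 10600, 5.0, 8.0), 1))    # 44.0
```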
The decision-making agents’ structure. Each generator company defines the bidding strategy for each of its generating units. We considered six basic strategies, sttgi, where i ∈ { 1, . . . , 6 }, available for a GenCo to apply: i) sttg1, bid according to the marginal production cost of each GenUnitGenCo (follow the heat rate curves, cf. Table II), ii) sttg2, make a “small” increment in the prices of all the previous day’s block bids, iii) sttg3, similar to sttg2, but with a “large” increment, iv) sttg4, make a “small” decrement in the prices of all the previous day’s block bids, v) sttg5, similar to sttg4, but with a “large” decrement, and vi) sttg6, hold the prices of all the previous day’s block bids.
The above strategies correspond to the GenCo agent’s primary actions. Additionally, the GenCo communicates with the Pool via an inform communicative act. Therefore, we have: Lcap = { sttg1, . . . , sttg6 } ∪ { inform }.
The GenCo has a set, EGenUnitGenCo, of generating units and, at each decision-epoch, it decides the strategy to apply to each generating unit, thus choosing a vector of strategies, −→sttg, where the ith vector component refers to the GenUnit_i_GenCo generating unit; hence: Hcap = ×i=1..|EGenUnitGenCo| { sttg1, . . . , sttg6 }i.
The GenCo’s perceived market share, mShare, is used to characterize the agent’s internal memory state; thus we have wself = { mShare } and wother = ∅.
The agents’ decision process. Each GenCo is an MDP decision-making agent such that S = wself, A = Hcap, the P function is unknown (it represents the market dynamics) and the R function is computed, internally by the GenCo, as the daily profit. The decision process period represents a daily market. At each decision-epoch the Pool agent receives all the GenCos’ block bids for the 24 daily hours and settles the hourly market price by matching offers in a classic supply and demand equilibrium price (we assume an hourly constant demand).
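The Pool's hourly clearing rule described above can be sketched as a uniform-price auction: stack the submitted block bids by ascending price and let the marginal block that meets demand set the market price. The bid data in the usage example are illustrative:

```python
def clear_market(bids, demand):
    """Uniform-price clearing: bids is a list of (price, quantity_MW).

    Accept the cheapest blocks until demand is met; the last (marginal)
    accepted block sets the hourly market price for all accepted energy."""
    accepted, supplied, market_price = [], 0.0, None
    for price, qty in sorted(bids):
        if supplied >= demand:
            break
        take = min(qty, demand - supplied)
        accepted.append((price, take))
        supplied += take
        market_price = price
    return market_price, accepted

# Illustrative block bids (price in EUR/MWh, quantity in MW):
price, accepted = clear_market(
    [(11.9, 100), (29.8, 50), (44.0, 50), (52.5, 5)], demand=175)
print(price)  # 44.0 -- the gas-turbine block is marginal
```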
V. EXPERIMENTS AND RESULTS
We designed three experiments; Table III shows each GenCo’s name and its production capacity, computed according to the respective GenUnits (cf. Table I). The “active” suffix (cf. Table III, name column) means that the GenCo searches for its GenUnits’ best bidding strategies; i.e., “active” denotes a policy-learning agent.
Table III
THE EXPERIMENTS’ GenCoS AND GenUnitS.

Exp. | Name               | Prod. Capac. | GenUnits
#1   | GenCo_active       | 875          | CO & CC & GT
#2   | GenCo_major        | 2000         | 2×CO & 4×CC
     | GenCo_minor&active | 875          | 3×CC & 1×GT
#3   | GenCo_major&active | 2000         | 2×CO & 4×CC
     | GenCo_minor&active | 875          | 3×CC & 1×GT
Experiment #1. The experiment sets a constant, 600 MW, hourly demand for electricity. Figure 1 shows the GenCo_active process of learning the bidding policy that gives the highest long-term profit. We used Q-learning with an ϵ-greedy exploration strategy, which picks a random action with probability ϵ and behaves greedily otherwise (i.e., picks the action with the highest estimated action value); we defined ϵ = 0.2. The learning rate of Q-learning was defined as α = 0.01 and the discount factor (which measures the present value of future rewards) was set to γ = 0.5. Figure 2 shows the bid blocks that cleared the market (at the first hour of the last simulated day). As there is no market competition, the cheapest unit, CO, bids zero, the GT sets the market price (at its ceiling) and the most expensive 200 MW are distributed among the most expensive GenUnits (CC, GT). Therefore, the GenCo_active agent found, for each perceived market share, mShare, the best strategy, −→sttg, to bid its GenUnits’ energy blocks.
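The Q-learning update with ϵ-greedy exploration used in this experiment can be sketched as follows; the state and action labels are placeholders, while ϵ, α and γ are the values reported above:

```python
import random

EPSILON, ALPHA, GAMMA = 0.2, 0.01, 0.5     # values used in Experiment #1
states = ["mShare-low", "mShare-high"]      # placeholder mShare levels
actions = ["sttg1", "sttg2", "sttg3", "sttg4", "sttg5", "sttg6"]
Q = {(s, a): 0.0 for s in states for a in actions}

def choose_action(s, rng=random):
    """Epsilon-greedy: random action with probability EPSILON, else greedy."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def update(s, a, reward, s_next):
    """Q-learning temporal-difference update."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])

update("mShare-low", "sttg2", reward=1.0, s_next="mShare-high")
print(Q[("mShare-low", "sttg2")])  # 0.01
```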
[Figure 1 plot: profit of GenCo_active, in M€, versus simulation cycle (1 day; 24 hours), over cycles 0–2400.]
Figure 1. The process of learning a bid policy to maximize profit. [Exp. #1]
[Figure 2 plot: GenCo_active coupled block bids (Day=2500; Hour=1); price (€/MWh) versus capacity (MW); series: Base Coal (CO), Comb. Cycle (CC), Gas Turbine (GT).]
Figure 2. The bid policy that maximizes profit (price ceiling is 180). [Exp. #1]
Experiment #2. The experiment sets a constant, 2000 MW, hourly demand for electricity. Figure 3 shows the market share evolution while GenCo_minor&active learns to play in the market with GenCo_major, a larger company with a fixed strategy: “bid each block 5€ higher than its marginal cost”. We see that GenCo_minor&active takes around 18% (75 − 57) of the market from GenCo_major. To earn that market share, GenCo_minor&active learnt to lower its prices in order to exploit the “5€ space” offered by GenCo_major’s fixed strategy.
[Figure 3 plot: GenCos’ market share (%) versus simulation cycle (1 day; 24 hours), over cycles 0–100; series: GenCo_major, GenCo_minor&active.]
Figure 3. Market share evolution induced by GenCo_minor&active. [Exp. #2]
Experiment #3. In this experiment both GenCos are “active”; the remaining setup is the same as in experiment #2. Figure 4 shows the market share oscillation while each company reacts to the other’s strategy to win the market. Despite the competition, each company learns to secure its own fringe of the market.
[Figure 4 plot: GenCos’ market share (%) versus simulation cycle (1 day; 24 hours), over cycles 0–5000; series: GenCo_major&active, GenCo_minor&active.]
Figure 4. Market share evolution induced by both GenCos. [Exp. #3]
VI. CONCLUSIONS AND FUTURE WORK
This paper describes our preliminary work in the construction of a MABS framework to analyze the macro-scale dynamics of the electric power market. Although both research fields (MABS and market simulation) have achieved considerable progress, there is a lack of cross-cutting approaches. Hence, our contribution is twofold: i) a comprehensive formulation of MABS, including the simulated environment and the inhabiting decision-making and learning agents, and ii) a simulation model of the electric power market framed in the proposed formulation. Our initial results reveal an emergent and coherent market behavior, thus motivating us to further extend the experimental setup with additional bidding strategies and to incorporate specific market rules, such as congestion management and pricing regulation mechanisms.
REFERENCES
[1] C. Berry, B. Hobbs, W. Meroney, R. O’Neill, and W. S. Jr., “Understanding how market power can arise in network competition: a game theoretic approach,” Utilities Policy, vol. 8, no. 3, pp. 139–158, September 1999.

[2] S. Gabriel, J. Zhuang, and S. Kiet, “A Nash-Cournot model for the North American natural gas market,” in Proceedings of the 6th IAEE European Conference: Modelling in Energy Economics and Policy, 2–3 September 2004.

[3] S. Schuster and N. Gilbert, “Simulating online business models,” in Proceedings of the 5th Workshop on Agent-Based Simulation (ABS-04), 3–5 May 2004, pp. 55–61.

[4] A. Helleboogh, G. Vizzari, A. Uhrmacher, and F. Michel, “Modeling dynamic environments in multi-agent simulation,” JAAMAS, vol. 14, no. 1, pp. 87–116, 2007.

[5] D. Morley and K. Myers, “The SPARK agent framework,” in Proceedings of AAMAS-04. New York, USA: IEEE Computer Society, 2004, pp. 714–721.

[6] M. Wooldridge, Reasoning About Rational Agents. The MIT Press, 2000.

[7] M. Littman, “Markov games as a framework for multi-agent reinforcement learning,” in Proceedings of the 11th International Conf. on ML, 1994, pp. 157–163.

[8] T. Krause, G. Andersson, D. Ernst, E. V. Beck, R. Cherkaoui, and A. Germond, “Nash equilibria and reinforcement learning for active decision maker modelling in power markets,” in Proceedings of the 6th IAEE European Conference: Modelling in Energy Economics and Policy. Springer, 2–3 September 2004.

[9] C. Boutilier, R. Dearden, and M. Goldszmidt, “Exploiting structure in policy construction,” in Proceedings of IJCAI-95, 1995, pp. 1104–1111.

[10] A. Clark, Being There: Putting Brain, Body, and World Together Again. MIT, 1998.

[11] P. Trigo and H. Coelho, “The 5Rings team description paper,” in Proceedings of the RoboCup-2004 Symposium, Team Description Papers, 4–5 July 2004.

[12] A. Rao and M. Georgeff, “BDI agents: From theory to practice,” in Proceedings of the First International Conference on Multiagent Systems, 1995, pp. 312–319.

[13] M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, ser. Probability and Statistics. Wiley, 1994; revised second printing, 2005.

[14] G. Simari and S. Parsons, “On the relationship between MDPs and the BDI architecture,” in Proceedings of AAMAS-06, 8–12 May 2006, pp. 1041–1048.

[15] P. Trigo and H. Coelho, “A hybrid approach to teamwork,” in Proceedings of the VI Encontro Nacional de Inteligência Artificial (ENIA-07), R.J., Brazil, 2007.

[16] G. P. Dimuro, A. R. Costa, and L. V. Goncalves, “Recognizing and learning observable social exchange strategies in open societies,” in Advances on Social Simulation, Post-Proceedings of the Brazilian Workshop in Social Simulation, BWSS 2010, G. P. Dimuro, A. C. da Rocha Costa, J. Sichman, D. Adamatti, P. Tedesco, J. Balsa, and L. Antunes, Eds. Los Alamitos: IEEE, 2011 (in this book).

[17] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.

[18] C. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, pp. 279–292, 1992.

[19] “OMIP — The Iberian Electricity Market Operator,” online: http://www.omip.pt.

[20] A. Botterud, P. Thimmapuram, and M. Yamakado, “Simulating GenCo bidding strategies in electricity markets with an agent-based model,” in Proceedings of the 7th Annual IAEE European Energy Conference (IAEE-05), 28–30 August 2005.