Manual
This page shows how to use the HSVIforOSPOSGs.
Getting started
First, a game in .osgposg
format must be loaded as OSPOSG
:
julia> osposg = OSPOSG("../../games/pursuit-evasion/peg03.osposg")
OSPOSG: discount = 0.95 state_count = 143 partition_count = 21 player1_action_count = 145 player2_action_count = 13 observation_count = 2 transition_count = 2671 reward_count = 2671 minimal_reward = 0.0 maximal_reward = 100.0 LB_min = 0.0 UB_max = 1999.9999999999982 lipschitz_delta = 999.9999999999991 initial_partition = 5 initial_belief = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
HSVIforOSPOSGs.OSPOSG
— TypeOSPOSG
Type for a one-sided partially observable stochastic game. Can be loaded from .osposg
files.
HSVIforOSPOSGs.OSPOSG
— MethodOSPOSG(path::AbstractString)
OSPOSG(io::IO)
Construct OSPOSG
from .osposg
file at path
or from IO io
.
Then, HSVI
algorithm must be instantiated/configured. Here we use HSVI
with default parameters:
julia> hsvi = HSVI()
HSVI: neighborhood = 1.0e-6 presolve_epsilon = 0.01 presolve_time_limit = 300.0 optimizer_factory = #26
For available parameters and their meaning see below:
HSVIforOSPOSGs.HSVI
— TypeHSVI
Heuristic search value iteration solver for OSPOSGs.
The constructor takes following keyword arguments:
neighborhood
- parameter that guarantees Lipschitz continuity and convergence of the algorithmpresolve_epsilon
- precision the solver is trying to achieve during presolve phasepresolve_time_limit
- time limit for the presolve phase in secondsoptimizer_factory
- function that returns optimizer compatible with JuMP.Model to be used as LP solver
The neigborhood
parameter $D$ of HSVI
must be within these bounds
\[0 < D < \frac{(1 − \gamma) \varepsilon}{2 \delta}\]
for the algorithm to have convergence guarantees, where $\gamma$ is discount factor, $\varepsilon$ is specified gap and $\delta$ is Lipschitz delta of the game.
After loading the game and configuring the algorithm we can start solving by running solve
:
julia> recorder = solve(osposg, hsvi, 1.0, 60.0)
┌ Info: OSPOSG: │ discount = 0.95 │ state_count = 143 │ partition_count = 21 │ player1_action_count = 145 │ player2_action_count = 13 │ observation_count = 2 │ transition_count = 2671 │ reward_count = 2671 │ minimal_reward = 0.0 │ maximal_reward = 100.0 │ LB_min = 0.0 │ UB_max = 1999.9999999999982 │ lipschitz_delta = 999.9999999999991 │ initial_partition = 5 └ initial_belief = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] ┌ Info: HSVI: │ neighborhood = 1.0e-6 │ presolve_epsilon = 0.01 │ presolve_time_limit = 300.0 └ optimizer_factory = #26 [ Info: 0.17s presolve_LB +50.07461 [ Info: 18.97s presolve_UB +85.92328 [ Info: exp time LB UB gap Γ Υ depth [ Info: 0: 19.63s +50.07461 +85.92328 +35.84868 21 143 [ Info: 1: 27.08s +52.16351 +85.91399 +33.75048 26 148 2 [ Info: 2: 28.48s +61.05100 +85.90366 +24.85265 101 223 37 [ Info: 3: 28.75s +63.77537 +85.53740 +21.76203 118 240 8 [ Info: 4: 29.58s +72.75011 +84.93808 +12.18797 165 287 23 [ Info: 5: 29.64s +80.12347 +84.90432 +4.78085 170 292 2 [ Info: 6: 29.97s +82.01586 +84.60720 +2.59134 191 313 10 [ Info: 7: 31.85s +82.34084 +84.45037 +2.10953 282 404 45 [ Info: 8: 32.29s +82.34084 +84.43406 +2.09323 301 423 9 [ Info: 9: 33.82s +82.99927 +84.28098 +1.28171 376 498 37 [ Info: 10: 35.37s +82.99927 +84.05429 +1.05502 445 567 34 [ Info: 11: 35.79s +83.03234 +83.85491 +0.82257 462 584 8 [ Info: OK: 35.79s +83.03234 +83.85491 +0.82257 HSVIRecorder: exploration_count = 11 timestamps = [19.624897003173828, 27.083066940307617, 28.47811508178711, 28.751100063323975, 29.584553956985474, 29.637646913528442, 29.965824127197266, 31.854284048080444, 32.291954040527344, 33.823817014694214, 35.37424302101135, 35.79432392120361] lb_values = [50.074605717711876, 52.16350914100638, 61.05100229063332, 63.775374466942615, 72.75010638440921, 80.12346755531159, 82.01585793281183, 82.34083722378338, 82.3408372237834, 82.99927122898136, 82.99927122898137, 83.03234203976398] ub_values = [85.92328153759932, 85.91399246071936, 85.90365578045107, 85.53740180330605, 84.93807912421825, 84.90431764514234, 84.60719513661864, 84.45037038477412, 84.4340635913208, 84.28098310426499, 84.05428668032502, 83.85491137263138] gaps = [35.84867581988745, 33.75048331971298, 24.852653489817754, 21.762027336363438, 12.187972739809041, 4.78085008983075, 2.591337203806816, 2.109533160990736, 2.0932263675373974, 1.2817118752836336, 1.0550154513436496, 0.8225693328674026] gamma_sizes = [21, 26, 101, 118, 165, 170, 191, 282, 301, 376, 445, 462] upsilon_sizes = [143, 148, 223, 240, 287, 292, 313, 404, 423, 498, 567, 584] exploration_depths = [0, 2, 37, 8, 23, 2, 10, 45, 9, 37, 34, 8] clock_start = 1.676718048114944e9 running 35.91s
HSVIforOSPOSGs.solve
— Functionsolve(osposg::OSPOSG, hsvi::HSVI, epsilon::Float64, time_limit::Float64)
Run hsvi
solver on osposg
game with time limit time_limit
aiming for gap of at most epsilon
in initial belief.
The recorder
contains records of the algorithm progress after each iteration and can be used for further examination or presentation of results.
HSVIforOSPOSGs.Recorder
— TypeRecorder
Stores the solver statistics after each iteration.
LP Solvers
HSVIforOSPOSGs uses GLPK as a JuMP solver for linear programs by default because it is open-source and easily installable through standalone Julia package. However, note that the LP models being solved in HSVI for OSPOSGs are quite complex and GLPK might get stuck on some of them or report them as being unfeasible (although they should be feasible). Therefore, it is recommended to use more powerful solvers (such as CPLEX, Gurobi, etc.). However, installing them is not as straightforward and often requires 3rd party binary and/or proprietary license. If you wish to use different solver you can do so by installing the JuMP wrapper for the given solver (refer to installation instructions at CPLEX.jl, Gurobi.jl, etc.) and then pass the optimizer factory into HSVI
constructor:
using CPLEX
HSVI(optimizer_factory=() -> CPLEX.Optimizer())
Logging level
This package utilizes Julia logging facilities to communicate the progress of the algorithm to the user. By default only info, warning and error messages are displayed. To display more detailed debuging info, use the following before running the algorithm:
using Logging
global_logger(ConsoleLogger(stdout, Logging.Debug))
On the contrary, to disable info logging use:
global_logger(ConsoleLogger(stdout, Logging.Warn))
Examples
Additional examples on how to configure and run the algorithm are provided in the scripts
directory.