Manual

This page shows how to use the HSVIforOSPOSGs.

Getting started

First, a game in .osgposg format must be loaded as OSPOSG:

julia> osposg = OSPOSG("../../games/pursuit-evasion/peg03.osposg")OSPOSG:
  discount = 0.95
  state_count = 143
  partition_count = 21
  player1_action_count = 145
  player2_action_count = 13
  observation_count = 2
  transition_count = 2671
  reward_count = 2671
  minimal_reward = 0.0
  maximal_reward = 100.0
  LB_min = 0.0
  UB_max = 1999.9999999999982
  lipschitz_delta = 999.9999999999991
  initial_partition = 5
  initial_belief = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
HSVIforOSPOSGs.OSPOSGMethod
OSPOSG(path::AbstractString)
OSPOSG(io::IO)

Construct OSPOSG from .osposg file at path or from IO io.

source

Then, HSVI algorithm must be instantiated/configured. Here we use HSVI with default parameters:

julia> hsvi = HSVI()HSVI:
  neighborhood = 1.0e-6
  presolve_epsilon = 0.01
  presolve_time_limit = 300.0
  optimizer_factory = #26

For available parameters and their meaning see below:

HSVIforOSPOSGs.HSVIType
HSVI

Heuristic search value iteration solver for OSPOSGs.

The constructor takes following keyword arguments:

  • neighborhood - parameter that guarantees Lipschitz continuity and convergence of the algorithm
  • presolve_epsilon - precision the solver is trying to achieve during presolve phase
  • presolve_time_limit - time limit for the presolve phase in seconds
  • optimizer_factory - function that returns optimizer compatible with JuMP.Model to be used as LP solver
source

The neigborhood parameter $D$ of HSVI must be within these bounds

\[0 < D < \frac{(1 − \gamma) \varepsilon}{2 \delta}\]

for the algorithm to have convergence guarantees, where $\gamma$ is discount factor, $\varepsilon$ is specified gap and $\delta$ is Lipschitz delta of the game.

After loading the game and configuring the algorithm we can start solving by running solve:

julia> recorder = solve(osposg, hsvi, 1.0, 60.0)┌ Info: OSPOSG:
│   discount = 0.95
│   state_count = 143
│   partition_count = 21
│   player1_action_count = 145
│   player2_action_count = 13
│   observation_count = 2
│   transition_count = 2671
│   reward_count = 2671
│   minimal_reward = 0.0
│   maximal_reward = 100.0
│   LB_min = 0.0
│   UB_max = 1999.9999999999982
│   lipschitz_delta = 999.9999999999991
│   initial_partition = 5
└   initial_belief = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
┌ Info: HSVI:
│   neighborhood = 1.0e-6
│   presolve_epsilon = 0.01
│   presolve_time_limit = 300.0
└   optimizer_factory = #26
[ Info:    0.17s presolve_LB  +50.07461
[ Info:   18.97s presolve_UB  +85.92328
[ Info:   exp      time         LB         UB        gap      Γ      Υ  depth
[ Info:     0:   19.63s  +50.07461  +85.92328  +35.84868     21    143
[ Info:     1:   27.08s  +52.16351  +85.91399  +33.75048     26    148      2
[ Info:     2:   28.48s  +61.05100  +85.90366  +24.85265    101    223     37
[ Info:     3:   28.75s  +63.77537  +85.53740  +21.76203    118    240      8
[ Info:     4:   29.58s  +72.75011  +84.93808  +12.18797    165    287     23
[ Info:     5:   29.64s  +80.12347  +84.90432   +4.78085    170    292      2
[ Info:     6:   29.97s  +82.01586  +84.60720   +2.59134    191    313     10
[ Info:     7:   31.85s  +82.34084  +84.45037   +2.10953    282    404     45
[ Info:     8:   32.29s  +82.34084  +84.43406   +2.09323    301    423      9
[ Info:     9:   33.82s  +82.99927  +84.28098   +1.28171    376    498     37
[ Info:    10:   35.37s  +82.99927  +84.05429   +1.05502    445    567     34
[ Info:    11:   35.79s  +83.03234  +83.85491   +0.82257    462    584      8
[ Info:    OK:   35.79s  +83.03234  +83.85491   +0.82257
HSVIRecorder:
  exploration_count = 11
  timestamps = [19.624897003173828, 27.083066940307617, 28.47811508178711, 28.751100063323975, 29.584553956985474, 29.637646913528442, 29.965824127197266, 31.854284048080444, 32.291954040527344, 33.823817014694214, 35.37424302101135, 35.79432392120361]
  lb_values = [50.074605717711876, 52.16350914100638, 61.05100229063332, 63.775374466942615, 72.75010638440921, 80.12346755531159, 82.01585793281183, 82.34083722378338, 82.3408372237834, 82.99927122898136, 82.99927122898137, 83.03234203976398]
  ub_values = [85.92328153759932, 85.91399246071936, 85.90365578045107, 85.53740180330605, 84.93807912421825, 84.90431764514234, 84.60719513661864, 84.45037038477412, 84.4340635913208, 84.28098310426499, 84.05428668032502, 83.85491137263138]
  gaps = [35.84867581988745, 33.75048331971298, 24.852653489817754, 21.762027336363438, 12.187972739809041, 4.78085008983075, 2.591337203806816, 2.109533160990736, 2.0932263675373974, 1.2817118752836336, 1.0550154513436496, 0.8225693328674026]
  gamma_sizes = [21, 26, 101, 118, 165, 170, 191, 282, 301, 376, 445, 462]
  upsilon_sizes = [143, 148, 223, 240, 287, 292, 313, 404, 423, 498, 567, 584]
  exploration_depths = [0, 2, 37, 8, 23, 2, 10, 45, 9, 37, 34, 8]
  clock_start = 1.676718048114944e9
  running   35.91s
HSVIforOSPOSGs.solveFunction
solve(osposg::OSPOSG, hsvi::HSVI, epsilon::Float64, time_limit::Float64)

Run hsvi solver on osposg game with time limit time_limit aiming for gap of at most epsilon in initial belief.

source

The recorder contains records of the algorithm progress after each iteration and can be used for further examination or presentation of results.

LP Solvers

HSVIforOSPOSGs uses GLPK as a JuMP solver for linear programs by default because it is open-source and easily installable through standalone Julia package. However, note that the LP models being solved in HSVI for OSPOSGs are quite complex and GLPK might get stuck on some of them or report them as being unfeasible (although they should be feasible). Therefore, it is recommended to use more powerful solvers (such as CPLEX, Gurobi, etc.). However, installing them is not as straightforward and often requires 3rd party binary and/or proprietary license. If you wish to use different solver you can do so by installing the JuMP wrapper for the given solver (refer to installation instructions at CPLEX.jl, Gurobi.jl, etc.) and then pass the optimizer factory into HSVI constructor:

using CPLEX

HSVI(optimizer_factory=() -> CPLEX.Optimizer())

Logging level

This package utilizes Julia logging facilities to communicate the progress of the algorithm to the user. By default only info, warning and error messages are displayed. To display more detailed debuging info, use the following before running the algorithm:

using Logging

global_logger(ConsoleLogger(stdout, Logging.Debug))

On the contrary, to disable info logging use:

global_logger(ConsoleLogger(stdout, Logging.Warn))

Examples

Additional examples on how to configure and run the algorithm are provided in the scripts directory.