May 13, 2018 05:01

Max-entropy
and risk-aware
Inverse RL:

B.D. Ziebart, A.L. Maas, J.A. Bagnell, A.K. Dey,
“Maximum entropy inverse reinforcement learning”
in Proc. AAAI Conf. on A.I., 2008, 1433-1438.

A. Boularias, J. Kober, J.R. Peters (2011)
“Relative entropy inverse reinforcement learning”
in Proc. Int. Conf. A.I. Stat., 2011, 182-189.

N. Aghasadeghi, T. Bretl (2011) “Maximum entropy
inverse reinforcement learning in continuous state
spaces with path integrals” in Proc. IEEE/RSJ Int.
Conf. Intell. Robots Syst., 2011, pp. 1561-1566.

T. Park, S. Levine (2013) “Inverse optimal control
for humanoid locomotion” in Robot. Sci. Syst. WS
Inverse Opt. Contr. Robot. Learn. Demonstr., 2013.

M. Kalakrishnan, P. Pastor, L. Righetti, S. Schaal
(2013) “Learning objective functions for manipulation”
in Proc. IEEE Int. Conf. Robot. Autom., 2013, 1331-1336.

J. Mainprice, D. Berenson (2014) "Learning Cost
Functions for Motion Planning of Human-Robot
Collaborative Manipulation Tasks from Human-
Human Demonstration." 2014 AAAI Fall Symposium.

Previously: 1, 2, 3.

Related:

N. Sugimoto, J. Morimoto (2011) "Phase-dependent
trajectory optimization for CPG-based biped walking
using path integral reinforcement learning," in Proc.
11th Int. Conf. on Humanoid Robots, IEEE-RAS, 255-260.

M.B. Horowitz, A. Damle, J.W. Burdick (2014)
"Linear Hamilton Jacobi Bellman equations in
high dimensions," in Proc. 53rd Ann. IEEE Conf.
on Decision and Control (CDC), 2014, 5880-5887.

M.B. Horowitz, J.W. Burdick (2014) "Optimal
navigation functions for nonlinear stochastic
systems," in Proc. IEEE/RSJ Int. Conf. Intell.
Robots Syst. (IROS), 2014.

Links from Horowitz,
Damle, Burdick (2014):
Fast methods for linear HJB
(for linearly solvable MDPs):

G. Beylkin, M.J. Mohlenkamp (2005) "Algorithms
for Numerical Analysis in High Dimensions."
SIAM J. on Sci. Comp., 26(6):2133-2159.

M.B. Horowitz, J.W. Burdick (2014) "Semidefinite
relaxations for stochastic optimal control policies"
in Proc. Am. Control Conf. (ACC), 2014, 3006-3012.

Y.P. Leong, M.B. Horowitz, J.W. Burdick (2016)
"Linearly Solvable Stochastic Control Lyapunov Functions."

I.M. Mitchell, C.J. Tomlin (2003)
"Overapproximating Reachable Sets
by Hamilton-Jacobi Projections."
J. of Sci. Comp., 19(1-3):323-346.

W.M. McEneaney (2007) "A curse-of-dimensionality-free
numerical method for solution of certain HJB PDEs."
SIAM J. on Control and Optim., 46(4):1239-1276.

J.B. Lasserre (2001) "Global Optimization with
Polynomials and the Problem of Moments."
SIAM J. on Optimization, 11(3):796-817.

J.B. Lasserre, D. Henrion, C. Prieur, E. Trélat
(2008) "Nonlinear Optimal Control via Occupation
Measures and LMI-Relaxations." SIAM J. on Control
and Optim., 47(4):1643-1666. Erratum.

A. Majumdar, A.A. Ahmadi, R. Tedrake (2013)
"Control design along trajectories with sums
of squares programming" in Proc. IEEE Int. Conf.
on Robotics and Autom. (ICRA), pp. 4054-4061.
Cf.

Previously: Tensors, Lasserre,
Fienup, Transitive Closure,
DP speedups, & Doubling,
Schur-Nevanlinna-Pick.

Links from Mitchell & Tomlin (2003):
Mitchell, I., Bayen, A., Tomlin, C.J. (2001)
"Validating a Hamilton-Jacobi approximation to
hybrid system reachable sets" in Benedetto, M.D.D.,
et al. (eds.), "Hybrid Systems: Computation and
Control," L.N.C.S. 2034, Springer-Verlag, pp. 418-432.

Mitchell, I., Tomlin, C. (2000) "Level set methods
for computation in hybrid systems" in Krogh, B., et al.
(eds.), "Hybrid Systems: Computation and Control,"
L.N.C.S. 1790, Springer-Verlag, pp. 310-323.

Mitchell, I., Bayen, A., Tomlin, C.J. (2005)
"A Time-Dependent Hamilton-Jacobi Formulation of
Reachable Sets for Continuous Dynamic Games"
IEEE Trans. on Autom. and Contr., 50(7):947-957.

Osher, S., Sethian, J.A. (1988) "Fronts propagating
with curvature-dependent speed: Algorithms based on
Hamilton-Jacobi formulations" J. Comput. Phys. 79, 12-49.
Canonical transform.: 1, 2.