Max-entropy
and risk-aware
Inverse RL:
B.D. Ziebart, A.L. Maas, J.A. Bagnell, A.K. Dey (2008)
“
Maximum entropy inverse reinforcement learning”
in Proc. AAAI Conf. on A.I., 2008, 1433-1438.
A. Boularias, J. Kober, J.R. Peters (2011)
“
Relative entropy inverse reinforcement learning”
in Proc. Int. Conf. A.I. Stat., 2011, 182-189.
N. Aghasadeghi, T. Bretl (2011) “
Maximum entropy
inverse reinforcement learning in continuous state
spaces with path integrals” in Proc. IEEE/RSJ Int.
Conf. Intell. Robots Syst., 2011, pp. 1561-1566.
T. Park, S. Levine (2013) “
Inverse optimal control
for humanoid locomotion” in Robot. Sci. Syst. WS
Inverse Opt. Contr. Robot. Learn. Demonstr., 2013.
M. Kalakrishnan, P. Pastor, L. Righetti, S. Schaal
(2013) “
Learning objective functions for manipulation”
in Proc. IEEE Int. Conf. Robot. Autom., 1331-1336.
J. Mainprice, D. Berenson (2014) "
Learning Cost
Functions for Motion Planning of Human-Robot
Collaborative Manipulation Tasks from Human-
Human Demonstration." 2014 AAAI Fall Symposium.
Previously:
1,
2,
3.
Related:
N. Sugimoto, J. Morimoto (2011) "
Phase-dependent
trajectory optimization for CPG-based biped walking
using path integral reinforcement learning," in Proc.
11th Int. Conf. on Humanoid Robots, IEEE-RAS, 255-260.
M.B. Horowitz, A. Damle, J.W. Burdick (2014)
"
Linear Hamilton Jacobi Bellman equations in
high dimensions," in Proc. 53rd Ann. IEEE Conf.
on Decision and Control (CDC), 2014, 5880-5887.
M.B. Horowitz, J.W. Burdick (2014) "
Optimal
navigation functions for nonlinear stochastic
systems." Intell. Robots and Syst. (IROS), 2014.
Links from Horowitz,
Damle, Burdick (2014):
Fast methods for linear HJB
(for linear solvable MDPs):
G. Beylkin, M.J. Mohlenkamp (2005) "
Algorithms
for Numerical Analysis in High Dimensions."
SIAM J. on Sci. Comp., 26(6):2133-2159.
M.B. Horowitz, J.W. Burdick (2014) "
Semidefinite
relaxations for stochastic optimal control policies"
In Am. Control Conf. (ACC), 2014, 3006-3012.
Y.P. Leong, M.B. Horowitz, J.W. Burdick (2016)
"
Linearly Solvable Stochastic Control Lyapunov Functions"
I.M. Mitchell, C.J. Tomlin (2003)
"
Overapproximating Reachable Sets
by Hamilton-Jacobi Projections."
J. of Sci. Comp., 19(1-3):323-346.
W.M. McEneaney (2007) "
A curse-of-dimensionality-free
numerical method for solution of certain HJB PDEs."
SIAM J. on Control and Optim., 46(4):1239-1276.
J.B. Lasserre (2001) "
Global Optimization with
Polynomials and the Problem of Moments."
SIAM J. on Optimization, 11(3):796-817.
J.B. Lasserre, D. Henrion, C. Prieur, E. Trélat
(2008) "
Nonlinear Optimal Control via Occupation
Measures and LMI-Relaxations" SIAM J. on Control
and Optim., 47(4):1643-1666.
Erratum.
A. Majumdar, A.A. Ahmadi, R. Tedrake (2013)
"
Control design along trajectories with sums
of squares programming" In IEEE Int. Conf.
on Robotics and Autom. (ICRA), pp. 4054-4061.
Cf.
Previously:
Tensors,
Lasserre,
Fienup,
Transitive Closure,
DP speedups, &
Doubling,
Schur-Nevanlinna-Pick.
Links from Mitchell & Tomlin (2003):
Mitchell, I., Bayen, A., Tomlin, C.J. (2001)
"
Validating a Hamilton-Jacobi approximation to
hybrid system reachable sets" in Benedetto, M.D.D.,
et al. (eds.), "Hybrid Systems: Computation and
Control", L.N.C.S. 2034, Springer-Verlag, pp. 418-432.
Mitchell, I., Tomlin, C. (2000) "
Level set methods
for computation in hybrid systems" in Krogh, B., et al.
(eds.), "Hybrid Systems: Computation and Control,"
L.N.C.S. 1790, Springer-Verlag, pp. 310-323.
Mitchell, I., Bayen, A., Tomlin, C. J. (2005)
"
A Time-Dependent Hamilton-Jacobi Formulation of
Reachable Sets for Continuous Dynamic Games"
IEEE Trans. on Autom. and Contr., 50(7):947-957.
Osher, S., Sethian, J.A. (1988) "
Fronts propagating
with curvature-dependent speed: Algorithms based on
Hamilton-Jacobi formulations" J. Comput. Phys., 79:12-49.
Canonical transform.:
1,
2.