Open Research Newcastle

Optimal control and policy search in dynamical systems using expectation maximization

thesis
Posted on 2025-05-09 02:34, authored by Prakash Mallick
Trajectory optimization is a fundamental stochastic optimal control problem. In this class of problems it is essential to account for measurement noise, which strongly affects dynamical systems undergoing motion, particularly in uncertain environments. This thesis therefore addresses trajectory optimization for unknown dynamical systems subject to measurement noise. I propose an architecture that combines the strengths of conventional optimal control with those of maximum likelihood estimation, resulting in a novel iterative trajectory optimization paradigm called Stochastic Optimal Control - Expectation Maximization (SOC-EM). I examine the advantages of the proposed methodology in a reinforcement learning setting against widely used baselines. A further class of algorithms, Guided Policy Search, has proven highly effective not only at controlling complicated dynamical systems but also at learning optimal policies that generalize to unseen instances. Almost all well-known policy search and learning algorithms assume that the true states are available. In contrast, I extend the stochastic optimal control approach to learning (optimal) policies when the states are latent. The resulting policies are less noisy because of the lower variance of the optimal trajectories. Theoretical and empirical evidence for the learnt optimal policies of the new approach is presented against well-known baselines, evaluated on a two-dimensional autonomous system using standard performance metrics. Extensive empirical results are also provided for a dynamical system performing complicated three-dimensional tasks. The trajectory optimization procedure shows that the optimal policy parameters obtained by the maximum likelihood technique yield better performance, reducing the cumulative cost-to-go and the stochasticity of state and action trajectories by efficiently balancing exploration and exploitation, a new direction introduced in this thesis. Additionally, I provide novel theoretical results that connect the proposed optimization objective to definitions from information theory.
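The abstract describes an iterative scheme that alternates between inferring latent states under measurement noise and improving a policy by maximum likelihood. The sketch below is only a generic illustration of that E-step/M-step structure, not the thesis's SOC-EM algorithm: it uses Kalman (RTS) smoothing as the E-step and reward-weighted linear regression as the M-step on a hypothetical two-dimensional linear-Gaussian system. All system matrices, noise levels, weighting choices, and function names are assumptions made for the example.

```python
# Illustrative sketch only: an EM-flavoured policy search loop on a noisy 2-D linear system.
# E-step: Kalman/RTS smoothing recovers latent states from noisy measurements.
# M-step: reward-weighted regression refits a linear feedback policy.
# This is NOT the SOC-EM algorithm from the thesis; every model choice here is assumed.
import numpy as np

rng = np.random.default_rng(0)

# Assumed linear-Gaussian system: x' = A x + B u + w,   y = x + v
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = 1e-3 * np.eye(2)          # process noise covariance
R = 1e-2 * np.eye(2)          # measurement noise covariance
T, N = 30, 50                 # horizon, rollouts per iteration

def rollout(K, k, sigma_u):
    """Simulate one trajectory under the stochastic linear policy u = K y + k + noise."""
    x = np.array([1.0, 0.0])
    ys, us, cost = [], [], 0.0
    for _ in range(T):
        y = x + rng.multivariate_normal(np.zeros(2), R)       # noisy measurement
        u = K @ y + k + sigma_u * rng.standard_normal(1)      # exploratory action
        cost += x @ x + 0.1 * float(u @ u)
        ys.append(y); us.append(u)
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), Q)
    return np.array(ys), np.array(us), cost

def kalman_smooth(ys, us):
    """RTS smoother for the known linear-Gaussian model (E-step state estimates)."""
    n = ys.shape[0]
    x_f = np.zeros((n, 2)); P_f = np.zeros((n, 2, 2))
    x_pred, P_pred = np.zeros(2), np.eye(2)
    for t in range(n):                                        # forward (filter) pass
        S = P_pred + R
        Kk = P_pred @ np.linalg.inv(S)
        x_f[t] = x_pred + Kk @ (ys[t] - x_pred)
        P_f[t] = (np.eye(2) - Kk) @ P_pred
        x_pred = A @ x_f[t] + B @ us[t]
        P_pred = A @ P_f[t] @ A.T + Q
    x_s = x_f.copy()
    for t in range(n - 2, -1, -1):                            # backward (smoothing) pass
        P_pred_t = A @ P_f[t] @ A.T + Q
        G = P_f[t] @ A.T @ np.linalg.inv(P_pred_t)
        x_s[t] = x_f[t] + G @ (x_s[t + 1] - (A @ x_f[t] + B @ us[t]))
    return x_s

K, k, sigma_u = np.zeros((1, 2)), np.zeros(1), 0.3
for it in range(10):
    data = [rollout(K, k, sigma_u) for _ in range(N)]
    costs = np.array([c for _, _, c in data])
    w = np.exp(-(costs - costs.min()) / (costs.std() + 1e-8)) # soft "optimality" weights
    X, U, W = [], [], []
    for (ys, us, _), wi in zip(data, w):
        X.append(kalman_smooth(ys, us))                       # E-step: smoothed latent states
        U.append(us); W.append(np.full(T, wi))
    X, U, W = np.vstack(X), np.vstack(U), np.concatenate(W)
    Phi = np.hstack([X, np.ones((X.shape[0], 1))])            # features [x, 1]
    Wr = np.sqrt(W)[:, None]
    theta = np.linalg.lstsq(Phi * Wr, U * Wr, rcond=None)[0]  # M-step: weighted policy refit
    K, k = theta[:2].T, theta[2]
    print(f"iter {it:2d}  mean cost {costs.mean():8.2f}")
```

The weighted regression stands in for the M-step's expected-log-likelihood maximization; the thesis's actual SOC-EM formulation, its theoretical guarantees, and the extension to Guided Policy Search with latent states are considerably more involved.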

History

Year awarded

2023

Thesis category

  • Doctoral Degree

Degree

Doctor of Philosophy (PhD)

Supervisors

Chen, Zhiyong (University of Newcastle); Wills, Adrian (University of Newcastle)

Language

  • English

College/Research Centre

College of Engineering, Science and Environment

School

School of Engineering

Rights statement

Copyright 2023 Prakash Mallick
