Open Research Newcastle

Optimizing large language model utilization through scheduling strategies

Thesis
Posted on 2025-06-25, 23:46, authored by Yueyue Liu

Large Language Models (LLMs) have garnered significant attention within Machine-Learning-as-a-Service (MLaaS) offerings due to their remarkable capabilities. As the variety of available LLMs grows, users face the challenge of selecting the LLMs that best balance cost and performance for their needs.

This thesis investigates the cost-effective allocation of jobs to LLMs, aiming to simultaneously increase the percentage of correctly processed jobs and reduce costs. The study begins with an empirical exploration of the potential of scheduling optimization to improve the performance and cost of LLM utilization. Since correctness cannot be determined until an LLM's output is received, we employ a method that combines prediction and optimization: based on predicted accuracy and cost, search-based algorithms select the most suitable LLM for each job. The results show that while scheduling has significant potential to improve performance and cost efficiency, improvements are needed in both prediction accuracy and the search algorithms.
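To make the predict-then-optimize idea concrete, the following is a minimal sketch of per-job LLM selection under a budget. The greedy accuracy-per-cost rule and all function names here are illustrative assumptions, not the thesis's actual algorithm.

```python
# A minimal sketch of predict-then-optimize job allocation. The predictors
# and the greedy selection rule are assumptions for illustration only.
from typing import Callable, Sequence

def assign_jobs(
    jobs: Sequence[str],
    llms: Sequence[str],
    predict_accuracy: Callable[[str, str], float],  # estimated P(job done correctly by llm)
    predict_cost: Callable[[str, str], float],      # estimated invocation cost
    budget: float,
) -> dict[str, str]:
    """For each job, greedily pick the LLM with the best predicted
    accuracy-per-cost ratio that still fits the remaining budget."""
    schedule: dict[str, str] = {}
    remaining = budget
    for job in jobs:
        candidates = [
            (predict_accuracy(job, m) / max(predict_cost(job, m), 1e-9), m)
            for m in llms
            if predict_cost(job, m) <= remaining
        ]
        if not candidates:
            continue  # no affordable LLM left for this job
        _, best = max(candidates)
        schedule[job] = best
        remaining -= predict_cost(job, best)
    return schedule
```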

To address these challenges, we propose OptLLM. OptLLM operates in two modes: it either optimizes a single objective (accuracy or cost), or generates a set of non-dominated solutions that balance the two. It predicts the performance of candidate LLMs for each job using a multi-label classification model with uncertainty estimation, and iteratively refines the allocation schedule through destruction and reconstruction (see the first sketch below).

Although OptLLM can produce an efficient schedule, collecting training data for its prediction module is costly, particularly with diverse task types and multiple available LLMs. For instance, creating training data often requires submitting the same job to all candidate LLMs, incurring substantial computational and financial costs. We therefore propose CPLS, which adapts training data from one task to another via transfer learning, improving the practicality of the prediction model in real-world scenarios.

Despite these benefits, both OptLLM and CPLS are static frameworks with predictive scheduling, which may not fully adapt to dynamic real-world conditions. In addition, they consider only invocation costs while overlooking uncertain generation costs. To address these limitations, we further propose SLM, a dynamic optimization framework. SLM incorporates an Adaptive Cache Manager, a Performance-Cost Optimized Scheduler, and a Dynamic Update Manager to achieve dynamic optimization through periodic prediction and optimization (see the second sketch below). By leveraging real-world feedback, SLM updates the cache and retrains the prediction model, ensuring continuous improvement.
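The destruction-and-reconstruction refinement in OptLLM could be sketched as follows, assuming a single scalar objective to maximize (e.g., predicted accuracy minus a cost penalty). The 30% destruction ratio and the greedy rebuild are illustrative choices, not OptLLM's actual parameters.

```python
# A minimal sketch of iterative destruction and reconstruction over a
# job-to-LLM schedule. All parameters are assumptions for illustration.
import random

def refine(schedule, jobs, llms, score, iterations=100, destroy_frac=0.3):
    """score(schedule) -> float to maximize. Repeatedly destroy part of
    the schedule, greedily rebuild it, and keep the result only if the
    objective improves."""
    best, best_score = dict(schedule), score(schedule)
    for _ in range(iterations):
        candidate = dict(best)
        # Destruction: unassign a random subset of jobs.
        for job in random.sample(jobs, int(len(jobs) * destroy_frac)):
            candidate.pop(job, None)
        # Reconstruction: reassign each freed job to its best single-job choice.
        for job in jobs:
            if job not in candidate:
                candidate[job] = max(llms, key=lambda m: score({**candidate, job: m}))
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best
```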
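The abstract names three SLM components without specifying their interfaces, so the following is only a rough sketch of how they might fit together; every method signature here is an assumption for illustration.

```python
# A minimal sketch of a periodic predict-optimize-update loop in the
# spirit of SLM. Component interfaces are assumed, not SLM's actual API.
class CachedScheduler:
    def __init__(self, predictor, optimize):
        self.cache = {}           # job -> previously validated answer
        self.predictor = predictor
        self.optimize = optimize  # batch of jobs -> {job: llm}
        self.feedback = []        # (job, llm, answer, correct, cost)

    def handle_batch(self, jobs):
        """Serve cache hits directly; schedule the rest."""
        hits = {j: self.cache[j] for j in jobs if j in self.cache}
        misses = [j for j in jobs if j not in self.cache]
        schedule = self.optimize(misses)       # Performance-Cost Optimized Scheduler
        return hits, schedule

    def record(self, job, llm, answer, correct, cost):
        """Collect real-world feedback; cache validated answers."""
        self.feedback.append((job, llm, answer, correct, cost))
        if correct:
            self.cache[job] = answer           # Adaptive Cache Manager

    def periodic_update(self):
        """Retrain the predictor on accumulated feedback."""
        self.predictor.retrain(self.feedback)  # Dynamic Update Manager
        self.feedback.clear()
```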

In summary, this thesis presents comprehensive approaches to LLM allocation that enhance performance and reduce costs. Through extensive experiments on various LLM-based tasks, we validate the effectiveness of the proposed methods, demonstrating their potential to address both static and dynamic optimization challenges in LLM utilization.

History

Year awarded

2025

Thesis category

  • Doctoral Degree

Degree

Doctor of Philosophy (PhD)

Supervisors

Hongyu Zhang, University of Newcastle
Sky Miao, University of Newcastle

Language

  • English

College/Research Centre

College of Engineering, Science & Environment

School

School of Information and Physical Sciences

Open access

  • Open Access

Rights statement

Copyright 2025 Yueyue Liu
