Open Research Newcastle
Mining numerical invariants for improving software reliability

thesis
posted on 2025-05-09, 03:48 authored by Bo Zhang
Program invariants are conditions that can be relied on to hold throughout the execution of a program. This thesis develops approaches to automatically mine numerical invariants from data produced by programs and to use them to improve software reliability. We focus on two types of numerical invariant: metamorphic relations (MRs) mined from program inputs and outputs, which can assist bug detection in software testing; and workflow relations mined from log data, which can be used to monitor software status and detect anomalies.

For numerical invariants mined from program inputs and outputs, we propose a general method, AutoMR, to automatically infer and cleanse MRs. The approach can infer both equality and inequality MRs, of linear, quadratic, and higher degrees. AutoMR employs a general parameterization of arbitrary polynomial MRs and uses particle swarm optimization to search for suitable parameters. It then uses matrix singular-value decomposition and constraint-solving techniques to cleanse the MRs by removing redundancy. We apply the approach to 37 numerical programs and evaluate the fault-detection capacity of the inferred MRs. The results show that AutoMR can effectively infer various types of MR, which can be used to detect faults in mutation testing and differential testing.

For numerical invariants mined from program logs, we design two approaches, sADR and uADR, to handle semi-supervised and unsupervised scenarios, respectively. First, we propose a novel semi-supervised method, sADR, which requires only a small set of normal logs to extract the numerical invariants. Anomalies are then detected by checking whether the logs violate the mined invariants.
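To make the idea of checking logs against mined invariants concrete, the following is a minimal sketch and not the sADR algorithm itself. It assumes that each log sequence is summarized as a vector of event counts and that invariants are linear relations among those counts, recovered here as the null space of a matrix of normal sequences via singular-value decomposition. The function names, the tolerances, and the toy events (open, read, close) are hypothetical.

```python
import numpy as np

def mine_invariants(counts, tol=1e-8):
    """Return rows v such that counts @ v is (numerically) zero.

    Assumption: rows of `counts` are event-count vectors of normal sequences.
    """
    _, s, vt = np.linalg.svd(counts, full_matrices=True)
    # Numerical rank: singular values above a relative threshold.
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    # The remaining right-singular vectors span the null space.
    return vt[rank:]

def violates(invariants, x, tol=1e-6):
    """True if the event-count vector x breaks any mined linear relation."""
    return bool(np.any(np.abs(invariants @ x) > tol))

# Toy data, columns = (open, read, close); every open has a matching close.
normal = np.array([[1, 3, 1],
                   [2, 5, 2],
                   [4, 1, 4],
                   [3, 3, 3]], dtype=float)
inv = mine_invariants(normal)

ok_seq = np.array([2.0, 7.0, 2.0])   # satisfies count(open) == count(close)
bad_seq = np.array([2.0, 7.0, 1.0])  # one close event is missing
print(violates(inv, ok_seq), violates(inv, bad_seq))
```

In this toy setting the single mined relation corresponds to count(open) = count(close), so a sequence with a missing close event is flagged while a well-formed one is not.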
Considering that labeling logs is time-consuming and tedious, we design a novel minimal-rank-based sampling technique that takes advantage of the rank difference between the event-count matrices of normal and abnormal log sequences. The sampling method selects a number of seeds, which are used to mine likely numerical invariants. We evaluate the proposed approaches on three public datasets and successfully mine numerical invariants (workflow relations) from logs, which can be used to detect system anomalies effectively.

In summary, this thesis develops approaches to automatically mine the two types of numerical invariant and conducts experiments to evaluate their application to software reliability improvement. The relations mined from program inputs and outputs can be used to assist bug detection in software testing, and those mined from logs can be used to monitor software status and detect anomalies.
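As an illustration of how a metamorphic relation can assist bug detection, the sketch below checks the well-known relation sin(π − x) = sin(x) against a correct implementation and against a faulty one. It is a generic example, not output of AutoMR: the relation is chosen by hand, and `holds_mr` and `buggy_sin` are hypothetical names introduced here.

```python
import math
import random

def holds_mr(f, trials=1000, tol=1e-9, seed=0):
    """Check the relation f(pi - x) == f(x) on randomly sampled inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        if abs(f(math.pi - x) - f(x)) > tol:
            return False  # relation violated: evidence of a fault
    return True

def buggy_sin(x):
    # Hypothetical mutant: truncated Taylor series, accurate only near zero.
    return x - x**3 / 6

print(holds_mr(math.sin))   # the reference implementation satisfies the relation
print(holds_mr(buggy_sin))  # the mutant violates it
```

No expected output values are needed for individual inputs: the fault is exposed purely because the relation between two executions fails to hold.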

Year awarded

2022

Thesis category

  • Doctoral Degree

Degree

Doctor of Philosophy (PhD)

Supervisors

Hongyu Zhang (University of Newcastle); Pablo Moscato (University of Newcastle)

Language

  • en, English

College/Research Centre

College of Engineering, Science and Environment

School

School of Information and Physical Sciences

Rights statement

Copyright 2022 Bo Zhang
