Recurrent Neural Network Architectures for Sequence Classification and Explanation

This thesis focuses on applications of recurrent neural networks (RNNs) to three aspects of sequence classification. The first chapter presents a novel method for generating synthetic minority-class data to improve imbalanced classification. Generative Adversarial Networks (GANs) have been used in many applications to generate realistic synthetic data. We introduce a novel GAN with Autoencoder (GAN-AE) architecture that generates synthetic samples for variable-length, multi-feature sequence datasets. The model combines a GAN with an additional autoencoder component, with RNNs used in each component, and is applied to generate synthetic data that improves classification accuracy on a highly imbalanced medical device dataset. Beyond the medical device dataset, we evaluate GAN-AE on two additional datasets and demonstrate its application to a sequence-to-sequence task in which both synthetic sequence inputs and sequence outputs must be generated. To evaluate the quality of the synthetic data, we train encoder-decoder models with and without the synthetic data and compare classification performance. We show that a model trained with GAN-AE-generated synthetic data outperforms models trained with synthetic data produced by standard oversampling techniques such as SMOTE and autoencoders, as well as by state-of-the-art GAN-based models.

Next, we discuss applications of RNNs to partially ordered sequential data. Many models, such as Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and transformers, have been developed to classify time series data under the assumption that events in a sequence are ordered. Fewer models have been developed for set-based inputs, where order does not matter. In several use cases, data is ordered at the point of uptake, but, due to deficiencies in the process, some batches of data arrive unordered, resulting in partially ordered sequences. We introduce a novel transformer-based model for such prediction tasks and benchmark it against extensions of existing order-invariant models. We also discuss how transition probabilities between events in a sequence can be used to improve model performance. We show that the transformer-based equal-time model outperforms extensions of existing set models on three datasets.
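The second chapter's use of transition probabilities between events can be made concrete with a small sketch. The snippet below is a minimal illustration, not the thesis's implementation: it estimates an empirical first-order transition matrix from integer-coded event-type sequences with additive smoothing. The function name, vocabulary size, and example sequences are assumptions chosen for illustration; such a matrix could then be supplied as an extra feature or prior to a downstream sequence model.

```python
import numpy as np

def estimate_transition_matrix(sequences, num_event_types, smoothing=1.0):
    """Estimate P(next event | current event) from event-type sequences.

    sequences: iterable of lists of integer event-type ids in [0, num_event_types).
    smoothing: additive (Laplace) smoothing so unseen transitions keep a
               small nonzero probability.
    """
    counts = np.full((num_event_types, num_event_types), smoothing)
    for seq in sequences:
        for current, nxt in zip(seq[:-1], seq[1:]):
            counts[current, nxt] += 1.0
    # Normalize each row so it forms a proper conditional distribution.
    return counts / counts.sum(axis=1, keepdims=True)

# Example: three short sequences over a vocabulary of 4 event types.
sequences = [[0, 1, 2, 3], [0, 1, 1, 2], [2, 3, 0]]
P = estimate_transition_matrix(sequences, num_event_types=4)
print(P[0])  # distribution over the events that follow event type 0
```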
Lastly, we consider how to determine which features are important to sequence classification tasks. Although many methods, such as Local Interpretable Model-agnostic Explanations (LIME), Integrated Gradients, and Layer-wise Relevance Propagation (LRP), have been developed to explain how recurrent neural networks make predictions, the explanations generated by each method often vary dramatically, and there is no consensus on which explainability methods most accurately and robustly identify the features important for a model's prediction. We consider a classification task on a sequence of events and apply both gradient-based and attention-based explanation models to compute explanations at the event-type level. We implement a hierarchical attention model that computes explanations with respect to event type directly and show that attention-based models yield higher similarity scores between explanations for models initialized with different random seeds. However, significant differences in explanations between model runs remain. We develop an optimization-based model to find a low-loss, high-accuracy path between trained weights in order to understand how model explanations morph between different local minima, and we use this low-loss path to provide insight into why explanations vary on two sentiment datasets.
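One established way to realize an optimization-based search for a low-loss, high-accuracy path between trained weights is the curve-finding approach from the mode connectivity literature (e.g., Garipov et al., 2018): a parametric curve joins two trained solutions, and its free control point is optimized so that randomly sampled points along the curve keep the loss low. The sketch below illustrates that idea on a toy linear classifier with a quadratic Bezier curve; the synthetic data, model, and hyperparameters are placeholders, and the thesis's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

# Toy illustration: two independently trained linear classifiers (stand-ins
# for models trained from different random seeds) joined by a learned
# quadratic Bezier curve whose sampled points are kept low-loss.
torch.manual_seed(0)
X = torch.randn(512, 20)
y = (X[:, :2].sum(dim=1) > 0).long()

def loss_at(w, b):
    # Cross-entropy loss of a linear classifier with weights w and bias b.
    return F.cross_entropy(X @ w + b, y)

def train_endpoint(seed):
    torch.manual_seed(seed)
    w = torch.randn(20, 2, requires_grad=True)
    b = torch.zeros(2, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=0.05)
    for _ in range(300):
        opt.zero_grad()
        loss_at(w, b).backward()
        opt.step()
    return w.detach(), b.detach()

(w0, b0), (w1, b1) = train_endpoint(1), train_endpoint(2)

def bezier(p0, ctrl, p1, t):
    # Quadratic Bezier curve: endpoints fixed, control point learnable.
    return (1 - t) ** 2 * p0 + 2 * t * (1 - t) * ctrl + t ** 2 * p1

# Learn the control point so points sampled along the curve stay low-loss.
theta_w = torch.nn.Parameter(0.5 * (w0 + w1))
theta_b = torch.nn.Parameter(0.5 * (b0 + b1))
opt = torch.optim.Adam([theta_w, theta_b], lr=0.05)
for _ in range(300):
    t = torch.rand(())  # random position on the curve
    opt.zero_grad()
    loss_at(bezier(w0, theta_w, w1, t), bezier(b0, theta_b, b1, t)).backward()
    opt.step()

# Explanations (e.g., gradient attributions) can then be computed at
# intermediate points bezier(w0, theta_w, w1, t) for t in [0, 1] to examine
# how they change along the low-loss path between the two solutions.
```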
