Using Clustered Random Samples to Find Spurious Correlations in Neural Networks

One of the most difficult parts of successfully training a neural network is knowing whether your data truly reflects the pattern you are trying to capture, or whether the training set instead encodes some other, unintended pattern. Detecting such a pattern, and isolating it when it exists, is crucial to training any machine learning model, especially one whose outcomes carry significant consequences, an increasingly common situation.

I intend to create a novel algorithm that identifies and isolates these spurious correlations. Roughly, the algorithm would run a trained neural network on a variety of probe samples that are, in theory, unrelated to the task the network was trained on. These samples would then be grouped by their features using their latent embeddings, and the distribution of the network's predicted categories within each group would be analyzed; a non-uniform distribution for a particular feature group would indicate a potential spurious correlation.
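The grouping-and-analysis step could be sketched roughly as follows, assuming cluster assignments (e.g. from k-means over the latent embeddings) and the network's predicted labels are already in hand. The function name, the use of total variation distance as the skew measure, and the 0.2 threshold are illustrative assumptions, not fixed parts of the proposed algorithm:

```python
from collections import Counter

def flag_skewed_groups(group_ids, predicted_labels, threshold=0.2):
    """Flag feature groups whose prediction distribution deviates
    from the overall one.

    group_ids: cluster assignment per probe sample (e.g. k-means over
        latent embeddings).
    predicted_labels: the trained network's output class per sample.
    threshold: minimum total variation distance from the overall label
        distribution for a group to be flagged.
    """
    n = len(predicted_labels)
    overall = {c: k / n for c, k in Counter(predicted_labels).items()}

    groups = {}
    for g, y in zip(group_ids, predicted_labels):
        groups.setdefault(g, []).append(y)

    flagged = {}
    for g, labels in groups.items():
        m = len(labels)
        dist = Counter(labels)
        classes = set(overall) | set(dist)
        # Total variation distance between this group's prediction
        # distribution and the overall distribution.
        tvd = 0.5 * sum(abs(dist.get(c, 0) / m - overall.get(c, 0.0))
                        for c in classes)
        if tvd > threshold:
            flagged[g] = tvd
    return flagged

# Toy example: groups 0 and 1 roughly mirror the overall label mix,
# while group 2 is predicted "cat" every time.
groups = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
preds = ["cat", "dog", "cat", "dog",
         "cat", "dog", "dog", "cat",
         "cat", "cat", "cat", "cat"]
flagged = flag_skewed_groups(groups, preds)  # only group 2 is flagged
```

Comparing each group against the overall prediction distribution, rather than against a uniform distribution, keeps the check meaningful even when the classes themselves are imbalanced.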

I plan to implement my algorithm in Python with TensorFlow and evaluate it on two fronts: its ability to detect spurious correlations in models deliberately trained with known ones, and its ability to correctly pass models known to be free of significant spurious correlations.
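One way to obtain models with a known spurious correlation is to plant one in synthetic training data and control its strength. A minimal sketch of such a generator; the function name, the feature layout, and the `leak_rate` parameter are illustrative assumptions rather than part of the proposal:

```python
import random

def make_dataset(n, leak_rate, seed=0):
    """Generate (core_feature, spurious_feature, label) triples.

    The label always follows core_feature (the intended signal). The
    spurious feature matches the label with probability `leak_rate`:
    leak_rate=0.5 plants no spurious correlation, leak_rate=1.0 plants
    a perfect one.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        core = label  # the pattern the model is meant to learn
        spurious = label if rng.random() < leak_rate else 1 - label
        data.append((core, spurious, label))
    return data
```

Sweeping `leak_rate` from 0.5 toward 1.0 would yield a family of models ranging from clean to strongly confounded, giving the detection algorithm a graded benchmark rather than a single pass/fail case.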
