
Self-conditioning Pre-Trained Language Models

Suau, Zappella, Apostoloff, Apple

This paper identifies “expert” units (neurons) in a Transformer LM and “activates” them. It’s an intuitive idea, one that surely many people have been thinking about in terms of LM controllability. The paper starts out quite promising, and I was expecting something simple but elegant and easy to implement, but I got thrown off here more than by the more “complicated” papers.

Background

Just from reading the abstract you would expect that a lot of work has been done in this area, so a literature review of related work is particularly relevant here. However, they give only one particularly relevant reference, Radford et al., 2017¹ (don’t get too excited, it’s not Radford Neal), who do “L1 regularisation of a logistic regression classifier on top of representations”. I feel like I’ve seen a lot more related work on identifying neurons, just under a different name.
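For concreteness, here is a minimal sketch of what that kind of probe looks like: an L1-regularised logistic regression fitted on top of frozen representations, where the sparse non-zero weights point at the “important” units. The shapes, the pooling, and the random stand-in data are my assumptions for illustration, not the setup from Radford et al.

```python
# Sketch of an L1-regularised logistic-regression probe on representations.
# Stand-in random data; in practice H would be pooled hidden states from the LM.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

N, D = 1000, 768                      # N sentences, D hidden units (assumed sizes)
H = rng.normal(size=(N, D))           # stand-in for pooled model representations
y = rng.integers(0, 2, size=N)        # stand-in binary concept labels (e.g. sentiment)

# The L1 penalty drives most weights to zero, leaving a sparse set of units
# that the probe relies on for the concept.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(H, y)

important_units = np.flatnonzero(probe.coef_[0])
print(f"{len(important_units)} units with non-zero weight:", important_units[:10])
```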

Method

I found the notation strangely unclear here, and the method is missing some implementation details for an ICML paper. For example, they write “A sentence $\mathbf{x} = \{x_i\}$” (uh… probably missing some subscripts and superscripts here…) and later “Let $\mathbf{z}_m^c = \{z_{m,i}^c\}_{i=1}^{N}$ be the outputs of neuron $m$ to sentences $\{\mathbf{x}_i^c\}$”. I’m pretty sure the subscript on $\mathbf{x}$ was used to index a word in the earlier notation, so I’m pretty confused by this point.

“We treat $\mathbf{z}^{c}_m$ as prediction scores for the task $\mathbf{b}^c$.” What is this task? Is it a binary prediction of whether $c$ is present, for $i$ from 1 to $N$? And how do we get “prediction scores” out of a single neuron? By this point, a little tired, I gave up.
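My best guess at a reading, sketched below: pool each neuron’s output over a sentence into one number, treat that number as the score for the binary label $b^c_i$ (“does sentence $i$ contain concept $c$?”), and rank neurons by how well their raw scores separate the labels, e.g. with average precision. The max-pooling, the use of average precision, and all sizes here are my assumptions, not something I could confirm from the paper’s notation.

```python
# Assumed reading: one pooled activation per (sentence, neuron), scored against
# binary concept labels with average precision; high-AP neurons = candidate "experts".
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)

N, M = 500, 3072                      # N sentences, M neurons (assumed sizes)
b_c = rng.integers(0, 2, size=N)      # b^c: does sentence i contain concept c?

# z[i, m] = pooled response of neuron m to sentence i. In practice this would
# come from running the LM over the sentences and (say) max-pooling over positions.
z = rng.normal(size=(N, M))

# One AP score per neuron: how well its raw activation predicts b^c on its own.
ap = np.array([average_precision_score(b_c, z[:, m]) for m in range(M)])

# Rank neurons by AP; the top-k would be the candidate "expert" units for c.
experts = np.argsort(ap)[::-1][:10]
print("top candidate expert units:", experts, "with AP", ap[experts])
```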

References

  1. Radford, Jozefowicz, Sutskever, 2017. Learning to generate reviews and discovering sentiment.