INDEX
Explanations
motivation
The neuron selectively fires on occurrences of the word “motivation” (and its subword variants).
Negative Logits
creek       -0.07
black       -0.07
Ernst       -0.07
blocked     -0.06
zoo         -0.06
封          -0.06
lạnh        -0.06
scraping    -0.06
江          -0.06
crisp       -0.06
Positive Logits
motivated       0.11
motivation      0.11
motivate        0.10
Mot             0.10
mot             0.10
motivational    0.10
motiv           0.10
Mot             0.10
motivating      0.09
motivations     0.08
Activations Density 0.012%
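
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the percentage of tokens on which the neuron's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and serve only as illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens with a nonzero (positive)
    activation. The dashboard may use a different definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the neuron fires on 3 tokens out of 25,000.
sample = [0.0] * 24_997 + [1.3, 0.4, 2.1]
print(f"{activation_density(sample):.3f}%")  # prints "0.012%"
```

A density this low is consistent with the explanation above: the neuron is silent on almost all tokens and fires only on a narrow family of related ones.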