Explanations
The neuron activates on mentions of people moving between countries, e.g. “immigrated” or “emigrated” (often split across several tokens), and on the “to [country]” phrase that follows.
Negative Logits
stitch      -0.07
Control     -0.07
=!          -0.06
box         -0.06
anal        -0.06
_buffer     -0.06
cen         -0.06
Profile     -0.06
Пра         -0.06
понять      -0.06
Positive Logits
}↵↵          0.07
};↵↵         0.06
tempt        0.06
澳           0.06
รถ           0.06
↵↵↵          0.06
})();↵↵      0.06
ght          0.06
ore          0.06
acceptable   0.06
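Dashboards of this kind commonly derive the logit lists above by projecting the neuron's output weights through the model's unembedding matrix and listing the most positive and most negative tokens. Whether this page used exactly that projection is an assumption. The sketch below is illustrative only: the helper names `logit_effects` and `top_logits`, and the toy vocabulary and weights, are hypothetical and are not taken from any particular tool.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: names and shapes are illustrative assumptions,
# not the dashboard's actual code.

def logit_effects(w_out: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of an output direction on every vocabulary logit.

    w_out: the neuron's output weights, length d_model.
    W_U:   unembedding matrix, d_model rows by vocab_size columns.
    Returns effect[t] = sum_i w_out[i] * W_U[i][t].
    """
    vocab_size = len(W_U[0])
    effects = [0.0] * vocab_size
    for w, row in zip(w_out, W_U):
        for t in range(vocab_size):
            effects[t] += w * row[t]
    return effects


def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest and the k lowest (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]            # lowest effects, most negative first
    positive = ranked[::-1][:k]      # highest effects, largest first
    return positive, negative


if __name__ == "__main__":
    # Toy example: 4-token vocabulary, d_model = 2.
    vocab = ["a", "b", "c", "d"]
    W_U = [[1.0, 0.0, -1.0, 2.0],
           [0.0, 1.0, 1.0, -1.0]]
    w_out = [1.0, 0.5]
    pos, neg = top_logits(logit_effects(w_out, W_U), vocab, k=2)
    print(pos)  # [('d', 1.5), ('a', 1.0)]
    print(neg)  # [('c', -0.5), ('b', 0.5)]
```

With a real model, `vocab` has tens of thousands of entries, so the "negative" list holds genuinely negative effects like the values shown above; the toy example is only large enough to check the ranking.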
Activation density: 0.011%
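Activation density is usually reported as the percentage of dataset tokens on which the neuron fires. At 0.011%, that is roughly 11 tokens per 100,000. The sketch below assumes that "fires" means an activation strictly above a threshold of 0.0. That threshold is an assumption, since dashboards may use a different cutoff. The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption; dashboards may use another cutoff.
    Returns 0.0 for an empty input.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return 100.0 * active / total if total else 0.0


if __name__ == "__main__":
    # 11 firing tokens out of 100,000 gives a density of 0.011%.
    acts = [0.0] * 100_000
    for i in range(11):
        acts[i * 9_000] = 1.0
    print(f"{activation_density(acts):.3f}%")  # 0.011%
```

The function accepts any iterable, so activations can be streamed from a large dataset without holding them all in memory.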