Explanations
The neuron activates on occurrences of the word “control.”
Negative Logits
Token       Logit
sage        -0.08
-May        -0.07
passage     -0.07
May         -0.07
eleven      -0.07
května      -0.06
page        -0.06
Whenever    -0.06
าษ          -0.06
srpna       -0.06
Positive Logits
Token       Logit
control     0.16
Control     0.16
Control     0.15
control     0.15
controls    0.12
Controls    0.11
(Control    0.11
_control    0.11
CONTROL     0.11
.control    0.11
Activation Density: 0.065%
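Activation density is usually the fraction of token positions on which the neuron fires. A density of 0.065% therefore corresponds to roughly 65 firing positions per 100,000 tokens. The minimal sketch below assumes "fires" means an activation strictly above zero. The function name `activation_density`, the threshold, and the sample activations are all hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of positions where the activation exceeds `threshold` (assumed definition)."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Hypothetical activations: the neuron fires on 65 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:65] = 1.3

density = activation_density(acts)
print(f"{density:.3%}")
```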