Explanations
The neuron activates on phrases referring to a website's “terms of service” or to permissions and legal restrictions.
Negative Logits
адження  -0.07
German  -0.07
арі  -0.07
WXYZ  -0.06
LLLL  -0.06
revise  -0.06
250  -0.06
ゲ  -0.06
Simon  -0.06
Knights  -0.06
Positive Logits
-init  0.07
ляти  0.07
ocks  0.07
jal  0.06
sizing  0.06
SYS  0.06
i  0.06
postav  0.06
('');↵  0.06
Ül  0.06
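Logit lists like these are commonly produced by projecting a neuron's output weight vector through the model's unembedding matrix and sorting the per-token scores. The page does not state its exact method, so the following is only a minimal sketch under that assumption; `logit_effects`, `w_out`, `W_U`, and the toy numbers are hypothetical.

```python
def logit_effects(w_out, W_U, vocab, k=10):
    # w_out: hypothetical neuron output weight vector (length d_model)
    # W_U:   hypothetical unembedding matrix, d_model rows x len(vocab) columns
    # vocab: token strings, one per unembedding column
    d_model = len(w_out)
    scores = []
    for j in range(len(vocab)):
        # Direct effect on token j's logit: dot product of w_out with column j of W_U
        scores.append(sum(w_out[i] * W_U[i][j] for i in range(d_model)))
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most-suppressed tokens (lowest scores first)
    positive = ranked[::-1][:k]  # most-promoted tokens (highest scores first)
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up numbers)
w_out = [1.0, -0.5]
W_U = [
    [0.1, -0.2, 0.3, 0.0],
    [0.2, 0.1, -0.1, 0.4],
]
vocab = ["a", "b", "c", "d"]
positive, negative = logit_effects(w_out, W_U, vocab, k=2)
print(positive)
print(negative)
```

A real model has a vocabulary of tens of thousands of tokens and would use a matrix library, and dashboards may scale or normalize the scores differently; the loop form here only makes the dot products explicit.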
Activation Density: 0.012%
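A density of 0.012% means the neuron is active on roughly 12 of every 100,000 tokens. The page does not say what counts as "active", so this sketch assumes an activation strictly above zero; `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    # Fraction of token positions whose activation exceeds the (assumed) threshold
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 100,000 token positions, of which 12 are active
acts = [0.0] * 100_000
for i in range(12):
    acts[i * 1000] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")
```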