Explanations
vandalism
Phrases and words related to vandalism or acts of destruction.
The neuron activates on occurrences of the root “vandal,” i.e. words referring to vandalism (e.g. “vandalism,” “vandalized”).
Negative Logits
...↵↵↵↵             -0.07
middleware          -0.06
_STOP               -0.06
trouver             -0.06
[no visible token]  -0.06
ěř                  -0.06
nový                -0.06
OUSE                -0.06
mời                 -0.05
irect               -0.05
Positive Logits
vandal              0.11
vandalism           0.10
graffiti            0.08
mutil               0.08
DIY                 0.07
kid                 0.07
القدم               0.07
Ze                  0.07
GIF                 0.07
188                 0.07
Activations Density 0.002%