INDEX
Explanations
Words and phrases related to humorous or absurd bodily functions.
The neuron is triggered by words describing game control schemes and input mechanics.
Negative Logits
格            -0.07
amin          -0.07
,param        -0.06
cstdio        -0.06
pist          -0.06
artworks      -0.06
cable         -0.06
]!='          -0.06
�             -0.06
ourt          -0.06
Positive Logits
?:            0.07
_REPO         0.06
роч           0.06
exclude       0.06
ScreenState   0.06
excludes      0.06
.squeeze      0.06
ucch          0.06
аза           0.06
OE            0.06
Activations Density: 0.016%