Explanations
a or two
The neuron flags explicit “give a fuck”-style phrases, that is, profanity used to express an attitude of not caring.
Negative Logits
Token      Logit
泊         -0.07
trick      -0.06
MAR        -0.06
нак        -0.06
ا�         -0.06
(blank)    -0.06
Với        -0.06
@↵↵        -0.06
_ASS       -0.06
Stan       -0.06
Positive Logits
Token          Logit
EVENT          0.07
gelen          0.07
gesch          0.07
Mountains      0.07
Memo           0.06
wij            0.06
Scientific     0.06
_handle        0.06
localObject    0.06
Tüm            0.06
Activations Density 0.003%
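
The density figure above is usually read as the share of token positions on which the neuron fires at all. The sketch below assumes that reading. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and do not come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) activation; the page itself does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the neuron fires on 3 of 100,000 tokens.
sample = [0.0] * 99_997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.003%
```

Under this assumption, a density of 0.003% corresponds to roughly 3 active tokens per 100,000, which would fit a neuron keyed to a narrow set of phrases.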