Explanations
No Explanations Found
Negative Logits
Token          Logit
452            -0.14
behavioural    -0.14
bounce         -0.14
202            -0.14
lyph           -0.14
leyen          -0.13
uffle          -0.13
PRINTF         -0.13
programme      -0.13
nodoc          -0.13
Positive Logits
Token          Logit
fucking        0.26
fuck           0.25
fuck           0.25
fucked         0.23
Fucking        0.23
FUCK           0.20
fucks          0.20
Fuck           0.20
Fuck           0.20
cunt           0.19
Activations Density: 0.000%
No Known Activations
This feature has no known activations.