Explanations
Expressions of strong emphasis, particularly with the word "damn."
Negative Logits
Token              Logit
ins                -0.62
overall            -0.58
sz                 -0.54
off                -0.54
autoreleasepool    -0.54
fair               -0.52
icoot              -0.52
sur                -0.52
emp                -0.52
try                -0.51
Positive Logits
Token              Logit
$_"                 0.99
attacker            0.95
fucker              0.89
attacking           0.89
attackers           0.88
itſelf              0.81
ويكيميديا           0.80
reception           0.79
pleaſure            0.78
Efq                 0.78
Activation density: 0.112%