Explanations
phrases related to confessions or discipline
statements of confession or admission of guilt
Negative Logits
unison      -0.73
agonists    -0.65
%%          -0.64
oops        -0.63
UGC         -0.63
è£ħ         -0.62
skirts      -0.62
Whedon      -0.61
choes       -0.60
etheless    -0.60
Positive Logits
himself     1.39
his         1.14
Himself     0.99
His         0.92
..."        0.91
his         0.90
[           0.88
.''         0.84
us          0.80
His         0.80
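The page lists these tokens without saying how the logit values were computed. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by this page, is to take the feature's decoder direction, multiply it by the model's unembedding matrix, and report the highest and lowest entries. The sketch below uses a toy two-dimensional decoder vector and a hypothetical four-token unembedding; the names `logit_effects` and `top_and_bottom` are illustrative only, and the sketch ignores any layer norm or scaling a real dashboard might fold in.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding column.

    Assumption (not from the source page): effect = decoder_dir . W_U[:, token],
    with no layer norm or rescaling applied.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, col)) for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens, strongest first."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Hypothetical toy data: a 2-d decoder direction and 2-d unembedding columns.
decoder_dir = [1.0, -0.5]
unembed = {
    "himself": [1.2, -0.4],
    "his": [1.0, -0.2],
    "unison": [-0.5, 0.5],
    "oops": [-0.3, 0.6],
}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("Positive:", positive)
print("Negative:", negative)
```

Sorting the negative list in ascending order puts the most negative token first, matching the order of the Negative Logits list above.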
Activation density: 0.523%
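Activation density is usually read as the share of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero (the page does not state its threshold) and using made-up counts chosen to reproduce the 0.523% figure; `activation_density` is a hypothetical helper name:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 523 firing positions out of 100,000 tokens.
sample = [0.0] * (100_000 - 523) + [1.5] * 523
print(f"Activation density: {activation_density(sample):.3f}%")
```

Read this way, a density of 0.523% means the feature fires on roughly 1 token in 191.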