Explanations
expressions related to personal beliefs or political statements
expressions of sincerity and personal commitment
Negative Logits
wolves        -0.64
Patch         -0.63
themselves    -0.62
astical       -0.59
pricey        -0.58
pesky         -0.56
izens         -0.56
Pair          -0.55
Textures      -0.55
mutants       -0.55
Positive Logits
myself        1.60
my            1.20
personally    0.95
My            0.79
privileged    0.79
unres         0.78
MY            0.77
My            0.76
am            0.76
hereby        0.74
Activations Density 0.699%