Explanations
phrases or sentences indicating permission or enabling of actions
instances of the word "allow" and its variations
Negative Logits
borough    -0.80
xon        -0.72
bard       -0.69
bon        -0.68
enegger    -0.68
need       -0.67
kaya       -0.66
nard       -0.63
leaf       -0.62
bons       -0.61
Positive Logits
us          0.83
Reviewer    0.82
me          0.73
ipient      0.69
passers     0.67
them        0.67
him         0.67
rapists     0.66
ANCE        0.66
ances       0.66
Activations Density 0.063%