INDEX
Explanations
phrases related to asserting influence or power
phrases indicating parts of speech or references to actions and states
Negative Logits
xual      -0.95
nces      -0.82
upt       -0.81
iques     -0.76
illac     -0.71
worldly   -0.70
gars      -0.70
say       -0.70
ezvous    -0.69
speak     -0.69
Positive Logits
stone        0.82
pig          0.82
coffin       0.76
sand         0.75
proverbial   0.73
donkey       0.69
coal         0.69
pear         0.68
sheep        0.63
pudding      0.63
Activations Density: 0.207%