INDEX
Explanations
phrases related to taking action or exerting control in a forceful manner
Negative Logits
IMAGES    -0.77
lihood    -0.74
enegger   -0.73
abund     -0.68
Nations   -0.64
Values    -0.63
xual      -0.62
Plenty    -0.62
IVES      -0.62
Aires     -0.61
Positive Logits
tered     1.53
tering    1.40
ters      1.07
down      1.04
tle       1.04
downs     1.02
down      0.99
outs      0.91
out       0.90
ulence    0.90
Activations Density 0.023%