INDEX
Explanations
words related to authority, control, and power dynamics
conjunctions and coordinating phrases indicating actions or conditions
Negative Logits
Pony        -0.72
Flying      -0.71
Shots       -0.68
ETS         -0.67
Jets        -0.66
Hands       -0.66
Baseball    -0.66
ardi        -0.65
Lines       -0.65
Tok         -0.63
Positive Logits
degrade     1.12
rogen       1.07
igmat       1.05
amplify     1.05
distort     1.04
humili      1.04
rogens      1.02
eliminate   1.02
discredit   0.99
automate    0.98
Activations Density: 0.141%