INDEX
Explanations
praising or criticizing adjectives for describing individuals
phrases that emphasize positive attributes and characteristics
Negative Logits
Events      -0.76
events      -0.75
eworks      -0.74
Attach      -0.73
Items       -0.73
articles    -0.72
views       -0.72
ARB         -0.71
IRC         -0.70
uden        -0.70
Positive Logits
liar        1.28
hypocr      1.26
believer    1.19
coward      1.13
genius      1.11
terrific    1.09
brilliant   1.08
talented    1.08
bigot       1.07
fearless    1.07
Activations Density: 0.138%