Explanations
words related to politics and leadership
expressions of hope and admiration
Negative Logits
etc          -0.75
nude         -0.68
nudity       -0.65
Weird        -0.64
Originally   -0.64
entary       -0.61
oteric       -0.60
aceae        -0.60
ORPG         -0.60
vaguely      -0.60
Positive Logits
leaders      0.89
coward       0.89
prag         0.87
courage      0.87
humility     0.80
embold       0.79
Failure      0.78
betrayal     0.78
cynicism     0.77
trust        0.74
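The source does not say how the logit lists were computed. A common approach for SAE feature dashboards is to project the feature's decoder direction through the model's unembedding and sort the resulting per-token scores. A minimal sketch, assuming that approach: the function name top_logit_effects, the dict-based unembedding, and the toy two-dimensional vectors are all hypothetical, not the dashboard's actual code or data.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Hypothetical helper: score every vocab token by the dot product of the
    feature's decoder direction with that token's unembedding vector, then
    return the k most positive and the k most negative tokens."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example (assumed data): a 2-d residual stream and a 4-token vocabulary.
toy_unembed = {
    "leaders": [1.0, 0.0],
    "courage": [0.5, 0.5],
    "nude": [-1.0, 0.0],
    "etc": [0.0, -1.0],
}
pos, neg = top_logit_effects([0.8, 0.2], toy_unembed, k=2)
print(pos)  # leaders, then courage
print(neg)  # nude, then etc
```

Sorting once and slicing from both ends gives both lists from a single pass over the vocabulary; a real model would do the same with a matrix product over the full unembedding.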
Activation Density: 0.768%
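Activation density is usually read as the share of sampled tokens on which the feature fires, so 0.768% corresponds to roughly 768 firing tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name activation_density and the threshold parameter are hypothetical, and the dashboard's exact sampling and threshold are not stated in the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose feature activation
    exceeds `threshold` (assumed to be 0.0, i.e. any positive activation)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# One of four tokens fires -> 25.0 (%).
print(activation_density([0.0, 1.3, 0.0, 0.0]))
```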