INDEX
Explanations
words expressing amazement or awe
expressions of wonder and admiration
Negative Logits
Token        Logit
Thieves      -0.70
gradient     -0.65
secondary    -0.65
FF           -0.63
containing   -0.63
dule         -0.63
Agents       -0.63
Solid        -0.62
Short        -0.62
Gamer        -0.61
Positive Logits
Token        Logit
awe          1.18
htaking      0.92
urous        0.91
aston        0.91
incred       0.87
amaz         0.86
ruciating    0.85
ingly        0.85
upe          0.84
amazed       0.82
Activations Density: 0.013%