Explanations
Phrases related to advising someone to take a specific action or make a decision
Negative Logits
ament        -0.71
creen        -0.67
ificent      -0.66
eers         -0.65
itionally    -0.64
ullah        -0.64
oret         -0.60
Horus        -0.59
icio         -0.59
ifully       -0.59
Positive Logits
vt            1.12
ggle          1.05
verning       1.01
lems          1.01
overboard     0.95
ALK           0.91
ahead         0.86
forth         0.83
unnoticed     0.83
ogly          0.82
Activations Density: 0.095%