Explanations
negations and expressions of anticipation or eagerness
Negative Logits
Token       Logit
ortex       -0.15
lette       -0.14
ropy        -0.14
Ì£          -0.14
hoe         -0.14
erti        -0.14
apture      -0.14
allback     -0.14
ncy         -0.13
usp         -0.13
Positive Logits
Token       Logit
wait        0.24
stress      0.22
WAIT        0.21
believe     0.20
wait        0.20
stresses    0.19
imagine     0.19
/wait       0.18
Stress      0.18
waits       0.18
Activations Density: 0.028%