Explanations
phrases that convey varying degrees of change or transformation
Negative Logits
Token           Logit
iac             -0.17
{{{             -0.16
iek             -0.15
zc              -0.14
StateChanged    -0.14
assen           -0.14
arl             -0.14
gain            -0.14
izu             -0.14
bes             -0.13

Positive Logits
Token           Logit
attention        0.18
chas             0.17
underway         0.17
dice             0.16
Attention        0.16
imits            0.16
ër               0.16
dice             0.16
Attention        0.15
rave             0.15
Activations Density 0.050%