Explanations
personal pronouns and expressions of refusal or decision-making
Negative Logits
Token        Logit
achs         -0.19
is           -0.17
pires        -0.17
ам           -0.16
Am           -0.15
Yesterday    -0.14
-Am          -0.14
Yesterday    -0.14
feed         -0.14
bere         -0.14
Positive Logits
Token        Logit
think        0.31
remember     0.27
think        0.26
guess        0.24
THINK        0.22
Think        0.22
Think        0.21
wish         0.21
suppose      0.21
'll          0.20
Activations Density: 0.189%