INDEX
Explanations
phrases related to attention or awareness
Negative Logits
Token        Value
hist         -0.15
747          -0.14
ILLA         -0.14
مول          -0.13
redo         -0.13
ichni        -0.13
yme          -0.13
esus         -0.13
.INSTANCE    -0.13
chie         -0.13
Positive Logits
Token        Value
attention    1.36
Attention    1.16
attention    1.16
Attention    1.04
atención     0.84
внимание     0.84
attent       0.83
_attention   0.80
вним         0.72
注意         0.68
Activations Density 0.221%
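
A density figure like the one above is commonly read as the fraction of token positions on which the feature fires at all. The sketch below shows one way such a number could be computed. It assumes "fires" means an activation strictly greater than zero; the dashboard's exact definition is not stated here. The function name `activation_density` and the toy data are hypothetical, for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1,000 token positions, 2 of which fire.
toy_activations = [0.0] * 998 + [1.7, 0.4]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.200%
```

On this reading, a density of 0.221% would mean the feature is active on roughly 2 of every 1,000 tokens sampled.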