Explanations
actions and phrases related to attention and engagement
Negative Logits
hist     -0.15
alez     -0.15
avicon   -0.14
overe    -0.14
ела      -0.14
抱       -0.13
чат      -0.13
ещ       -0.13
abra     -0.13
ANNEL    -0.13
Positive Logits
attention    0.95
attention    0.80
Attention    0.78
Attention    0.69
atención     0.60
внимание     0.60
attent       0.59
_attention   0.59
ATT          0.52
注意         0.52
Activations Density 0.171%