INDEX
Explanations
phrases and terms related to drawing attention and engagement
Negative Logits
887  -0.15
orman  -0.14
hist  -0.14
ещ  -0.14
æ¨  -0.14
ела  -0.13
avicon  -0.13
抱  -0.13
venir  -0.13
apo  -0.13
Positive Logits
attention  0.97
attention  0.81
Attention  0.78
Attention  0.70
attent  0.60
atención  0.59
_attention  0.59
внимание  0.59
注意  0.54
attn  0.50
Activations Density 0.133%