Explanations
phrases regarding attention and engagement
Negative Logits
Token      Logit
apo        -0.14
venir      -0.14
üçük       -0.14
omas       -0.14
887        -0.14
éal        -0.13
antu       -0.13
itespace   -0.13
ancial     -0.12
odyn       -0.12
Positive Logits
Token       Logit
attention   1.04
attention   0.88
Attention   0.84
Attention   0.76
attent      0.65
atención    0.63
_attention  0.63
внимание    0.61
注意        0.55
attn        0.53
Activations Density 0.177%