Explanations
attention after 'of' or 's'
Negative Logits
ৈত        0.63
हड्ड       0.62
խ         0.62
छु        0.61
仗        0.59
ক্যান্ট     0.58
]%        0.58
zsche     0.58
malle     0.58
शर्त       0.58
Positive Logits
attention         3.56
Attention         3.19
attention         3.15
Attention         3.12
внимание          2.94
внимания          2.80
Aufmerksamkeit    2.79
attentions        2.78
attenzione        2.76
atenção           2.68
Activations Density: 0.474%