Explanations
words following emphasized words
Negative Logits
Token         Logit
𝗗             1.13
ؘ              1.13
ছে            1.05
𒅗             1.05
肴            1.03
وعلى          1.01
piece         0.98
𝗙             0.98
ية            0.98
forerunner    0.96
Positive Logits
Token         Logit
ש             1.19
ed            1.18
'             1.16
ो             1.13
ena           1.06
1             1.06
!,            1.05
bzw           1.05
niej          1.05
ות            1.04
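The two token lists can be read as the ends of a ranking over the whole vocabulary. A common way to produce such lists is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest entries. The sketch below assumes exactly that; `decoder_dir`, `W_U`, `vocab`, and `top_logit_tokens` are hypothetical names, and this dashboard may normalize or display its values differently.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by a feature's direct logit effect.

    decoder_dir: shape (d_model,), the feature's decoder vector (assumed).
    W_U: shape (d_model, vocab_size), the unembedding matrix (assumed).
    vocab: list of token strings indexed by token id.
    """
    # One logit contribution per vocabulary token.
    logits = decoder_dir @ W_U
    # Ascending order: most negative contributions come first.
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with a 4-token vocabulary and an identity unembedding.
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_tokens(np.array([0.5, -2.0, 1.5, 0.0]), np.eye(4), vocab, k=2)
print(neg)  # [('b', -2.0), ('d', 0.0)]
print(pos)  # [('c', 1.5), ('a', 0.5)]
```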
Activation Density: 0.090%
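A density of 0.090% means the feature is active on roughly 9 of every 10,000 tokens. The sketch below assumes density is the share of token positions whose activation is strictly above zero; the threshold and the helper name `activation_density` are assumptions, not this dashboard's documented definition.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions where the feature fires.

    Assumes "fires" means activation > threshold (default 0.0).
    """
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean() * 100)

# 9 firing positions out of 10,000 tokens.
acts = np.zeros(10_000)
acts[:9] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.090%
```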