INDEX
Explanations
auxiliary verb or a noun following "the"
Negative Logits
ة    1.48
ش    1.20
ف    1.09
Из    1.03
Κ    1.02
LikeLike    0.98
ت    0.96
Во    0.94
нные    0.94
zariaden    0.93
Positive Logits
s    1.17
g    1.07
iv    1.05
pagina    1.02
ান    0.96
mselves    0.96
ுள்ளது    0.96
p    0.96
afirma    0.95
िन    0.94
Activations Density 0.261%
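
The density figure above is presumably the share of sampled token positions on which the feature fires at all. The sketch below illustrates that reading; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [0.8, 1.3, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.261% would mean the feature is active on roughly 26 of every 10,000 sampled tokens.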