Explanations
immediate context or surroundings
Negative Logits
Token   Logit
ä       1.38
’       1.23
olie    1.16
-       1.16
yl      1.16
üm      1.15
är      1.13
ICK     1.13
ř       1.13
olit    1.11
Positive Logits
Token   Logit
ع       1.72
ле      1.36
м       1.23
for     1.23
่า      1.17
ف       1.16
to      1.13
е       1.10
ки      1.07
ح       1.05
Activations Density: 0.018%