INDEX
Explanations
instances of certain Russian nouns
Negative Logits
s         -0.54
sut       -0.27
ه         -0.27
sheets    -0.24
sah       -0.24
ы         -0.24
sik       -0.22
sak       -0.21
sí        -0.21
ท         -0.20
Positive Logits
ści       0.18
тесь      0.16
naire     0.15
ONSE      0.15
itched    0.15
U+0306 (combining breve)  0.15
ям        0.15
ями       0.15
cpy       0.14
cles      0.14
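
Token strings in dashboards like this one are often stored in byte-level BPE form, where each raw byte is shown as a printable stand-in character (for example `Ñĭ` for `ы`). The sketch below assumes the GPT-2 style byte-to-character table, which this page does not confirm; `byte_to_char` and `decode_token` are hypothetical helper names. Characters that are not in the table (such as an already decoded Cyrillic letter) are passed through unchanged.

```python
def byte_to_char():
    """Build the GPT-2 style byte -> printable character table (assumed scheme)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(0x21, 0x7F))      # '!' .. '~'
        + list(range(0xA1, 0xAD))    # '¡' .. '¬'
        + list(range(0xAE, 0x100))   # '®' .. 'ÿ'
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is assigned the next code point from U+0100 upward.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


def decode_token(token):
    """Turn a byte-level token string back into readable text."""
    inverse = {ch: b for b, ch in byte_to_char().items()}
    raw = bytearray()
    for ch in token:
        if ch in inverse:
            raw.append(inverse[ch])          # stand-in character -> original byte
        else:
            raw.extend(ch.encode("utf-8"))   # already readable, keep as-is
    # Tokens can end mid-character, so invalid sequences are replaced, not raised.
    return raw.decode("utf-8", errors="replace")


for raw_token in ["Ñĭ", "ÑĤеÑģÑĮ", "ÅĽci", "sheets"]:
    print(repr(raw_token), "->", repr(decode_token(raw_token)))
```

One limitation of this pass-through approach: an accented Latin letter that was already decoded (for example `é`) is indistinguishable from a byte stand-in, so mixed strings of that kind can decode incorrectly.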
Activations Density 0.036%
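
The density figure is usually read as the fraction of sampled tokens on which the feature has a nonzero activation. The sketch below assumes that definition, which this page does not state; `activation_density` is a hypothetical helper and the sample values are synthetic, chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 9 active tokens out of 25,000 sampled tokens.
sample = [1.0] * 9 + [0.0] * 24_991
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is silent on almost every token, which is typical of features tied to a narrow set of word forms.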