Explanations
locations and spatial relationships in the text
Negative Logits
именно    -0.15
*****/↵   -0.15
上了      -0.14
uality    -0.14
ех        -0.14
plib      -0.14
탕        -0.14
boro      -0.14
anch      -0.13
dışı      -0.13
Positive Logits
neath     0.21
/out      0.21
wards     0.21
them      0.19
них       0.19
words     0.18
him       0.18
него      0.18
/left     0.18
ward      0.17
Activations Density 0.130%