INDEX
Explanations
Russian negation particle "Не"
Negative Logits
𖤍        -2.56
anderen  -2.44
.”       -2.42
Ꝑ        -2.42
arbeta   -2.38
ᨆ        -2.34
玘       -2.34
琊       -2.31
豋       -2.30
of       -2.28
Positive Logits
ization  2.39
They     2.38
Не       2.28
↵        2.28
mereka   2.27
Ꭳ        2.25
身体     2.22
It       2.19
Ві       2.17
There    2.16
Activations Density 0.002%