Explanations
ownership, nationality, concepts, groups
Negative Logits
з        2.22
ী        2.14
сний     1.88
ร์        1.75
мо       1.74
ў        1.66
いた     1.66
ل        1.66
kinks    1.65
و        1.58
Positive Logits
ي        2.45
de       2.38
den      2.33
gi       2.31
da       2.28
des      2.20
data     2.11
gel      2.05
y        2.05
gl       2.02
Activations Density: 0.024%