INDEX
Explanations
unique cultural references and names related to specific locations or subjects
Negative Logits
lar       -0.28
ìķĺ       -0.24
ra        -0.24
ca        -0.23
ìķĺëĭ¤    -0.23
ban       -0.23
va        -0.20
ça        -0.20
ja        -0.19
ba        -0.19
Positive Logits
zet       0.28
iben      0.21
etty      0.20
ben       0.19
ye        0.19
dre       0.18
ül        0.18
де        0.18
ivel      0.17
inde      0.17
Activations Density 0.005%