Explanations
expressions of gratitude and recognition
Negative Logits
purse       -0.14
elor        -0.14
oulder      -0.14
ordon       -0.14
sympathy    -0.13
ìĶ          -0.13
tanggal     -0.13
zym         -0.13
enas        -0.13
зÑĥ         -0.13
Positive Logits
privilege    0.17
uppe         0.15
ably         0.15
оÑģÑĮ         0.15
privileged   0.15
ÛĮدÛĮ         0.15
opportunity  0.14
εÏħ          0.14
Priv         0.14
priv         0.14
Activations Density 0.034%