INDEX
Explanations
words related to personal attributes and actions, particularly in contexts of relationships and roles
Negative Logits
shields     -0.15
inish       -0.15
affected    -0.14
aye         -0.14
Shield      -0.14
affected    -0.14
мÑĥ         -0.14
฿           -0.14
\CMS        -0.13
fal         -0.13
Positive Logits
ané         0.17
aroo        0.16
ahoo        0.16
éķ·         0.15
apol        0.15
Beat        0.15
arna        0.15
terdam      0.15
IMIT        0.15
beat        0.14
Activations Density: 0.017%