Explanations
words related to influence, responsibility, and social relationships
Negative Logits
utin  -0.20
เหล  -0.16
subtype  -0.15
zung  -0.15
pitch  -0.15
repid  -0.15
andas  -0.14
blinds  -0.14
AJ  -0.14
RICT  -0.14
Positive Logits
271  0.15
.CommandType  0.14
Copp  0.14
Hakk  0.14
UNU  0.14
ریف  0.13
süt  0.13
iê  0.13
件  0.13
938  0.13
Activation Density 0.007%