INDEX
Explanations
phrases concerning personal responsibility and interpersonal relationships
Negative Logits
Token   Logit
SCI     -0.16
kova    -0.15
imus    -0.15
iosa    -0.15
Powers  -0.14
PWD     -0.14
ics     -0.14
>{!!    -0.14
oder    -0.14
itch    -0.14
Positive Logits
Token   Logit
omor    0.18
anda    0.17
olean   0.15
itsu    0.14
ombine  0.14
话      0.14
Daly    0.14
choice  0.14
ンチ    0.13
ész     0.13
Activations Density 0.140%