INDEX
Explanations
phrases related to social interactions and personal identity
Negative Logits
idata    -0.16
imu      -0.16
imers    -0.15
eka      -0.15
edith    -0.15
ESIS     -0.15
angi     -0.15
ysa      -0.15
)NULL    -0.15
jom      -0.14
Positive Logits
rather        0.24
Rather        0.21
rather        0.21
Rather        0.20
instead       0.20
Freund        0.16
universal     0.16
abstract      0.15
Instead       0.15
determined    0.15
Activations Density: 0.258%