INDEX
Explanations
mentions of emotional states and interpersonal relationships
Negative Logits
verity        -0.17
alar          -0.16
abile         -0.15
AtPath        -0.15
utsch         -0.15
egrity        -0.14
PCA           -0.14
annon         -0.14
utton         -0.14
ersistence    -0.14
POSITIVE LOGITS
view          0.17
believes      0.16
opinion       0.16
toler         0.16
belief        0.16
believe       0.16
ipi           0.15
believed      0.15
opper         0.15
mant          0.14
Activations Density 0.341%