INDEX
Explanations
phrases related to physical proximity and direction
expressions related to emotional connections and interactions
Negative Logits
distinction      -0.64
specificity      -0.61
oria             -0.58
rapport          -0.58
PN               -0.57
PID              -0.57
commissioners    -0.57
obser            -0.55
iness            -0.55
XX               -0.54
Positive Logits
aneously         1.19
rely             0.86
edly             0.86
etheless         0.84
without          0.82
til              0.82
throughout       0.81
onto             0.77
ilaterally       0.77
alike            0.76
Activations Density: 0.392%