INDEX
Explanations
phrases emphasizing connections and interactions with others
Negative Logits
Token        Logit
etten        -0.16
elah         -0.16
rof          -0.15
-pills       -0.15
AZY          -0.15
fan          -0.14
ester        -0.14
OPSIS        -0.14
PROFILE      -0.14
_observer    -0.14
Positive Logits
Token        Logit
nhau         0.28
other        0.24
people       0.20
different    0.19
others       0.18
/about       0.18
other        0.18
them         0.18
him          0.18
lik          0.17
Activations Density: 0.245%
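
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the percentage of tokens on which a feature's activation is above a threshold. This is an assumption about the definition, not something the page states; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumed definition: density = 100 * (active tokens) / (total tokens).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    # Count tokens on which the feature fires.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 4000 tokens, with the feature firing on 10 of them.
sample = [0.0] * 3990 + [1.2] * 10
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.245% would mean the feature fires on roughly 1 token in 400.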