Explanations
phrases related to social interaction or communication
instances of words related to friendly interactions or social engagement
Negative Logits
Token      Logit
Bell       -0.66
acqu       -0.64
counter    -0.64
envis      -0.63
Ferr       -0.63
su         -0.62
patron     -0.62
Cross      -0.56
Dem        -0.55
cross      -0.55
Positive Logits
Token      Logit
atted      4.45
ats        1.47
":["       1.23
ioned      1.21
uffed      1.19
outed      1.13
atter      1.09
ouched     1.07
ated       1.06
ATS        1.05
Activations Density: 0.018%