Explanations
phrases or words related to social awkwardness
terms related to social discomfort or awkward situations
Negative Logits
Token      Logit
ptives     -0.85
ptive      -0.82
ignty      -0.79
ULTS       -0.71
idation    -0.71
vation     -0.70
tsky       -0.70
aders      -0.70
IVER       -0.70
FORE       -0.69
Positive Logits
Token      Logit
ness       1.39
nesses     1.07
ly         0.86
ety        0.86
ity        0.85
entimes    0.80
itude      0.77
NESS       0.77
awkward    0.76
kward      0.76
Activations Density 0.025%
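
The density figure above can be read as the fraction of token positions on which the feature fires at all. The following is a minimal sketch under that assumption; the page itself does not state how the value is computed. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    activation. This is an illustrative definition, not one confirmed
    by the source page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 token out of 4000.
toy_activations = [0.0] * 3999 + [2.5]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.025% corresponds to the feature firing on roughly one token in every 4000, which would fit a narrow feature tied to a specific word family such as "awkward" and "-ness" suffixes.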