Explanations
negative interactions and conflicts between characters
instances of body shaming and mockery in social interactions
Negative Logits
contempl      -0.65
erning        -0.65
modesty       -0.64
suitable      -0.62
igmatic       -0.60
icip          -0.59
particularly  -0.57
gazing        -0.57
gaze          -0.56
igent         -0.55
Positive Logits
fucked        0.89
didnt         0.88
THEN          0.87
gonna         0.86
blah          0.85
yeah          0.85
fuckin        0.85
uh            0.84
shit          0.81
Yeah          0.80
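One plausible reading of the two lists above is that each value is the feature's direct effect on a token's logit. Under that assumption, you would get these values by projecting the feature's decoder direction through the model's unembedding matrix and then keeping the k lowest and k highest entries (k = 10 here). The sketch below shows that idea in pure Python. The helper name `top_logit_effects` and its arguments are hypothetical, and the toy matrix is made up. It does not reproduce the numbers on this page.

```python
def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction through an
    unembedding matrix and return the k most negative and k most positive
    (token, logit effect) pairs.

    decoder_dir: list of length d_model (assumed feature decoder direction)
    W_U: list of d_model rows, each of length len(vocab) (assumed unembedding)
    """
    d_model = len(decoder_dir)
    # Direct logit effect per token: dot product of the decoder direction
    # with that token's column of the unembedding matrix.
    effects = [
        sum(decoder_dir[r] * W_U[r][t] for r in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up values).
    neg, pos = top_logit_effects(
        [1.0, -1.0],
        [[0.0, 0.25, 1.0, 0.5], [0.75, 0.5, 0.0, 0.25]],
        ["gaze", "modesty", "yeah", "shit"],
        k=2,
    )
    print("Negative:", neg)
    print("Positive:", pos)
```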
Activation density: 1.565%
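Assume activation density means the percentage of sampled tokens on which the feature fires, that is, where its activation is above zero. A figure of 1.565% would then correspond to, for example, 1,565 firing tokens out of 100,000. The sketch below computes that percentage. The function name and the zero `threshold` default are assumptions. The page does not state how the statistic is computed.

```python
def activation_density(activations, threshold=0.0):
    """Assumed definition: percentage of tokens whose feature activation
    is strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Made-up sample: the feature fires on 1,565 of 100,000 tokens.
    sample = [1.0] * 1565 + [0.0] * 98435
    print(f"{activation_density(sample):.3f}%")
```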