Explanations
phrases or expressions indicating a contrast or contradiction
phrases that contrast common beliefs or expectations
Negative Logits
Token           Logit
estones         -0.81
ross            -0.77
ahead           -0.75
acht            -0.70
morning         -0.69
reau            -0.69
onna            -0.68
uben            -0.67
ocamp           -0.67
illed           -0.66
Positive Logits
Token           Logit
precon          0.96
stereotypes     0.94
expectations    0.90
stereotypical   0.88
stereotype      0.84
prevailing      0.80
perceptions     0.77
belief          0.74
conventional    0.73
conceptions     0.68
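The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The page does not say how these scores are produced. A common convention for such dashboards is to project the feature's decoder direction onto each token's unembedding vector and then keep the highest and lowest scores. The sketch below assumes that convention and uses a tiny, entirely hypothetical 3-dimensional model with a 4-token vocabulary. The names `decoder_direction`, `unembedding`, `logit_effects`, and `top_and_bottom` are illustrative only.

```python
# Hypothetical toy model: 3-dimensional residual stream, 4-token vocabulary.
# Real models use thousands of dimensions and tens of thousands of tokens.
decoder_direction = [1.0, -0.5, 0.0]  # assumed decoder vector of the feature

# Assumed unembedding vectors, one per vocabulary token (made-up values).
unembedding = {
    "expectations": [0.8, -0.2, 0.1],
    "stereotypes": [0.6, 0.0, 0.3],
    "morning": [-0.4, 0.6, 0.0],
    "ahead": [-0.2, 0.5, 0.7],
}


def logit_effects(d, w_u):
    """Dot product of the decoder direction with each token's unembedding vector."""
    return {tok: sum(a * b for a, b in zip(d, vec)) for tok, vec in w_u.items()}


def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens (descending) and the k lowest (ascending)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


scores = logit_effects(decoder_direction, unembedding)
positive, negative = top_and_bottom(scores, 2)
print("Positive:", positive)
print("Negative:", negative)
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other. For a real vocabulary, `heapq.nlargest` and `heapq.nsmallest` would avoid sorting every token.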
Activation Density: 0.227%
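The density figure is easiest to read as a rate. The sketch below assumes that density means the percentage of sampled token positions where the feature's activation is above zero. The page does not define the term, and the `threshold` parameter is a hypothetical knob. Under that reading, 0.227% means the feature fires on about 227 of every 100,000 tokens, or roughly once every 440 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes density is defined as (active positions / total positions) * 100;
    `threshold` is a hypothetical parameter, not taken from the dashboard.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 token positions, 227 of them active.
sample = [1.0] * 227 + [0.0] * (100_000 - 227)
print(f"{activation_density(sample):.3f}%")
```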