INDEX
Explanations
verbs that suggest defiance or resistance
expressions that convey resistance or challenge to authority or established norms
Negative Logits
spot     -0.80
reau     -0.76
enary    -0.75
anamo    -0.74
istan    -0.73
onna     -0.72
aning    -0.71
uzzle    -0.71
eport    -0.70
owe      -0.69
Positive Logits
allegiance      1.02
precon          0.98
expectations    0.94
orthodoxy       0.91
gravity         0.90
belief          0.89
stereotypes     0.87
temptation      0.85
norms           0.85
tradition       0.85
Activations Density: 0.272%