INDEX
Explanations
statements or beliefs related to education and societal issues
phrases that express belief or acceptance of authority
Negative Logits
Regulation    -0.68
Scale         -0.64
forcement     -0.63
precedent     -0.61
timetable     -0.61
protocol      -0.60
Issue         -0.60
plate         -0.59
Scale         -0.59
legality      -0.57
Positive Logits
themselves    1.17
selves        1.10
whom          0.94
selves        0.86
heirs         0.83
who           0.83
congreg       0.82
alike         0.81
backgrounds   0.80
deserving     0.80
Activations Density: 1.111%