INDEX
Explanations
phrases related to critique or dissent
statements about policies and their impacts
Negative Logits
OTUS      -0.70
rex       -0.68
ozo       -0.67
Tonight   -0.65
said      -0.64
eon       -0.64
daq       -0.63
estern    -0.62
obbies    -0.60
Pict      -0.60
Positive Logits
unfairly          1.25
improperly        0.93
mishand           0.89
unfair            0.87
flawed            0.87
violates          0.86
inappropriately   0.85
insufficient      0.85
inadequate        0.81
violated          0.81
Activations Density 0.463%
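
For orientation, here is a minimal sketch of how a density figure like the one above could be computed. It assumes density means the fraction of sampled tokens on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" = share of tokens where the feature fires at all.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 1000 tokens, the feature fires on 5 of them.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, 0.463% would mean the feature fires on roughly 46 of every 10,000 tokens.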