INDEX
Explanations
statements expressing strong opinions or beliefs
references to rights and the importance of equal treatment for all individuals
Negative Logits
untled        -0.70
catentry      -0.69
Reported      -0.68
unexpectedly  -0.66
premature     -0.65
iatus         -0.64
Prompt        -0.63
senal         -0.63
anecd         -0.63
MpServer      -0.63
Positive Logits
cannot        0.94
verning       0.93
therefore     0.89
belongs       0.87
shouldn       0.86
obey          0.82
belong        0.82
respective    0.80
uphold        0.79
arrog         0.77
Activations Density: 0.644%