Explanations
words related to criticism or scrutiny of authority figures, particularly in governmental or organizational contexts
Negative Logits
anan            -0.62
fundament       -0.57
uli             -0.57
methodological  -0.55
Enhancement     -0.55
largeDownload   -0.55
motif           -0.52
Kings           -0.52
moratorium      -0.51
disclaimer      -0.51
Positive Logits
anymore         1.33
anywhere        0.93
nor             0.92
necessarily     0.90
bothered        0.86
icable          0.80
yet             0.79
bother          0.78
any             0.78
remotely        0.77
Activations Density 10.887%