INDEX
Explanations
words related to ethics, politics, and criticism
topics related to social and cultural critique
Negative Logits
stood         -0.74
imately       -0.74
cipled        -0.71
ificantly     -0.70
ually         -0.67
probable      -0.67
suspended     -0.66
secured       -0.66
authorised    -0.66
contracted    -0.66
Positive Logits
tones         1.17
ieties        1.10
usions        1.07
isms          1.05
otypes        1.04
ographies     1.03
unctions      0.96
aunts         0.93
notations     0.93
izons         0.92
Activations Density 0.321%