Explanations
adjectives and nouns related to political views or actions
terms related to political and social critiques
Negative Logits
Token        Logit
lished       -0.75
izable       -0.71
shortened    -0.69
ually        -0.68
ized         -0.68
Rated        -0.67
ORED         -0.67
suspended    -0.67
ically       -0.65
FUL          -0.65
Positive Logits
Token        Logit
ieties       1.22
isms         1.17
acies        1.16
usions       1.14
ographies    1.10
izons        1.10
iances       1.08
tones        1.08
vironments   1.06
rities       1.06
Activations Density: 0.561%