INDEX
Explanations
phrases related to authoritarianism and dictatorships
terms related to oppression and its effects
Negative Logits
bys              -0.85
rog              -0.76
Freak            -0.75
vernment         -0.72
ces              -0.70
ereo             -0.69
blers            -0.69
Latvia           -0.66
soDeliveryDate   -0.66
busters          -0.65
Positive Logits
ciating           0.90
desp              0.86
anguage           0.80
inement           0.80
iculty            0.77
endon             0.77
itious            0.77
phia              0.76
essage            0.72
igham             0.71
Activations Density 0.017%