INDEX
Explanations
themes related to ideological conflicts, particularly critiques of modern ideological movements and their societal implications
Negative Logits
letic    -0.20
rick     -0.15
982      -0.14
orde     -0.14
Bans     -0.14
.asp     -0.13
yyy      -0.13
lica     -0.13
licas    -0.13
Tough    -0.13
Positive Logits
mas          0.21
masking      0.16
Mask         0.16
MAS          0.16
Mas          0.16
Mas          0.16
MAS          0.16
sickness     0.15
poisonous    0.15
infect       0.15
Activations Density: 0.260%