Explanations
terms related to fascism and authoritarianism
Negative Logits
Token      Logit
oku        -0.17
anda       -0.14
ロー       -0.14
оступ      -0.14
술         -0.14
ukes       -0.13
ownik      -0.13
Casinos    -0.13
Affero     -0.13
Dee        -0.13
Positive Logits
Token      Logit
pta        0.15
rogram     0.14
legitim    0.14
ayar       0.14
erif       0.14
aire       0.14
250        0.14
leaning    0.14
aries      0.14
emory      0.14
Activations Density: 0.017%