INDEX
Explanations
references to fascism and related ideologies
Negative Logits
izedName    -0.15
eper        -0.15
ERSHEY      -0.15
irez        -0.15
ey          -0.14
arks        -0.14
edList      -0.14
ables       -0.14
andering    -0.14
onces       -0.14
Positive Logits
inating     0.27
fasc        0.25
Fasc        0.22
ination     0.21
inated      0.20
inate       0.20
ilit        0.19
inations    0.18
byss        0.17
icle        0.17
Activations Density: 0.007%