INDEX
Explanations
references to democracy and democratic principles
Negative Logits
ittle      -0.16
rase       -0.15
trách      -0.15
xor        -0.15
ses        -0.15
omo        -0.14
rij        -0.14
顿         -0.14
Released   -0.14
erva       -0.14
Positive Logits
anism      0.18
strains    0.16
gage       0.15
stration   0.15
Deg        0.15
erd        0.14
ically     0.14
intosh     0.14
Bris       0.14
льт        0.14
Activations Density 0.035%