INDEX
Explanations
terms related to various philosophical and ideological positions
Negative Logits
er        -0.22
izer      -0.17
nier      -0.16
an        -0.15
杉        -0.15
smith     -0.15
umas      -0.15
度        -0.15
anine     -0.15
thon      -0.15
Positive Logits
(ic       0.22
ically    0.21
ische     0.21
-leaning  0.21
otle      0.20
isches    0.17
ycz       0.16
lero      0.16
ич        0.16
ischer    0.15
Activations Density 0.084%