INDEX
Explanations
terms related to belief systems or ideologies, particularly those ending in 'ist'
Negative Logits
er          -0.27
ity         -0.20
ed          -0.20
erli        -0.18
ITY         -0.16
anine       -0.16
thesis      -0.15
chw         -0.15
uliar       -0.15
jian        -0.15

Positive Logits
ically      0.26
(ic         0.26
ische       0.21
ycz         0.21
otle        0.20
ical        0.18
ICAL        0.18
tir         0.18
-leaning    0.18
endencies   0.17
Activations Density: 0.047%