Explanations
references to secularism or secular institutions
Negative Logits
ymoon      -0.18
utz        -0.18
ÃŃv        -0.16
ök         -0.15
лÑİ        -0.15
igans      -0.15
achie      -0.15
Wunused    -0.15
ysa        -0.15
lg         -0.14
Positive Logits
urities    0.26
URITY      0.25
uring      0.25
ular       0.23
uencia     0.23
und        0.23
ession     0.22
ures       0.21
ularity    0.21
ured       0.21
Activations Density 0.009%