INDEX
Explanations
references to secularism or secular-related themes
Negative Logits
utz      -0.18
ymoon    -0.18
akk      -0.17
ysa      -0.16
ív       -0.16
achie    -0.15
lobs     -0.15
iske     -0.15
ache     -0.14
bard     -0.14
Positive Logits
URITY    0.24
urities  0.23
uring    0.23
ures     0.21
und      0.21
ular     0.20
ession   0.20
lected   0.19
ured     0.18
uencia   0.18
Activations Density 0.012%