Explanations
references to religious or governance-related institutions and activities
NEGATIVE LOGITS
Token     Logit
elage     -0.18
blas      -0.17
ovice     -0.17
rý        -0.16
maries    -0.15
phis      -0.15
Mattis    -0.15
amber     -0.15
upt       -0.15
blade     -0.14
POSITIVE LOGITS
Token       Logit
arians      0.20
adox        0.19
aguay       0.18
atic        0.18
ry          0.18
ично        0.16
443         0.16
ppers       0.16
atically    0.15
andise      0.15
Activations Density 0.050%