INDEX
Explanations
references to religious themes and figures
Negative Logits
İY        -0.20
/goto     -0.15
orno      -0.15
úa        -0.15
reau      -0.15
rell      -0.15
estroy    -0.14
(yy       -0.14
št        -0.14
_HAVE     -0.14
Positive Logits
Who       0.37
Which     0.33
Who       0.33
qui       0.32
Which     0.30
who       0.29
Whoever   0.28
who       0.28
Qui       0.27
which     0.26
Activations Density: 0.225%