INDEX
Explanations
references to significant religious figures and places
Negative Logits
ulumi     -0.15
nea       -0.15
меть      -0.14
ç«        -0.14
reet      -0.14
KY        -0.14
phy       -0.14
ход       -0.14
ovolta    -0.14
__$       -0.14
Positive Logits
avit      0.16
uales     0.14
Spy       0.14
achat     0.14
uracy     0.14
aling     0.14
/ws       0.14
ardown    0.14
gnore     0.14
utter     0.14
Activations Density 0.069%