INDEX
Explanations
historical references to religious practices
Negative Logits
#af       -0.17
gii       -0.17
heimer    -0.16
Kurt      -0.16
riz       -0.16
gart      -0.15
Odin      -0.15
äter      -0.14
cks       -0.14
♥         -0.14
Positive Logits
Jewish    0.28
Dead      0.23
Temple    0.22
Mish      0.22
Bilg      0.22
Sadd      0.21
Ess       0.20
Dead      0.20
Jews      0.20
jewish    0.20
Activations Density 0.042%