Explanations
References to snakes or snake-related themes
Negative Logits
Token      Logit
otron      -0.16
isay       -0.16
inger      -0.15
Umb        -0.15
aise       -0.15
534        -0.15
ume        -0.15
undry      -0.14
portun     -0.14
ounding    -0.14
Positive Logits
Token      Logit
sn          0.40
Sn          0.32
sn          0.32
/sn         0.31
(sn         0.30
.Sn         0.29
-sn         0.28
Sn          0.27
.sn         0.27
SN          0.27
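The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A minimal sketch of one common way such lists are produced, assuming the score for each token is the direct projection of the feature's decoder direction onto that token's unembedding column (this page does not state its exact method). The function name `logit_effects`, the `d_model x vocab_size` layout of `unembed`, and the toy numbers in the usage lines are all hypothetical illustrations, not this dashboard's actual code or data.

```python
from typing import Sequence


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by direct logit effect.

    Hypothetical layout: unembed is d_model x vocab_size (one row per
    residual dimension), so the effect on token t is
    sum_i decoder_dir[i] * unembed[i][t].
    """
    d_model = len(decoder_dir)
    scores = []
    for t, tok in enumerate(vocab):
        # Dot product of the feature's decoder vector with token t's column.
        s = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        scores.append((tok, s))
    # Largest scores first for the positive list, smallest first for the negative list.
    positive = sorted(scores, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda p: p[1])[:k]
    return positive, negative


# Toy, made-up inputs: a 2-dim residual stream and a 4-token vocabulary.
pos, neg = logit_effects(
    [1.0, 0.0],
    [[0.4, 0.3, -0.16, -0.1], [0.0, 5.0, 0.0, 0.0]],
    ["sn", "Sn", "otron", "isay"],
    k=2,
)
print(pos)  # [('sn', 0.4), ('Sn', 0.3)]
print(neg)  # [('otron', -0.16), ('isay', -0.1)]
```

Real implementations typically do this as a single matrix-vector product over the full vocabulary; the explicit loop here just makes the per-token dot product visible.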
Activation density: 0.016%
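Activation density is usually read as the fraction of sampled token positions on which the feature fires at all, so 0.016% corresponds to roughly 16 active positions per 100,000 tokens. A minimal sketch under that assumption, treating any activation above a threshold of 0.0 as "firing"; the function name `activation_density` and the synthetic activation list are hypothetical, and the dashboard's own sampling and threshold are not specified on this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Synthetic example: 16 firing positions out of 100,000 tokens.
acts = [0.0] * 99_984 + [1.0] * 16
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.016%
```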