INDEX
Explanations
No Explanations Found
Negative Logits
.synthetic  -0.16
sth         -0.15
ocos        -0.15
ivel        -0.15
anax        -0.14
icus        -0.14
Franken     -0.14
elin        -0.14
amik        -0.14
eteria      -0.14
Positive Logits
heard   0.34
å¤      0.32
hea     0.30
bead    0.30
head    0.29
-head   0.28
heard   0.28
head    0.28
ead     0.26
_head   0.26
Activations Density 0.085%