Explanations
references to sheep and goats
Negative Logits
ãĥĥãĤ¯ãĤ¹    -0.16
nder    -0.15
MORE    -0.14
179    -0.14
mour    -0.14
_RB    -0.14
lh    -0.13
adro    -0.13
yar    -0.13
ÙĦاÙĦ    -0.13
Positive Logits
alam    0.17
eko    0.16
innen    0.15
anship    0.15
eyed    0.15
eshire    0.15
é§    0.15
alian    0.14
els    0.14
ault    0.14
Activations Density 0.019%