INDEX
Explanations
references to prominent animals, specifically tigers and eagles
Negative Logits
omo        -0.16
orative    -0.16
tures      -0.15
619        -0.14
elay       -0.14
mates      -0.14
Bryce      -0.14
gia        -0.14
gae        -0.14
omes       -0.14
Positive Logits
tail       0.18
hawk       0.18
innen      0.17
zsche      0.17
esses      0.16
Tales      0.16
æ¯Ľ        0.16
-shaped    0.15
oods       0.15
ůž         0.15
Activations Density 0.091%
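
As a rough illustration of the density figure above, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation exceeds a threshold. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" counts a position as active when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 9 of which activate the feature.
sample = [0.0] * 9991 + [1.2] * 9
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumption, a density of 0.091% would correspond to the feature firing on roughly 9 of every 10,000 tokens.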