Explanations
words related to specific animals like penguins and tigers
references to specific animals or characters
Negative Logits
Token      Logit
erest      -0.86
lessly     -0.83
neau       -0.76
arnaev     -0.75
iggins     -0.74
bender     -0.71
urst       -0.69
mble       -0.69
ites       -0.67
staking    -0.65
Positive Logits
Token      Logit
Doodle     0.94
pengu      0.89
eday       0.79
cean       0.76
pige       0.73
Pengu      0.71
Penguin    0.69
retri      0.68
Britann    0.67
Alley      0.67
Activations Density 0.023%
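
Activation density is usually read as the fraction of sampled token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample values are assumptions made for illustration, not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 2 of which activate.
sample = [0.0] * 9998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.023% would mean the feature is active on roughly 23 of every 100,000 sampled tokens.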