Explanations
references to lions and the term 'lion' across various contexts
Negative Logits
Token        Logit
untime       -0.17
leston       -0.17
edy          -0.16
lov          -0.16
quisitions   -0.16
mer          -0.15
imer         -0.15
vip          -0.14
indr         -0.14
³            -0.14
Positive Logits
Token        Logit
ardo         0.22
ess          0.21
esses        0.21
mane         0.20
URRED        0.19
lion         0.19
Lion         0.19
cub          0.18
lions        0.18
ájem         0.17
Activations Density: 0.014%