INDEX
Explanations
mentions of the word "lion" along with a high activation value
repeated mentions of the word "lion."
Negative Logits
Token     Logit
mble      -0.89
ACTION    -0.80
chell     -0.79
ilk       -0.78
Ñı        -0.75
lying     -0.74
matter    -0.69
ETH       -0.66
ÑĮ        -0.66
skirts    -0.65
Positive Logits
Token     Logit
esses     1.25
fish      1.08
ess       1.00
eye       0.96
lions     0.96
ous       0.88
osaurs    0.85
odon      0.84
doms      0.84
iasis     0.83
Activations Density 0.018%
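
The density figure above can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading only. The page does not define the metric, so the "activation greater than zero" criterion, the function name `activation_density`, and the synthetic sample are all assumptions, not the dashboard's actual computation or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions on
    which the feature is active (activation > threshold).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data for illustration only (not taken from the dashboard):
# 18 active positions out of 100,000 sampled token positions.
sample = [0.0] * 99_982 + [1.5] * 18
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.018% corresponds to roughly 18 active positions per 100,000 sampled tokens.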