Explanations
references to cats and related content in the text
Negative Logits
Token   Logit
quito   -0.17
pup     -0.17
Mp      -0.15
wap     -0.15
puppy   -0.14
MP      -0.14
anske   -0.14
μÏĢ     -0.14
Pig     -0.14
rot     -0.14
Positive Logits
Token     Logit
cat       0.34
-cat      0.34
cats      0.33
Cat       0.32
Cats      0.30
/cat      0.27
Cat       0.27
(cat      0.27
çĮ«       0.26
kittens   0.26
Activations Density: 0.034%
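
Some tokens in the lists above, such as çĮ«, look garbled because they are shown in a byte-level display alphabet rather than as decoded text. The sketch below assumes the model uses a GPT-2-style byte-level BPE tokenizer, which this page does not state. Under that assumption it rebuilds the standard byte-to-character table and inverts it. The function names are illustrative and not part of any tool shown here.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style map from byte value (0-255) to display character."""
    # Printable bytes keep their own code point.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Remaining bytes are shifted to code points 256 and above.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(display):
    """Turn a byte-level display string back into text.

    Assumes every character belongs to the byte-level alphabet; any other
    character raises KeyError. Byte sequences that are not valid UTF-8 are
    shown with the replacement character.
    """
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


print(decode_token("çĮ«"))  # 猫
```

If the assumption holds, çĮ« is the Chinese character for "cat", which fits the other positive-logit tokens.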