INDEX
Explanations
descriptions or mentions of puppies
mentions of puppies or related terms in the text
Negative Logits
ibur        -0.83
NetMessage  -0.74
aea         -0.74
ORGE        -0.73
lor         -0.73
Lank        -0.72
rooms       -0.69
abor        -0.68
ribe        -0.68
mberg       -0.67
Positive Logits
puppy       1.11
puppies     1.02
Pupp        0.99
retri       0.97
kitten      0.83
pup         0.81
kittens     0.73
retrie      0.73
rador       0.73
Humane      0.72
Activations Density 0.016%