INDEX
Explanations
mentions of different colors and sizes of puppies
references to puppies
Negative Logits
icted      -0.84
aining     -0.79
itable     -0.75
omb        -0.74
earing     -0.73
lor        -0.73
inez       -0.72
inger      -0.70
76561      -0.70
bestos     -0.70
Positive Logits
mills      0.88
puppy      0.72
hirt       0.70
retri      0.68
chn        0.64
tant       0.62
ISO        0.62
hunter     0.61
coat       0.60
stick      0.60
Activations Density: 0.047%