INDEX
Explanations
mentions of cats
Negative Logits
Token         Logit
gur           -0.74
mble          -0.72
eous          -0.71
iosyncr       -0.69
ffiti         -0.69
Sachs         -0.67
indal         -0.67
undo          -0.67
Protestant    -0.64
Fellowship    -0.63
Positive Logits
Token         Logit
aclysm        1.36
heter         1.18
fish          1.08
chers         0.98
cat           0.97
cats          0.93
alogue        0.88
kittens       0.88
paws          0.86
cher          0.84
Activations Density: 0.015%