INDEX
Explanations
phrases related to cats
mentions of cats
Negative Logits
Token         Logit
mble          -0.71
Seym          -0.71
indal         -0.68
htt           -0.68
gur           -0.68
enriched      -0.63
Vander        -0.62
Protestant    -0.62
Sacrament     -0.62
ij士          -0.62
Positive Logits
Token         Logit
aclysm        1.38
cat           1.03
heter         1.02
fish          1.01
cats          1.00
alogue        0.95
apult         0.93
Cat           0.88
cats          0.85
kitten        0.84
Activations Density: 0.011%