INDEX
Explanations
mentions of a specific entity named "Dog" in various contexts
mentions of "Dog" in various contexts
NEGATIVE LOGITS
Token         Logit
Mb            -0.70
lished        -0.69
unders        -0.68
exch          -0.66
olate         -0.66
GOODMAN       -0.66
unres         -0.63
allowances    -0.63
CLOSE         -0.62
olated        -0.61
POSITIVE LOGITS
Token         Logit
Dog            3.86
Dog            2.85
Dogs           2.51
dog            2.12
dog            2.11
dogs           1.91
canine         1.58
Pupp           1.46
Animal         1.42
dogs           1.41
Activations Density: 0.012%