INDEX
Explanations
mentions of the word "dog" in various forms and contexts
Negative Logits
Token            Logit
autorytatywna    -0.59
umi              -0.50
mira             -0.50
umina            -0.47
AMI              -0.47
pert             -0.47
versa            -0.46
ini              -0.46
Rasa             -0.45
LLI              -0.45
Positive Logits
Token    Logit
Dog      2.17
dog      2.17
Dog      2.14
DOG      1.88
dog      1.74
DOG      1.74
dogs     1.47
Dogs     1.46
dogs     1.30
Dogs     1.28
Activations Density: 0.006%