INDEX
Explanations
references to wolves and wolf-related characters or themes
Negative Logits
Token       Logit
Kew         -0.83
sth         -0.81
Eud         -0.79
Shet        -0.77
Réponses    -0.77
•••         -0.77
Bany        -0.77
νώ          -0.77
Jwt         -0.76
Edna        -0.76
Positive Logits
Token       Logit
Wolf        1.84
WOLF        1.67
wolf        1.60
Wolf        1.59
wolf        1.55
Wolfe       1.45
Wolves      1.44
Wolves      1.40
wolves      1.34
wolves      1.33
Activations Density: 0.012%