INDEX
Explanations
references to werewolves or related terms
Negative Logits
Token    Logit
o        -0.20
son      -0.18
a        -0.17
ing      -0.17
tons     -0.17
sb       -0.15
e        -0.15
ORG      -0.15
sg       -0.15
verte    -0.15
Positive Logits
Token    Logit
ewolf    0.29
ghi      0.18
wolf     0.17
kiye     0.16
ediator  0.16
ìķ½      0.16
eturn    0.15
ewith    0.15
kenin    0.15
ktop     0.15
Activations Density: 0.012%