INDEX
Explanations
the word "Wolf"
references to the word "Wolf" or its variations
Negative Logits
bably       -0.88
pora        -0.76
thritis     -0.76
acists      -0.75
nces        -0.73
perture     -0.73
ursed       -0.72
acterial    -0.71
ngth        -0.71
conflic     -0.69
Positive Logits
enstein     1.33
hound       1.25
sburg       1.06
gang        1.02
Wolf        1.01
Wolf        0.99
Wolves      0.98
pack        0.96
owitz       0.96
rider       0.94
Activations Density 0.018%