INDEX
Explanations
occurrences of the word "wolf" and related terms
Negative Logits
agger     -0.23
iros      -0.17
abler     -0.16
寧        -0.15
abant     -0.15
velte     -0.14
incinn    -0.14
ora       -0.14
arrow     -0.14
/left     -0.14
Positive Logits
owitz     0.23
enso      0.22
pack      0.22
enstein   0.22
enden     0.21
hound     0.21
endale    0.20
ishly     0.19
SSL       0.18
sonian    0.18
Activations Density: 0.008%