INDEX
Explanations
references to the term "dog" and its variations
Negative Logits
aurus        -0.18
æ¦ľ          -0.17
icks         -0.16
ków          -0.16
odian        -0.15
áÅĻ          -0.14
äre          -0.14
hots         -0.14
recep        -0.14
searchModel  -0.14
Positive Logits
ged          0.30
gy           0.28
gie          0.24
matic        0.23
ma           0.22
gett         0.21
gone         0.21
ger          0.20
mat          0.20
bark         0.19
Activations Density: 0.016%