INDEX
Explanations
instances of the word "worst" and related variations
Negative Logits
Token    Logit
äºĪ      -0.17
mot      -0.16
geh      -0.15
eer      -0.15
eur      -0.15
mpl      -0.15
.pix     -0.14
nos      -0.14
sg       -0.14
kar      -0.14
Positive Logits
Token      Logit
shipping   0.30
ried       0.25
ris        0.25
thing      0.23
riers      0.21
SHIP       0.21
ship       0.21
ships      0.20
rell       0.19
zcze       0.19
Activations Density 0.003%