Explanations
words that indicate promises or assurances of satisfaction or quality
Negative Logits
ToEnd       -0.15
arc         -0.14
oster       -0.14
.Selenium   -0.14
iler        -0.14
ild         -0.14
ington      -0.14
uy          -0.14
vvm         -0.14
ï½į         -0.13
Positive Logits
ably        0.22
/prom       0.17
anteed      0.17
ing         0.16
ÅĽ          0.15
ainty       0.15
            0.15
tesy        0.14
ean         0.14
ORTH        0.14
Activations Density: 0.029%