Explanations
politeness and related phrases
Negative Logits
tle          1.16
tida         1.05
MO           1.03
ines         1.02
ilization    1.00
它           1.00
昉           0.99
oys          0.99
ibur         0.99
iya          0.98
Positive Logits
sloppy       1.10
িন           1.02
ور           1.02
tranche      1.01
hurry        0.98
lousy        0.93
issance      0.91
defenseman   0.89
mouthful     0.89
a            0.88
Activations Density: 0.028%