INDEX
Explanations
phrases related to loss or unfavorable outcomes
Negative Logits
losing        -2.05
Losing        -1.87
Losing        -1.84
loosing       -1.59
losing        -1.56
perdiendo     -1.06
loses         -1.00
perdre        -1.00
kehilangan    -0.94
perder        -0.93
Positive Logits
käyt          0.36
wirk          0.31
ss            0.29
use           0.29
block         0.29
ity           0.29
verband       0.29
base          0.29
öny           0.28
fit           0.28
Activations Density: 0.002%