Explanations
references to promotional materials or advertising elements
Negative Logits
sWith   -0.39
sel     -0.38
sm      -0.36
sin     -0.35
side    -0.35
sp      -0.35
sc      -0.34
sis     -0.34
sid     -0.34
sen     -0.33
Positive Logits
idge    0.35
er      0.31
ë§ģ     0.30
ific    0.29
gebn    0.29
ized    0.28
cury    0.28
ed      0.28
lain    0.27
most    0.26
Activations Density 0.732%