INDEX
Explanations
references to tobacco-related products and names
Negative Logits
isd         -0.17
anth        -0.16
ิเศษ        -0.15
兼          -0.15
jed         -0.15
.Toolkit    -0.15
ести        -0.14
apr         -0.14
urs         -0.14
pty         -0.14
Positive Logits
ább         0.20
acco        0.18
hiba        0.18
.LENGTH     0.16
istrovství  0.16
amak        0.16
прит        0.15
yo          0.15
xic         0.15
gay         0.14
Activations Density 0.021%