Explanations
references to tobacco brands and products
Negative Logits
ford          -0.17
isd           -0.15
åħ¼           -0.15
naire         -0.14
anth          -0.14
Jerome        -0.14
wie           -0.14
t             -0.14
ollections    -0.14
pty           -0.14
Positive Logits
hiba          0.24
ább           0.21
.LENGTH       0.20
acco          0.20
baru          0.19
plit          0.17
amak          0.16
Ïģκ           0.16
.makeText     0.16
xic           0.15
Activations Density 0.032%