Explanations
words related to substances or activities like drinking or drugs
references to drinking habits or beverages
Negative Logits
yip        -0.88
eers       -0.86
SHIP       -0.78
quo        -0.73
eering     -0.68
Leilan     -0.66
eer        -0.63
hire       -0.62
Pose       -0.61
Economic   -0.59
Positive Logits
inking     1.34
agons      1.30
inks       1.24
unk        1.21
ink        1.16
ags        1.15
ifting     1.14
agging     1.14
ifts       1.13
ifter      1.13
Activations Density: 0.026%