INDEX
Explanations
mentions of alcoholic drinks
references to drinking beverages
Negative Logits
Sharif     -0.68
ural       -0.66
Shant      -0.65
Brach      -0.64
Postal     -0.62
notch      -0.62
roe        -0.60
REM        -0.59
izons      -0.58
theless    -0.58
Positive Logits
cohol       1.14
drinkers    1.10
bottles     1.05
Drink       1.02
beverages   0.97
water       0.95
drink       0.94
alcohol     0.92
drinks      0.91
bott        0.90
Activations Density 0.022%