INDEX
Explanations
mentions of people being drunk or drunk driving
instances of alcohol consumption and its consequences
Negative Logits
Token          Logit
Flavoring      -0.80
Downloadha     -0.79
JPM            -0.74
akeru          -0.72
pha            -0.68
DonaldTrump    -0.67
isite          -0.65
fasc           -0.65
semble         -0.64
Postal         -0.64
Positive Logits
Token          Logit
drunk          1.04
bott           0.98
manslaughter   0.92
ards           0.91
drinking       0.89
underage       0.88
cohol          0.87
binge          0.83
alcohol        0.80
drinkers       0.79
Activations Density 0.027%
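
As a rough illustration of what the density figure may mean, the sketch below assumes that activation density is the fraction of sampled tokens on which the feature fires above a threshold. That definition, the function name activation_density, and the sample data are assumptions made for illustration; the source does not state how the figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: the source only reports the figure, not the method.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 27 of 100,000 tokens.
sample = [1.5] * 27 + [0.0] * 99_973
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.027%
```

Under this reading, a density of 0.027% corresponds to the feature firing on roughly 27 of every 100,000 tokens, which fits a feature tied to a narrow topic.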