Explanations
words related to alcohol consumption and its consequences, specifically focusing on instances of drunkenness
references to intoxication, particularly related to being drunk
Negative Logits
Downloadha     -0.88
JPM            -0.78
Flavoring      -0.75
akeru          -0.72
ILA            -0.71
DonaldTrump    -0.70
isite          -0.70
adr            -0.68
metics         -0.67
ocol           -0.66
Positive Logits
drunk           1.00
ards            0.95
bott            0.93
drinking        0.88
underage        0.84
cohol           0.83
ness            0.82
manslaughter    0.80
alcohol         0.79
binge           0.79
Activations Density: 0.029%
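
To give a sense of scale for the density figure, here is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of sampled token positions on which the feature fires (activation above zero); the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature is active. This definition is not confirmed by the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 29 of which fire.
sample = [0.0] * 99_971 + [1.5] * 29
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.029% corresponds to roughly 29 active positions per 100,000 tokens, which is consistent with a sparse feature that fires only on text about drunkenness.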