INDEX
Explanations
mentions of the word "alcoholic", especially pertaining to people
terms related to addiction and loss of control
Negative Logits
Token         Logit
foundation    -0.83
ding          -0.79
arity         -0.74
bley          -0.74
imble         -0.73
artisan       -0.72
ĺħ            -0.70
eker          -0.69
ĪĴ            -0.69
tsky          -0.68
Positive Logits
Token         Logit
ABLE          0.90
uous          0.83
beverages     0.76
XVI           0.74
uations       0.73
involuntary   0.72
alcoholism    0.71
ãĥ¯ãĥ³        0.70
âĶĢâĶĢ        0.70
URA           0.69
Activations Density: 0.023%