INDEX
Explanations
alcohol-related words, particularly the word "liquor," with varying activations based on the context of the word in the document
references to alcoholic beverages and liquor-related terms
Negative Logits
wered        -0.72
Hawk         -0.71
Aj           -0.69
DIR          -0.69
Canterbury   -0.65
pta          -0.65
Steps        -0.64
CFR          -0.63
Shogun       -0.62
Macro        -0.62
Positive Logits
liquor       1.10
cohol        1.03
ocaust       0.96
beverage     0.89
licence      0.85
licenses     0.84
drinkers     0.83
alcohol      0.83
Liqu         0.82
liqu         0.81
Activations Density 0.005%