Explanations
mentions of beverages
references to beverages
Negative Logits
ural         -0.86
doms         -0.78
ures         -0.75
uve          -0.73
heed         -0.70
herty        -0.68
ebus         -0.68
cephal       -0.66
ure          -0.65
roe          -0.65
Positive Logits
beverage      1.06
beverages     1.05
drinks        0.98
drinkers      0.97
gary          0.97
cups          0.93
brewed        0.89
drink         0.88
cohol         0.86
Sprite        0.85
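Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the k largest and k smallest entries. The page does not state its method, so treat that as an assumption. The sketch below shows the idea with NumPy on made-up toy weights. `top_bottom_logits`, `decoder_dir`, and `W_U` are hypothetical names, and the numbers it prints are not the values listed above.

```python
import numpy as np

def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, logit) pairs.

    Assumption: the logit lists come from decoder_dir @ W_U.
    decoder_dir: hypothetical feature decoder direction, shape (d_model,)
    W_U: hypothetical unembedding matrix, shape (d_model, vocab_size)
    vocab: token strings, length vocab_size
    """
    logits = decoder_dir @ W_U        # direct effect of the feature on each token's logit
    order = np.argsort(logits)        # token indices sorted by ascending logit
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy example with random weights (illustrative only, not the real model)
rng = np.random.default_rng(0)
vocab = ["beverage", "drinks", "cups", "ural", "doms", "heed"]
W_U = rng.normal(size=(4, len(vocab)))
decoder_dir = rng.normal(size=4)
top, bottom = top_bottom_logits(decoder_dir, W_U, vocab, k=3)
print("positive:", top)
print("negative:", bottom)
```

Under this reading, tokens such as "beverage" and "drinks" get the largest boost from the feature's direction, while fragments like "ural" and "doms" are suppressed the most.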
Activation Density: 0.026%
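Assume the density figure is the fraction of token positions where the feature's activation is nonzero (an assumption, since the page does not define it). Then 0.026% means the feature fires on roughly 26 of every 100,000 tokens. The sketch below computes density from an activation array and converts the percentage. `activation_density` is a hypothetical helper, and the activations are synthetic.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Convert the dashboard percentage to a fraction and an expected count
density = 0.026 / 100                  # about 0.00026
expected_per_100k = density * 100_000  # about 26 tokens

# Synthetic activations: the feature fires on 26 of 100,000 positions
acts = np.zeros(100_000)
acts[:26] = 1.0
toy_density = activation_density(acts)
print(toy_density, expected_per_100k)
```

Such a low density is typical of a narrowly scoped feature like this beverage-related one.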