INDEX
Explanations
terms related to beverages
Negative Logits
Token     Logit
orse      -0.79
urat      -0.75
arity     -0.73
awar      -0.71
inel      -0.70
idency    -0.70
ures      -0.70
ure       -0.67
iger      -0.66
igers     -0.65
Positive Logits
Token      Logit
beverage   1.40
beverages  1.40
drinks     1.20
cohol      1.07
drinkers   1.06
drink      1.00
brewed     0.99
Drink      0.98
Bever      0.95
tasting    0.91
Activations Density 0.013%