INDEX
Explanations
references to various types of alcoholic beverages
Negative Logits
CHELL        -0.43
Att          -0.41
InputBorder  -0.40
Adapt        -0.40
邱           -0.40
gnore        -0.40
vicin        -0.39
Dat          -0.38
الحد         -0.38
AllowUser    -0.38
Positive Logits
Whiskey  1.23
whiskey  1.22
Whiskey  1.15
Whisky   1.12
whisky   1.09
whiskey  1.07
whisky   0.86
bourbon  0.75
Bourbon  0.73
🥃       0.73
Activations Density 0.002%