INDEX
Explanations
references to dessert items
references to desserts
Negative Logits
ought       -0.80
orne        -0.78
vernment    -0.71
aird        -0.69
reen        -0.69
sighted     -0.68
ne          -0.67
ostics      -0.67
away        -0.67
nesota      -0.67
Positive Logits
essert       1.04
dessert      0.97
desserts     0.93
pudding      0.92
Dough        0.89
ecake        0.83
dough        0.82
cust         0.80
batter       0.79
pastry       0.79
Activation Density: 0.026%