INDEX
Explanations
mentions of bread-related terms
references to bread
Negative Logits
Token     Logit
uate      -0.82
igious    -0.71
Cheong    -0.69
OWER      -0.67
NRS       -0.66
uated     -0.66
ORT       -0.66
ARP       -0.65
ULT       -0.65
IGHT      -0.65
Positive Logits
Token     Logit
fruit     1.11
Seym      1.07
bread     1.06
grain     1.04
dough     1.01
sticks    1.01
pudding   1.00
winner    0.97
oven      0.91
cake      0.90
Activations Density: 0.008%