INDEX
Explanations
mentions of cakes
references to cakes
Negative Logits
WATCHED       -0.65
audi          -0.62
igious        -0.61
Powers        -0.60
vernment      -0.60
neighbouring  -0.60
Lomb          -0.58
nesota        -0.57
tested        -0.57
unknown       -0.57
Positive Logits
cakes          1.21
cake           1.18
cake           1.14
cakes          1.03
ecake          0.93
meal           0.93
Cake           0.88
fruit          0.82
pillar         0.79
xual           0.75
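The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes their logits up or down. The sketch below assumes the common convention for such dashboards: a token's score is the dot product of the feature's decoder direction with that token's unembedding vector, and the lists show the top and bottom k. The function names, the 2-dimensional vectors, and the four-token vocabulary are hypothetical toy values, not this feature's real weights.

```python
# Hypothetical sketch: rank tokens by one feature's direct effect on the logits.
# Assumption: a token's logit effect is the dot product of the feature's
# decoder direction with that token's unembedding vector.

def logit_effects(decoder_dir, unembed_rows, vocab):
    """Pair each token with decoder_dir . (that token's unembedding vector).

    unembed_rows holds one vector per token (the unembedding matrix transposed).
    """
    return [
        (token, sum(d * u for d, u in zip(decoder_dir, token_vec)))
        for token, token_vec in zip(vocab, unembed_rows)
    ]


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return positive, negative


# Toy 2-dimensional example; every value here is made up for illustration.
vocab = ["cake", "meal", "tested", "unknown"]
decoder_dir = [1.0, 0.5]
unembed_rows = [[1.0, 0.4], [0.6, 0.2], [-0.5, -0.1], [-0.3, -0.4]]

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed_rows, vocab), k=2)
for label, rows in (("positive", positive), ("negative", negative)):
    for token, score in rows:
        print(f"{label:8} {token:8} {score:+.2f}")
```

Sorting the full list once and slicing both ends keeps the two rankings consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid the full sort.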
Activation Density 0.010%
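Activation density is read here as the share of sampled tokens on which the feature is active, expressed as a percentage. The sketch below assumes "active" means an activation strictly above a threshold of zero; the `threshold` parameter and the sample are hypothetical, and the dashboard's actual sample size is not stated.

```python
# Hypothetical sketch: activation density as the percentage of sampled tokens
# on which a feature fires. Assumption: "fires" means activation > threshold.

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up sample: the feature fires on 2 of 20,000 tokens.
sample = [0.0] * 19_998 + [3.1, 0.7]
print(f"Activation density: {activation_density(sample):.3f}%")
```

Two active tokens out of 20,000 works out to 0.010%, the same order as the figure shown for this feature, meaning it fires on roughly one token in ten thousand.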