INDEX
Explanations
mentions of specific food items, particularly cereals
references to various types of cereals and grains
Negative Logits
nesota    -0.69
affer     -0.65
lihood    -0.64
hare      -0.64
jury      -0.64
istance   -0.64
ount      -0.63
prosec    -0.63
CVE       -0.63
swer      -0.62
Positive Logits
cereal    1.18
cere      1.08
grain     0.95
grains    0.88
biscuits  0.84
flav      0.83
Cere      0.81
beans     0.81
weed      0.80
stal      0.79
Activations Density 0.014%
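
As a rough illustration of how a density figure like the one above can be read, here is a minimal sketch. It assumes density means the fraction of token positions on which the feature's activation is above zero; the page itself does not define the term. The function name `activation_density` and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, with the feature active on 14.
sample = [0.0] * 100_000
for i in range(14):
    sample[i * 7_000] = 1.0  # arbitrary positions and values, for illustration only

density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.014% corresponds to roughly 14 active tokens in every 100,000.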