Explanations
mentions or descriptors of different types of cheese
references to cheese
Negative Logits
Token        Logit
nant         -0.90
NEY          -0.72
eminent      -0.72
igious       -0.65
Citizens     -0.64
Latter       -0.63
Citizen      -0.62
Blueprint    -0.62
Walton       -0.61
Hep          -0.61
Positive Logits
Token        Logit
cheese       1.20
cloth        1.15
Cheese       1.04
bowl         0.94
chees        0.93
popcorn      0.91
fruit        0.91
weed         0.90
sauce        0.90
sandwich     0.90
Activations Density 0.009%
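
A density figure like this can be read as the share of sampled token positions on which the feature fires. The sketch below is a minimal illustration under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 9 of 100,000 token positions.
sample = [0.0] * 99_991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.009%
```

Under this reading, a density of 0.009% corresponds to roughly 9 active positions per 100,000 tokens, which matches a sparse feature that fires only in cheese-related contexts.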