INDEX
Explanations
references to sandwich-related terms
references to sandwiches
Negative Logits
tics      -0.79
mberg     -0.75
ibr       -0.71
aft       -0.69
orne      -0.69
negative  -0.66
effic     -0.66
pmwiki    -0.66
axis      -0.65
umen      -0.63
Positive Logits
sandwiches  1.15
sandwich    1.04
wich        1.01
bowl        0.91
salads      0.91
Sandwich    0.87
tray        0.80
anut        0.80
slices      0.80
salad       0.79
Activations Density: 0.012%