INDEX
Explanations
references to food and dishes, especially related to apples and desserts
Negative Logits
itiveness   -0.86
rous        -0.70
Rub         -0.69
raq         -0.67
ivas        -0.66
jee         -0.65
roads       -0.63
Greek       -0.62
agos        -0.62
oshop       -0.62
Positive Logits
opposed     1.32
well        1.08
ynchron     1.06
soon        1.02
part        0.97
part        0.94
pired       0.92
well        0.88
shown       0.86
semble      0.82
Activations Density: 0.165%