INDEX
Explanations
mentions of food and dining-related terms
references to food or dining options
Negative Logits
Token        Logit
ultan        -0.85
icus         -0.82
orial        -0.78
acca         -0.78
iversal      -0.70
izations     -0.70
ieth         -0.70
ulating      -0.69
interf       -0.69
ulkan        -0.69
Positive Logits
Token        Logit
fare         1.08
fares        1.01
ttes         0.94
fare         0.81
well         0.78
ptin         0.76
bill         0.74
rer          0.72
jit          0.72
ways         0.72
Activations Density: 0.007%