Explanations
food-related terms, especially specific dishes or ingredients
Negative Logits
Dialogue    -0.81
Democr      -0.75
Templ       -0.73
CTV         -0.69
FORE        -0.69
ALE         -0.68
Jol         -0.66
Charges     -0.66
Degree      -0.65
ige         -0.65
Positive Logits
enegger     1.21
ming        1.21
intosh      0.99
tops        0.91
opian       0.90
bles        0.88
berries     0.88
eting       0.87
inx         0.86
rador       0.85
Activations Density: 6.952%