INDEX
Explanations
references to specific ingredients in food-related content
food ingredients
Negative Logits
NameInMap        -0.71
TagMode          -0.70
oa̍t              -0.61
increí           -0.59
RegressionTest   -0.59
témoig           -0.59
<end_of_turn>    -0.58
パンチラ         -0.58
Verſ             -0.58
beſti            -0.57
Positive Logits
                 0.34
keb              0.31
season           0.31
ună              0.31
bog              0.29
im               0.29
considered       0.29
dene             0.28
Fly              0.28
Signature        0.28
Activations Density 0.011%