    Explanations

    phrases related to desserts and chocolate-based confections

    Negative Logits
    Token            Logit
    "<unused8>"      -0.77
    "<unused43>"     -0.76
    "<pad>"          -0.76
    "[@BOS@]"        -0.76
    "<unused41>"     -0.76
    "<unused28>"     -0.76
    "<unused42>"     -0.76
    "<unused74>"     -0.76
    "<unused16>"     -0.76
    "<unused23>"     -0.76

    Positive Logits
    Token            Logit
    " chocolate"      1.17
    "Chocolate"       1.06
    " Chocolate"      1.05
    "chocolate"       1.02
    " chocolates"     0.81
    "OCOLATE"         0.81
    " chocol"         0.77
    " cocoa"          0.77
    " шокола"         0.77   (Russian prefix of "chocolate")
    " cioccolato"     0.76   (Italian "chocolate")
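    The positive and negative logit lists rank vocabulary tokens by how much this feature directly raises or lowers their output logits. A minimal sketch, assuming the common "direct logit" approach of multiplying the feature's decoder vector by the model's unembedding matrix W_U and ignoring the final LayerNorm and any later layers; the function name `top_logits` and the toy arrays are hypothetical, not this dashboard's actual pipeline.

```python
import numpy as np

def top_logits(decoder_vec, W_U, vocab, k=3):
    """Hypothetical helper: rank tokens by the direct logit effect of one feature.

    Assumes decoder_vec has shape (d_model,) and W_U has shape (d_model, vocab_size).
    Ignores the final LayerNorm, so values are only approximate logit effects.
    """
    # Project the feature's decoder direction onto every token's unembedding column.
    logits = decoder_vec @ W_U              # shape: (vocab_size,)
    order = np.argsort(logits)              # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: d_model = 2, vocabulary of 4 tokens (all values made up).
vocab = [" chocolate", "<pad>", " the", " cocoa"]
W_U = np.array([[1.2, -0.8, 0.1, 0.9],
                [0.0,  0.0, 0.0, 0.0]])
decoder_vec = np.array([1.0, 0.0])

pos, neg = top_logits(decoder_vec, W_U, vocab, k=2)
print(pos)
print(neg)
```

    Under this assumption, tokens the model never predicts (such as the `<unused…>` placeholders) can land at the bottom simply because their unembedding columns point away from the feature direction, which would be consistent with the nearly uniform negative values shown.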
    Activation Density: 0.266%
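    Activation density is the share of sampled tokens on which the feature fires; 0.266% works out to roughly 1 token in 376. A minimal sketch, assuming "fires" means an activation strictly above zero. The threshold, the token sample, and the helper name `activation_density` are assumptions, since the page does not state how the figure was computed.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of token positions where the feature
    activation exceeds `threshold` (assumed to be 0 for a ReLU-style SAE)."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy sample: 100,000 tokens, of which 266 have a positive activation.
acts = np.zeros(100_000)
acts[:266] = 1.0
print(activation_density(acts))
```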

    No Known Activations