INDEX
    Explanations

    expressions conveying uncertainty or disagreement

    yes or no, doubt, or exclamation

    Negative Logits
    Token           Logit
    "[@BOS@]"       -0.75
    "<unused17>"    -0.75
    "<pad>"         -0.75
    "<unused68>"    -0.74
    "<unused42>"    -0.74
    "<unused3>"     -0.74
    "<unused14>"    -0.74
    "<unused23>"    -0.74
    "<unused16>"    -0.74
    "<unused8>"     -0.74
    Positive Logits
    Token           Logit
    "!"             0.42
    " OMITBAD"      0.35
    " Yes"          0.33
    " probably"     0.33
    " indeed"       0.31
    " surely"       0.31
    "EMPTY"         0.30
    " It"           0.30
    " very"         0.29
    "regler"        0.28
    Activation Density: 0.035%

    No Known Activations