INDEX
    Explanations

    short phrases or sentences expressing certainty or confidence

    punctuation marks and quotation marks in the text

    Negative Logits
    Token           Logit
     intentional    -0.56
     planned        -0.55
     automated      -0.54
     stray          -0.53
     planning       -0.53
     advanced       -0.52
     leve           -0.51
     formally       -0.51
     wildlife       -0.50
     morp           -0.49
    Positive Logits
    Token           Logit
    ↵Âł             0.88
    Otherwise       0.85
    Therefore       0.85
    Anyway          0.81
    Thus            0.79
    <|endoftext|>   0.79
    Similarly       0.79
    Likewise        0.79
    Nevertheless    0.78
    Moreover        0.76
    Activation Density: 0.688%

    No Known Activations