    Explanations

    repeated references to the pronoun "it."

    Negative Logits
     swear          -0.57
    aware           -0.55
     useDispatch    -0.53
    ]='\            -0.52
    طيع             -0.49
     sworn          -0.49
     knew           -0.49
    Controle        -0.49
    _{[             -0.49
     Orrell         -0.49

    Positive Logits
     is             0.76
     consists       0.76
     involves       0.74
     occurs         0.73
     occur          0.71
    aarrggbb        0.69
     consiste       0.66
     comprises      0.66
     comes          0.65
     consist        0.65

    Activation Density: 0.279%

    No Known Activations