INDEX
    Explanations

    punctuation and quotation marks in the text

    NEGATIVE LOGITS
    " Are"        -0.47
    "gridy"       -0.40
    "fromnode"    -0.38
    " IF"         -0.37
    "worfen"      -0.37
    " but"        -0.37
    " Or"         -0.36
    "uxxxx"       -0.36
    "gridx"       -0.36
    " Be"         -0.35
    POSITIVE LOGITS
    "is"          1.88
    "has"         1.30
    "was"         1.24
    "will"        1.10
    "can"         1.02
    "would"       0.98
    "may"         0.96
    "does"        0.93
    "should"      0.93
    "are"         0.89
    Activation Density: 0.438%

    No Known Activations