INDEX
    Explanations

    explanations or reasoning in a text

    Negative Logits
    Lma
    -1.24
    Ikr
    -1.23
    FTFY
    -1.21
    Lmfao
    -1.20
    
    
    -1.08
     uefa
    -1.06
    Noice
    -1.03
     sappi
    -0.99
    <?
    -0.97
    Yess
    -0.93
    Positive Logits
     they
    0.71
     that
    0.67
     he
    0.62
     it
    0.62
     she
    0.60
     if
    0.60
     there
    0.59
     we
    0.57
     you
    0.57
     told
    0.57
    Act Density 0.309%

    No Known Activations