INDEX
    Explanations

    acronyms and abbreviations

    Negative Logits
    T         -0.28
    N         -0.24
    S         -0.22
    G         -0.21
    D         -0.20
    strup     -0.19
    M         -0.18
    TJ        -0.17
    ripp      -0.17
    TAB       -0.16
    Positive Logits
    O          0.44
    OI         0.27
    l          0.23
    OGRAPH     0.21
    t          0.21
    OX         0.19
    Oi         0.19
    OE         0.18
    s          0.18
    din        0.18
    Activation Density: 0.038%

    No Known Activations