INDEX
    Explanations
    Negative Logits
    " RESULTS"        -0.19
    " OUTPUT"         -0.17
    "TRGL"            -0.17
    " COMMENTS"       -0.17
    "ERRQ"            -0.17
    " INPUT"          -0.17
    "EMPLARY"         -0.16
    "_HC"             -0.16
    " DESCRIPTION"    -0.16
    "IFn"             -0.16
    Positive Logits
    "LOUR"             0.21
    "ANNEL"            0.20
    "LARI"             0.20
    "ARTH"             0.20
    "KEN"              0.20
    "NEL"              0.20
    "GLE"              0.20
    "NIC"              0.19
    " DA"              0.19
    "BERT"             0.19
    Activation Density: 0.035%

    No Known Activations