    Negative Logits
    nger        -0.19
    WithType    -0.16
    outu        -0.15
    .Undef      -0.15
    onth        -0.15
    OMPI        -0.15
    863         -0.15
    IIIK        -0.15
    ngr         -0.15
    nge         -0.14
    Positive Logits
    ik          0.35
    iz          0.30
    ig          0.30
    ib          0.29
    ip          0.29
    il          0.29
    id          0.27
    if          0.27
    im          0.27
    it          0.26
    Activation Density: 0.036%

    No Known Activations