INDEX
    Explanations
    Negative Logits
    Token             Logit
    "ZO"              -0.16
    "fak"             -0.16
    "fillType"        -0.15
    "oldem"           -0.14
    "agas"            -0.14
    ".INT"            -0.14
    "DefaultValue"    -0.14
    "ä»ķ"             -0.14
    " Blades"         -0.14
    "ords"            -0.14
    Positive Logits
    Token             Logit
    " alt"            0.18
    "933"             0.16
    " harm"           0.15
    " Res"            0.14
    " Bad"            0.13
    "ç´"              0.13
    "alt"             0.13
    " èī¯"            0.13
    "ael"             0.13
    "íĸī"             0.13
    Activation Density: 0.004%

    No Known Activations