INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token        Logit
    "ahn"        -0.15
    "901"        -0.14
    "857"        -0.14
    "ibus"       -0.14
    " litter"    -0.14
    "ohan"       -0.14
    "ochen"      -0.13
    "517"        -0.13
    "lik"        -0.13
    "<<<"        -0.13
    Positive Logits
    Token        Logit
    "hek"        0.15
    "atsu"       0.15
    "ifacts"     0.15
    "iset"       0.15
    "eced"       0.14
    "strup"      0.14
    "atab"       0.14
    " Nah"       0.14
    "avel"       0.14
    "icens"      0.14
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.