    Explanations
    No Explanations Found
    Negative Logits
    emia           -0.70
    arta           -0.65
     Mushroom      -0.64
     IU            -0.64
    abet           -0.63
    rha            -0.62
     Hipp          -0.62
    ocus           -0.61
     Peach         -0.60
     heartbeat     -0.60
    Positive Logits
    drops          0.83
    yip            0.74
    packs          0.74
    BUG            0.69
    icides         0.68
    handler        0.65
    ignt           0.65
     intervene     0.64
    NG             0.64
    thens          0.64
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.