INDEX
    Explanations
    No Explanations Found
    Negative Logits
     layered            -0.08
     Plane              -0.08
     prediction         -0.08
     protester          -0.07
    etre                -0.07
     penn               -0.07
    משק                 -0.07
    (token not shown)   -0.07
    iton                -0.07
    aine                -0.07
    Positive Logits
     exhibits           0.08
     об                 0.07
    Less                0.07
    其次                0.06
    (token not shown)   0.06
     exhibited          0.06
    (token not shown)   0.06
    .""                 0.06
    KC                  0.06
    (token not shown)   0.06
    Activation Density 0.009%

    No Known Activations