INDEX
    Explanations
    Negative Logits
    2          1.26
    \*         0.94
     \*        0.90
               0.88
     *         0.88
    *          0.88
     twenty    0.84
               0.83
    *,         0.82
    ↵↵         0.79
    Positive Logits
    sType      0.92
    take       0.88
    romatic    0.88
    rest       0.84
    shed       0.84
               0.83
    erci       0.82
               0.82
               0.82
    sick       0.82
    Activation Density: 0.020%

    No Known Activations