    Negative Logits
     Alu      -0.69
     long     -0.68
    oire      -0.67
     AL       -0.63
    navbar    -0.61
    aure      -0.60
    aus       -0.60
    Margot    -0.60
    𝓭         -0.60
     Stoner   -0.59
    Positive Logits
    %?        1.94
    ?!?       1.77
    ?"        1.65
    !?        1.63
    ?         1.63
    ’?        1.60
    ?”        1.58
    ?}        1.57
    ?!        1.55
    }?        1.53
    Act Density 0.172%

    No Known Activations