INDEX
    Explanations

    expressions of excitement or enthusiasm

    Negative Logits
    Token        Logit
    "umin"       -0.17
    "chers"      -0.17
    "pire"       -0.17
    "erge"       -0.15
    "ings"       -0.15
    "/post"      -0.14
    "oden"       -0.14
    "uges"       -0.14
    "IRST"       -0.14
    "iller"      -0.14
    Positive Logits
    Token        Logit
    ".testing"    0.18
    "eneral"      0.14
    " exciting"   0.14
    "ibri"        0.14
    "/power"      0.14
    " Vul"        0.14
    "rico"        0.14
    "inand"       0.14
    "ssc"         0.14
    "ÑijÑĢ"       0.13
    Activation Density: 0.026%

    No Known Activations