    Explanations

    Expressing happiness

    Negative Logits
    Token           Logit
    "challenge"     -0.08
    "(batch"        -0.07
    "factor"        -0.07
    " enerji"       -0.06
    "-muted"        -0.06
    "_responses"    -0.06
    " DEFIN"        -0.06
    " Frage"        -0.06
    "евер"          -0.06
    " makin"        -0.06
    Positive Logits
    Token           Logit
    " rejo"         0.07
    " plagiar"      0.06
    " Angie"        0.06
    "na"            0.06
    (missing)       0.06
    " exec"         0.06
    "_VAL"          0.06
    " Kim"          0.06
    " Lith"         0.06
    ",无"           0.06
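Each list above reads as a ranking of tokens by how strongly this feature pushes their logits down (negative) or up (positive). The source does not say how those per-token effects are computed, so the sketch below is only a minimal illustration, assuming the effects are already available as a token-to-value mapping (the hypothetical `top_logits` helper and `effects` dict are illustrative, not part of any dashboard API). It keeps the k lowest and k highest values; the sample entries are copied from the lists above, with leading spaces kept as part of the token.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_logits(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return the k most negative and k most positive (token, effect) pairs.

    sorted() is stable (also with reverse=True), so tokens with equal
    effects keep their input order.
    """
    items = list(effects.items())
    negative = sorted(items, key=lambda kv: kv[1])[:k]                # most negative first
    positive = sorted(items, key=lambda kv: kv[1], reverse=True)[:k]  # most positive first
    return negative, positive


# A few (token, value) pairs copied from the lists above.
effects = {
    "challenge": -0.08,
    "(batch": -0.07,
    "factor": -0.07,
    " rejo": 0.07,
    " plagiar": 0.06,
    " Angie": 0.06,
}
neg, pos = top_logits(effects, k=2)
print(neg)  # [('challenge', -0.08), ('(batch', -0.07)]
print(pos)  # [(' rejo', 0.07), (' plagiar', 0.06)]
```

Because sorting is stable, ties such as "(batch" and "factor" at -0.07 are broken by input order rather than arbitrarily.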
    Activation Density: 0.026%
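An activation density of 0.026% means the feature is active on roughly 26 of every 100,000 sampled tokens. The sketch below shows that arithmetic, assuming (not stated in the source) that "active" means an activation strictly above a threshold of zero and that density is measured over a flat sample of per-token activations; `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 26 active tokens out of 100,000 sampled tokens corresponds to 0.026%.
acts = [0.0] * 100_000
for i in range(26):
    acts[i] = 1.0
print(round(activation_density(acts), 3))  # 0.026
```

A density this low marks a very sparse feature, which is consistent with the page reporting no known activation examples.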

    No Known Activations