INDEX
    Explanations

    color codes in RGBA format

    Negative Logits
    " happy"       -0.38
                   -0.37
    " breaking"    -0.36
    " free"        -0.36
    " unlike"      -0.36
    " laughing"    -0.35
    "🏿"            -0.35
    " partly"      -0.34
                   -0.34
    " Ris"         -0.33
    Positive Logits
    " незавершена"       0.91
    " autorytatywna"     0.81
    " للمعارف"           0.71
    " informée"          0.70
    "IsContent"          0.67
    "ScopeManager"       0.64
    " cherchés"          0.61
    " Autorizaciones"    0.60
    " surla"             0.59
    "weightedMode"       0.58
    Activation Density: 0.946%

    No Known Activations