INDEX
    Explanations

    feelings, health

    Negative Logits
        Token           Logit
        " happy"        -1.01
        "happy"         -1.01
        " Efq"          -0.98
        " happiest"     -0.96
        " happier"      -0.95
        "HAPPY"         -0.90
        "Happy"         -0.90
        " Happy"        -0.88
        " nahilalakip"  -0.86
        " HAPPY"        -0.85
    Positive Logits
        Token           Logit
        "HideFlags"     0.47
        "ям"            0.44
        "ScopeManager"  0.44
        " متعلقه"        0.44
        "ret"           0.42
        "dum"           0.40
        "рек"           0.40
        " lombok"       0.39
        " with"         0.39
        ","             0.39
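The two lists above show the ten lowest and ten highest logit values for this feature. The page does not say how those per-token values are computed, so the sketch below only covers the selection step: assuming the values are already available as a mapping from token string to logit effect (`logit_effects` and `top_and_bottom_logits` are hypothetical names), it picks the k most negative and k most positive entries. The sample reuses a few values from the lists above; leading spaces are part of each token.

```python
import heapq


def top_and_bottom_logits(logit_effects, k=10):
    """Return (most_negative, most_positive) lists of (token, value) pairs.

    `logit_effects` is a hypothetical dict mapping token -> logit effect;
    how those effects are computed is assumed, not taken from the page.
    """
    items = list(logit_effects.items())
    # Smallest values first: the "Negative Logits" list.
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    # Largest values first: the "Positive Logits" list.
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return negative, positive


# Values copied from the lists above (a small subset, for illustration).
sample = {
    " happy": -1.01,
    " happiest": -0.96,
    " happier": -0.95,
    "HideFlags": 0.47,
    "ScopeManager": 0.44,
    " lombok": 0.39,
}

neg, pos = top_and_bottom_logits(sample, k=2)
print(neg)  # [(' happy', -1.01), (' happiest', -0.96)]
print(pos)  # [('HideFlags', 0.47), ('ScopeManager', 0.44)]
```

With k=10 over the full vocabulary this would yield lists of the same shape as the ones on this page.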
    Activation Density: 0.057%
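Activation density is read here as the share of tokens on which the feature fires. That reading, and the threshold of 0 used below, are assumptions, because the page does not define either. A density of 0.057% corresponds to 57 active tokens per 100,000. The sketch uses made-up activations (hypothetical data, not from this feature) to show the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumes "density" means the share of tokens with a positive activation;
    the dashboard does not state the threshold it uses.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations: 57 nonzero values among 100,000 tokens.
acts = [0.0] * 100_000
for i in range(57):
    acts[i * 1_000] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.057%
```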

    No Known Activations