    Explanations

    phrases related to behavioral change and modification

    Negative Logits
    "esty"         -0.18
    "Argb"         -0.16
    "levance"      -0.14
    "řez"          -0.14
    "inding"       -0.14
    "offee"        -0.14
    "orget"        -0.14
    "องจาก"        -0.14
    "metry"        -0.14
    "eka"          -0.13
    Positive Logits
    " behavior"     0.58
    " behaviour"    0.52
    " behaviors"    0.50
    "行为"          0.48
    " Behavior"     0.47
    " habits"       0.46
    " actions"      0.43
    " behaviours"   0.42
    "behavior"      0.42
    " повед"        0.38
    Activation Density: 0.340%

    No Known Activations