INDEX
    Explanations

    concepts related to effort and consequences

    NEGATIVE LOGITS
    Token        Logit
    "ilver"      -0.14
    "lesi"       -0.14
    "nger"       -0.14
    "ту"         -0.14
    "undance"    -0.13
    "prung"      -0.13
    "858"        -0.13
    "enger"      -0.13
    " Frem"      -0.13
    "lla"        -0.12
    POSITIVE LOGITS
    Token        Logit
    " effort"    0.73
    " efforts"   0.63
    " Eff"       0.57
    "eff"        0.48
    "-eff"       0.48
    "Eff"        0.43
    "努力"       0.42
    " eff"       0.38
    "_eff"       0.38
    " уси"       0.37
    Activation Density: 0.122%

    No Known Activations