    Explanations

    risks versus rewards

    The neuron flags terms that quantify trade-offs or improvements, especially words denoting gains, benefits, or efficiency increases.

    Negative Logits
    .gg
    -0.06
    公共
    -0.06
    Game
    -0.06
    θρώ
    -0.06
    langs
    -0.06
     TestCase
    -0.06
     predominant
    -0.06
     Conversely
    -0.06
     precondition
    -0.06
    -0.06
    Positive Logits
    modified
    0.06
     Fant
    0.06
     Keeping
    0.06
    mins
    0.06
     постоянно
    0.06
     excessive
    0.06
     ecstatic
    0.06
     fractional
    0.06
    _WARN
    0.06
    .boost
    0.06
    Act Density 0.065%

    No Known Activations