    Negative Logits
    Token          Logit
    " outcomes"    -0.08
    " outcome"     -0.07
    " recharge"    -0.07
    " speaks"      -0.06
    " soldiers"    -0.06
    " succeed"     -0.06
    " Outcome"     -0.06
    "aktu"         -0.06
    " Companies"   -0.06
    " Comp"        -0.06
    Positive Logits
    Token          Logit
    "_STANDARD"    0.07
    " Utilities"   0.07
    "fell"         0.07
    " DISCLAIMS"   0.06
    "_execute"     0.06
    " بیشتری"      0.06  (Persian, "more")
    " eylem"       0.06  (Turkish, "action")
    "Hover"        0.06
    "еним"         0.06
    " Vampire"     0.06
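    The two lists rank vocabulary tokens by how strongly the feature pushes their output logits down (negative) or up (positive). A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the lowest and highest scores. The names `logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical, and real dashboards may apply extra normalization.

```python
from typing import List, Sequence, Tuple

# Assumption: logit effect of token j = dot(decoder direction, unembedding column j).
# This is a hypothetical sketch, not the dashboard's confirmed method.
def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    decoder_dir: hypothetical length-d decoder vector for one feature.
    unembed: hypothetical d x V unembedding matrix, given as d rows.
    vocab: the V token strings, in unembedding column order.
    """
    d, n_vocab = len(decoder_dir), len(vocab)
    # One score per vocabulary token: dot product with that token's column.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d))
        for j in range(n_vocab)
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]         # most suppressed tokens first
    positive = ranked[::-1][:k]   # most promoted tokens first
    return negative, positive


# Toy usage with made-up numbers (not this feature's real weights).
neg, pos = logit_effects(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.1, -0.2, 0.3], [0.2, 0.4, -0.1]],
    vocab=[" a", " b", " c"],
    k=1,
)
print(neg, pos)
```

    Keeping the leading space in token strings matters: in most BPE vocabularies " outcome" and "outcome" are different tokens with different unembedding columns.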
    Activation density: 0.150%

    No Known Activations
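    The density figure is presumably the share of sampled token positions on which the feature is active. A minimal sketch, assuming a simple "activation above 0" rule (the page states neither its threshold nor its sample size); `activation_density` is a hypothetical helper:

```python
from typing import Sequence

# Assumption: density = fraction of token positions whose activation exceeds
# `threshold` (0.0 by default). The dashboard's exact rule is not stated.
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions at which the feature fires."""
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy usage: 15 firing positions out of 10,000 tokens.
acts = [0.0] * 10_000
for i in range(15):
    acts[i] = 1.0
print(f"{activation_density(acts):.3%}")  # prints 0.150%
```

    Under this assumption, a density of 0.150% corresponds to roughly 15 firing positions per 10,000 tokens.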