    Negative Logits
    " persuasive"     -0.08
    "下称"            -0.08
    "approved"        -0.07
    "rende"           -0.07
    " conserve"       -0.07
    "contri"          -0.07
    " Angels"         -0.07
    "שרת"             -0.07
    "lands"           -0.07
    " surrendered"    -0.07
    Positive Logits
    " Repeat"         0.08
    " repeat"         0.08
    " repeating"      0.08
    " paraph"         0.08
    " omit"           0.07
    " метал"          0.07
    " sık"            0.07
    " Amit"           0.07
    " teammate"       0.07
    " decay"          0.07
    Activation Density: 0.029%

    No Known Activations