INDEX
    Explanations
    Negative Logits
    Token             Logit
    "weapon"          -0.07
    " Legislation"    -0.07
    " Theresa"        -0.07
    " sắt"            -0.06
    "crease"          -0.06
    "oload"           -0.06
    "qrt"             -0.06
    "۰۰۰"             -0.06
    " Bhar"           -0.06
    "-pay"            -0.06
    Positive Logits
    Token             Logit
    " ICU"            0.13
    "_buffers"        0.07
    ";↵"              0.06
    " SAY"            0.06
    "room"            0.06
    "especially"      0.06
    "YPRE"            0.06
    "Ay"              0.06
    " ryb"            0.06
    (token missing)   0.06
    Activation Density: 0.004%

    No Known Activations