INDEX
    Explanations
    Negative Logits
     posture
    -0.08
     RAND
    -0.08
     favor
    -0.08
    ally
    -0.07
    姿
    -0.07
    RAND
    -0.07
    rece
    -0.07
    PUT
    -0.07
     congr
    -0.07
     GEM
    -0.07
    Positive Logits
    0.09
    0.08
     Make
    0.08
    طل
    0.08
     manche
    0.08
    .webp
    0.08
     закон
    0.07
    0.07
     esclarecer
    0.07
    0.07
    Act Density 0.042%

    No Known Activations