    Negative Logits
    " commit"         -0.07
    "медицин"         -0.07
    " reprint"        -0.07
    "いく"            -0.07
    " condemning"     -0.07
    " tand"           -0.07
    "upon"            -0.07
    "فجر"             -0.07
    "kening"          -0.07
    "elcome"          -0.07
    Positive Logits
    "(fr"             0.07
    "_ASSOC"          0.07
                      0.07
    "Diff"            0.06
    " Gl"             0.06
    " Grass"          0.06
    " sane"           0.06
                      0.06
    " область"        0.06
    "_sw"             0.06
    Activation Density: 0.002%

    No Known Activations