    Negative Logits
        "ADF"          -0.07
        " Mills"       -0.07
        "(cmp"         -0.07
        "ickt"         -0.07
        " doorway"     -0.06
        " muff"        -0.06
        "_mv"          -0.06
        " א"           -0.06
        "isty"         -0.06
        " Fish"        -0.06
    Positive Logits
        "BeforeEach"    0.08
        " Чтобы"        0.07
        "-speed"        0.06
        "uplic"         0.06
        "unfold"        0.06
        "س"             0.06
        "utherland"     0.06
        "्षण"            0.06
        "駅徒歩"         0.06
        " Embedded"     0.06
    Activation Density: 0.010%

    No Known Activations