    NEGATIVE LOGITS
    Token       Value
    "ade"       1.23
    "I"         1.16
    "link"      1.11
    "that"      1.06
    " a"        1.03
    "H"         1.03
    "ster"      0.97
    "D"         0.97
    "R"         0.97
    "stone"     0.96
    POSITIVE LOGITS
    Token       Value
    (missing)   1.40
    " stances"  1.10
    " bạn"      1.07
    " comien"   1.07
    " stance"   1.05
    "س"         1.04
    "ى"         1.04
    "م"         1.00
    " важ"      0.98
    " escre"    0.97
    Activation Density: 0.001%

    No Known Activations