    Negative Logits (tokens quoted; a leading space is part of the token)
        Token         Logit
        ' Powell'     -0.08
        'oust'        -0.07
        ' Surgical'   -0.07
        'テレビ'         -0.07
        ' Gon'        -0.07
        '行动'          -0.06
        ' punch'      -0.06
        'izens'       -0.06
        'γου'         -0.06
        'افع'         -0.06
    Positive Logits (tokens quoted; a leading space is part of the token)
        Token         Logit
        ' baktı'      0.06
        '/manual'     0.06
        ' человеч'    0.06
        ' IOC'        0.06
        '("^'         0.06
        ' advertis'   0.06
        ' Pasta'      0.06
        ' olduğ'      0.06
        'loo'         0.05
        '/testify'    0.05
    Activation Density: 0.033%

    No Known Activations