INDEX
    Explanations
    Negative Logits
    " phen"       -0.07
    "_EX"         -0.07
    "_LOCK"       -0.06
    "ahren"       -0.06
    " شناخته"     -0.06
    "irates"      -0.06
    "TURN"        -0.06
    "agnet"       -0.06
    " будет"      -0.06
    "OUN"         -0.06
    Positive Logits
    " charisma"   0.07
    " midterm"    0.06
    "(jj"         0.06
    " punk"       0.06
    "quo"         0.06
    " gravid"     0.06
    " com"        0.06
    "Ì"           0.06
    " concrete"   0.06
    " [+"         0.05
    Activation Density: 0.002%

    No Known Activations