INDEX
    Explanations

    prepositions

    Negative Logits
    Token           Logit
    " Twe"          -0.07
    " edits"        -0.06
    " num"          -0.06
    " IW"           -0.06
    "_auth"         -0.06
    " Listener"     -0.06
    "‌دهد"          -0.06
    "сут"           -0.06
    " fear"         -0.06
    "drink"         -0.06
    Positive Logits
    Token           Logit
    " csvfile"      0.07
    (blank)         0.07
    "oble"          0.07
    " Rousse"       0.07
    " Aeros"        0.07
    "amines"        0.07
    " Pope"         0.06
    "ahas"          0.06
    " epoxy"        0.06
    "_sin"          0.06
    Activation Density: 0.009%

    No Known Activations