INDEX
    Explanations

    punctuation and questioning phrases

    Negative Logits
     درب          -0.16
    mina          -0.15
    DonaldTrump   -0.15
    inerary       -0.15
    evice         -0.14
    imum          -0.14
    lias          -0.14
    adiator       -0.14
    ì§Ģê°Ģ        -0.14
    hari          -0.14
    Positive Logits
     Erd          0.16
    ame           0.15
     dec          0.15
     Dort         0.14
     rom          0.14
     macro        0.14
    uel           0.14
    acy           0.14
     Moran        0.14
    ikt           0.14
    Activation Density 0.006%

    No Known Activations