    Explanations

    high-frequency adverbs and conjunctions that indicate expectation or negation

    Negative Logits
    "unge"     -0.18
    "UGE"      -0.16
    "ote"      -0.16
    " haze"    -0.16
    "itters"   -0.15
    "ungan"    -0.15
    "/pub"     -0.14
    "Unary"    -0.14
    "isms"     -0.14
    "ibbon"    -0.14
    Positive Logits
    "icter"       0.16
    "anager"      0.15
    "prung"       0.15
    "elik"        0.15
    " Base"       0.14
    " پایه"       0.14
    "abase"       0.14
    "Base"        0.14
    "айд"         0.14
    " Skipping"   0.14
    Act Density 0.001%

    No Known Activations