    Explanations

    expressions of preference or emphasis in speech

    Negative Logits
    oran        -0.17
    ilor        -0.16
    velt        -0.16
    ales        -0.14
    amos        -0.14
    atories     -0.14
    587         -0.14
    apur        -0.14
    uvre        -0.14
    gress       -0.13
    Positive Logits
    anch        0.15
    rif         0.15
    apiro       0.15
     Minority   0.14
     teng       0.14
    енко        0.14
     modest     0.13
    æ·          0.13
     WikiLeaks  0.13
    SAT         0.13
    Activation Density: 0.002%

    No Known Activations