INDEX
    Explanations

    phrases indicating variability or the existence of exceptions

    Negative Logits
    heimer      -0.14
    ritz        -0.14
    @student    -0.14
     Offensive  -0.14
    hawk        -0.14
    581         -0.14
    &           -0.14
    sk          -0.13
    alon        -0.13
    ulos        -0.13
    Positive Logits
    conti        0.16
    ازل          0.15
    -либо        0.15
     Weaver      0.15
    ayar         0.15
    place        0.15
     particular  0.14
    bír          0.14
    ardi         0.14
    icont        0.14
    Act Density 0.058%

    No Known Activations