INDEX
    Explanations

    negative phrases or words that express disagreement or denial

    Negative Logits
    olumn         -0.19
    isoft         -0.15
     eldre        -0.15
     Äįe          -0.15
     nám          -0.15
    undan         -0.14
    itore         -0.14
    imity         -0.14
    ominator      -0.14
    ayout         -0.14
    Positive Logits
    ably          0.37
    ing           0.34
    withstanding  0.34
    able          0.33
    ed            0.32
    ions          0.28
     many         0.27
    ion           0.27
    icing         0.27
     only         0.26
    Act Density 0.032%

    No Known Activations