    Explanations

    Untrue/unfair claims

    Negative Logits
    " work"         -0.06
    " berlin"       -0.06
    " setbacks"     -0.06
    " convened"     -0.06
    "ournaments"    -0.06
    " stanza"       -0.06
    "drawable"      -0.06
    ".predicate"    -0.06
    " Flag"         -0.06
    "-operative"    -0.06
    Positive Logits
    " Solomon"          0.07
    "erah"              0.06
    " assembler"        0.06
    " Fisher"           0.06
    " đáp"              0.06
    ".setContentType"   0.06
    " objected"         0.06
    " Buddh"            0.06
    " Rajasthan"        0.06
    0.06
    Activation Density: 0.009%

    No Known Activations