    Explanations

    negative phrases that express doubt or criticism

    Negative Logits
    Token            Logit
    "chner"          -0.19
    "既"             -0.15
    "unden"          -0.15
    " بیشتری"        -0.14
    "λι"             -0.14
    " ambos"         -0.14
    "olid"           -0.14
    "ELLOW"          -0.14
    "99"             -0.14
    "uren"           -0.13
    Positive Logits
    Token            Logit
    " particularly"  0.26
    " anywhere"      0.25
    " nearly"        0.23
    " worth"         0.23
    " going"         0.21
    " what"          0.20
    " very"          0.20
    " gonna"         0.20
    " terribly"      0.20
    " remotely"      0.20
    Activation density: 0.175%

    No Known Activations