INDEX
    Explanations

    occurrences of the word "new" and similar variations in the context of novelty or change

    Negative Logits
    ingleton    -0.16
    ety         -0.15
    imson       -0.14
    edith       -0.14
    ฤ           -0.14
    EIF         -0.14
    erty        -0.14
    hipster     -0.14
    å®ļ         -0.14
    tight       -0.14
    Positive Logits
    atak        0.17
    ijd         0.16
    utsch       0.14
     ngữ        0.14
    rint        0.14
    anian       0.14
     Niet       0.14
     sice       0.14
    ILLA        0.14
     fare       0.14
    Act Density 0.001%

    No Known Activations