INDEX
    Explanations

    occurrences of the word "New."

    Negative Logits
    Token         Logit
    "Ļ"           -0.15
    "apo"         -0.15
    "compat"      -0.15
    "829"         -0.15
    "ainer"       -0.15
    "жив"         -0.15
    "δÏģο"        -0.14
    "oner"        -0.14
    "orean"       -0.14
    "ampo"        -0.14
    Positive Logits
    Token         Logit
    " Zealand"    0.31
    " York"       0.26
    " Delhi"      0.24
    " Orleans"    0.22
    "castle"      0.22
    " Scientist"  0.22
    "ìļķ"         0.21
    "chw"         0.20
    "swire"       0.20
    " york"       0.19
    Activation Density: 0.049%

    No Known Activations