INDEX
    Explanations

    mentions of something new or recently discovered

    the word "newly" and its variations

    Negative Logits
    ĺħ         -0.75
    orem       -0.75
    ocene      -0.74
    raints     -0.73
    rice       -0.72
     sqor      -0.72
    retty      -0.69
     Citation  -0.68
    ashtra     -0.68
    igraph     -0.68
    Positive Logits
    wed          1.01
    bie          0.90
     appointed   0.89
    ãĤ»          0.82
     liberated   0.82
    foundland    0.80
    cedented     0.79
     acquired    0.79
     mint        0.78
     dubbed      0.78
    Activation Density 0.016%

    No Known Activations