INDEX
    Explanations

    words expressing comparisons and newness

    Negative Logits
    Token           Logit
    " Augu"         -0.86
    " bangkok"      -0.81
    " stockholm"    -0.79
    " venice"       -0.77
    " Gorb"         -0.77
    " Riti"         -0.76
    " lidl"         -0.76
    " fta"          -0.75
    " linden"       -0.74
    " Idem"         -0.74
    Positive Logits
    Token           Logit
    " rarely"       0.67
    " never"        0.65
    " hadn"         0.65
    " seldom"       0.62
    " lately"       0.61
    " recently"     0.56
    " until"        0.55
    "never"         0.53
    " hasn"         0.52
    " haven"        0.51
    Activation Density: 0.232%

    No Known Activations