INDEX
    Explanations

    phrases that indicate repetition or familiarity with ideas over time

    Negative Logits
    Token            Logit
    " anymore"       -0.15
    "obar"           -0.14
    "aris"           -0.14
    ".habbo"         -0.14
    "urent"          -0.14
    "ogn"            -0.13
    "issan"          -0.13
    "wp"             -0.13
    "ilde"           -0.13
    "ÃŃÅ¡"           -0.13

    Positive Logits
    Token            Logit
    " before"        0.59
    " previously"    0.56
    "before"         0.48
    " Before"        0.45
    "Before"         0.43
    " elsewhere"     0.42
    " antes"         0.40
    " Previously"    0.39
    "-before"        0.38
    "Previously"     0.38
    Activation Density: 0.169%

    No Known Activations