INDEX
    Explanations

    repetitions of the word "this."

    Negative Logits
    Token       Logit
    "agi"       -0.77
    "ignt"      -0.77
    "anamo"     -0.77
    "hess"      -0.76
    "RD"        -0.74
    "aneers"    -0.73
    "ARS"       -0.71
    "ickets"    -0.70
    "unk"       -0.68
    "oller"     -0.68
    Positive Logits
    Token       Logit
    " week"     1.16
    " weekend"  1.03
    " year"     1.03
    " latest"   0.99
    " month"    0.97
    " newest"   0.94
    " century"  0.89
    " morning"  0.86
    " guy"      0.85
    " decade"   0.85
    Activation Density: 0.158%

    No Known Activations