INDEX
    Explanations

    editorial notes at the end of written pieces

    Negative Logits
    " Antar"       -0.70
    " turtles"     -0.68
    "llular"       -0.67
    "avid"         -0.67
    " Sicily"      -0.66
    "omething"     -0.65
    "metics"       -0.62
    " squared"     -0.62
    "astics"       -0.61
    "fw"           -0.61

    Positive Logits
    "ial"           0.97
    "icularly"      0.83
    "iversary"      0.79
    "ially"         0.77
    "itative"       0.75
    "ical"          0.75
    " Picks"        0.75
    "ickson"        0.74
    "ificantly"     0.71
    " Spoiler"      0.70
    Activation Density: 0.036%

    No Known Activations