    Explanations

    mentions of locations and political figures

    punctuation marks, particularly periods

    Negative Logits
     NCT             -0.73
    igo              -0.73
    ーティ           -0.66
    OY               -0.63
    ola              -0.62
    Output           -0.62
    verbs            -0.62
    oris             -0.61
    onom             -0.61
    Availability     -0.60
    Positive Logits
    uits             0.75
    imentary         0.67
    nesday           0.65
    ternity          0.63
    avorite          0.63
    taboola          0.61
    lication         0.60
    adena            0.60
     citing          0.60
    DAQ              0.59
    Activation Density: 0.041%

    No Known Activations