INDEX
    Explanations

    specific instructions or configurations in text

    occurrences of the word "this."

    Negative Logits
    "lev"        -0.74
    "ometown"    -0.68
    "isms"       -0.66
    "eteenth"    -0.65
    "iberal"     -0.65
    " Izan"      -0.63
    "eming"      -0.63
    "uther"      -0.62
    "borne"      -0.61
    "letters"    -0.60
    Positive Logits
    " wiki"          0.93
    " latter"        0.86
    " particular"    0.79
    " diagram"       0.79
    " topic"         0.79
    " addon"         0.77
    " webcam"        0.76
    " endpoint"      0.76
    " week"          0.75
    " site"          0.75
    Activation Density: 0.205%

    No Known Activations