INDEX
    Explanations

    punctuation marks, specifically colons, in the text

    punctuation marks or separators in the text

    Negative Logits
        " behavi"     -0.94
        "igue"        -0.79
        "inement"     -0.76
        "itol"        -0.73
        "objects"     -0.72
        "etheless"    -0.72
        "ines"        -0.69
        " tremend"    -0.68
        "undai"       -0.68
        " behav"      -0.67

    Positive Logits
        " TBD"         0.87
        " Logged"      0.81
        " Nom"         0.67
        "76561"        0.66
        " Yeah"        0.66
        "ËĪ"           0.66
        "fff"          0.64
        " Who"         0.64
        " Bye"         0.63
        " Huh"         0.63

    Act Density 0.079%

    No Known Activations