INDEX
    Explanations

    names of famous individuals

    references to well-known personalities, specifically actors or celebrities

    Negative Logits
    sic            -0.80
    )."            -0.69
    upon           -0.59
     princ         -0.57
     unimaginable  -0.57
    ospace         -0.56
    .""            -0.55
    SourceFile     -0.55
    espie          -0.54
     whereabouts   -0.54
    Positive Logits
    ¶              0.81
     Doesn         0.79
    '?             0.78
     Isn           0.73
     Wouldn        0.68
     Edit          0.66
     Own           0.63
                   0.62
     Aren          0.62
     Already       0.61
    Act Density 0.626%

    No Known Activations