    Explanations

    proper nouns, particularly names and titles

    Negative Logits
     spoof      -0.74
    intage      -0.69
    owship      -0.69
    acebook     -0.68
    ishers      -0.68
    rawler      -0.67
    achine      -0.67
    arching     -0.66
    oppers      -0.65
    BILITIES    -0.65

    Positive Logits
     Oo          0.71
    çIJ          0.67
    imaru        0.67
    å·           0.66
    Wan          0.66
    ãĥķãĤ©      0.64
    loo          0.64
     Auditor     0.64
    oi           0.63
    wo           0.63
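    Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and reading off the smallest and largest entries. The sketch below assumes that convention; the helper name top_logit_tokens, the (d_model, vocab_size) layout of W_U, and the toy inputs are illustrative assumptions, not this viewer's actual code.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumptions (hypothetical, for illustration):
      decoder_dir: (d_model,) decoder direction of one SAE feature
      W_U:         (d_model, vocab_size) unembedding matrix
      vocab:       list of token strings indexed by token id
    """
    # Direct effect of the feature direction on every output token's logit.
    logits = decoder_dir @ W_U
    # Indices sorted from most negative to most positive logit.
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.9, -0.7],
                [0.0, 0.0, 0.0, 0.0]])
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print(neg, pos)
```

    Sorting once and slicing both ends keeps the negative and positive lists consistent with each other; a real viewer may also normalize the decoder direction first, which this sketch does not assume.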

    Activation Density: 0.260%
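    Activation density is the share of tokens on which the feature fires; 0.260% works out to about 26 tokens in every 10,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the helper name activation_density and the threshold are hypothetical):

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    # Count positions above the threshold and express them as a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# One firing position out of four tokens gives a density of 25%.
print(activation_density([0.0, 0.0, 1.2, 0.0]))
```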

    No Known Activations