INDEX
    Explanations

    references to specific proper nouns, particularly names associated with crime or notable individuals

    Negative Logits
    Token           Logit
    "SIGN"          -0.72
    "iments"        -0.71
    "undai"         -0.69
    "poons"         -0.69
    "emade"         -0.65
    " disadvant"    -0.65
    "ombs"          -0.64
    "ottest"        -0.64
    "igree"         -0.64
    "aston"         -0.63

    Positive Logits
    Token           Logit
    "brush"         1.47
    " grou"         1.03
    "cliffe"        0.90
    "sonian"        0.87
    "lings"         0.87
    " sage"         0.84
    "ful"           0.80
    " Advice"       0.79
    "vana"          0.79
    "fulness"       0.79
    Act Density 0.006%

    No Known Activations