    Explanations

    words related to civilized behavior and respectability

    language associated with social norms and civility

    Negative Logits
    Token       Logit
    "burst"     -0.78
    "thur"      -0.72
    "oons"      -0.69
    "arial"     -0.68
    "wered"     -0.68
    "oult"      -0.67
    "anas"      -0.67
    "falls"     -0.67
    "pain"      -0.67
    "scl"       -0.65
    Positive Logits
    Token             Logit
    " reputable"      0.88
    " respectable"    0.84
    " sane"           0.80
    " citiz"          0.77
    " @@"             0.77
    "@@"              0.77
    " bourgeois"      0.74
    "abiding"         0.72
    "Ô"               0.72
    "Wiki"            0.72
    Activation Density: 0.027%

    No Known Activations