INDEX
    Explanations

    words related to specific cultural or religious references

    Negative Logits
    Token          Logit
    "ding"         -0.78
    "sburgh"       -0.78
    "olson"        -0.76
    "etsy"         -0.70
    "assed"        -0.67
    "intosh"       -0.66
    "raltar"       -0.65
    "lished"       -0.64
    "iverpool"     -0.64
    "igree"        -0.64

    Positive Logits
    Token          Logit
    " Dum"         0.73
    "qi"           0.73
    " Tao"         0.72
    " Sabha"       0.72
    "verse"        0.72
    "plin"         0.71
    "ze"           0.71
    "iste"         0.70
    "efully"       0.70
    "jin"          0.69

    Activation Density: 0.005%

    No Known Activations