    Explanations

    words related to controversial or sensitive topics

    words related to specific characters or themes in storytelling

    Negative Logits
    aign        -0.85
    atform      -0.81
    iated       -0.77
    rowth       -0.76
    iations     -0.75
    iation      -0.75
    roups       -0.75
    resh        -0.74
    igor        -0.73
    agne        -0.73
    Positive Logits
    ット        0.78
     nuns       0.77
    essee       0.75
     paws       0.70
     Pupp       0.69
     Cly        0.67
     chefs      0.67
    eness       0.67
    fix         0.66
    culosis     0.64
    Activation Density: 0.025%

    No Known Activations