INDEX
    Explanations

    terms related to various categories such as music, politics, crime, and different medical conditions

    terms related to crime, media, pop culture, and significant figures or events

    Negative Logits
    ingham  -0.75
    ries  -0.75
    hips  -0.72
    nings  -0.71
    ships  -0.70
    icka  -0.70
    rice  -0.70
    liness  -0.68
    older  -0.68
    ls  -0.67
    Positive Logits
    ħĭ  0.74
    ãĥŁ  0.70
     metic  0.63
    æ©  0.59
    ãĥĻ  0.58
    ãģı  0.56
    ãĤ«  0.53
     Juda  0.52
    ĵĺ  0.52
    éĹĺ  0.52
    Act Density 0.511%

    No Known Activations