INDEX
    Explanations

    words related to political figures or events

    names and references to specific individuals or characters, particularly those in political or entertainment contexts

    Negative Logits
    ãĤ¦
    -0.74
    lished
    -0.64
    ORTS
    -0.62
     mete
    -0.57
     Dino
    -0.57
    âĢ¢âĢ¢
    -0.55
    ãĥĨ
    -0.55
     feces
    -0.54
     yuan
    -0.54
     Rated
    -0.53
    Positive Logits
    issan
    0.90
    mort
    0.88
    xton
    0.84
    ttle
    0.82
    pson
    0.78
    eus
    0.74
    essen
    0.74
     metic
    0.73
    teenth
    0.73
    ault
    0.70
    Act Density 0.069%

    No Known Activations
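
Some entries in the logit lists above (for example `ãĤ¦`, `âĢ¢âĢ¢` and `ãĥĨ`) look like encoding debris. They are more likely raw vocabulary strings from a byte-level BPE tokenizer. The sketch below assumes a GPT-2-style byte-to-character table, which the page itself does not confirm, and maps such strings back to readable text. The names `bytes_to_unicode`, `BYTE_DECODER` and `decode_token` are illustrative and not part of the dashboard.

```python
def bytes_to_unicode():
    """Build a GPT-2-style table mapping each byte value (0-255) to a printable character.

    Assumption: the tokens on this page come from a tokenizer that uses this table.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are assigned code points starting at 256, in byte order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into readable text."""
    raw = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            # Characters outside the table (such as a literal space already
            # rendered by the page) are passed through unchanged.
            raw.extend(ch.encode("utf-8"))
    # Partial multi-byte sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¦", "âĢ¢âĢ¢", "ãĥĨ", " mete"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ãĤ¦` decodes to the katakana `ウ`, `âĢ¢âĢ¢` to two bullet characters, and `ãĥĨ` to the katakana `テ`. Plain ASCII entries such as ` mete` pass through unchanged.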