INDEX
    Explanations

    proper names, particularly surnames

    names and terms associated with specific political figures and entities

    Negative Logits
    " Tsukuyomi"    -0.75
    "Reviewer"      -0.71
    " Metatron"     -0.70
    " Curiosity"    -0.69
    " tails"        -0.69
    " Monarch"      -0.69
    " jaws"         -0.69
    " Archangel"    -0.68
    " icing"        -0.67
    " Merlin"       -0.67
    Positive Logits
    "kamp"          1.66
    "ervative"      0.86
    "liga"          0.86
    "ervatives"     0.86
    "ensation"      0.85
    "stad"          0.82
    "artisan"       0.82
    "atism"         0.82
    "sburg"         0.81
    "fort"          0.79
    Activation Density: 0.013%

    No Known Activations