INDEX
    Explanations

    Names of political figures and terms related to them

    Proper nouns, particularly names and locations

    Negative Logits
    "istically"  -0.74
    "士"  -0.74
    "åĬ"  -0.73
    "ashtra"  -0.72
    " Hastings"  -0.71
    "icity"  -0.70
    "EMBER"  -0.70
    "RAW"  -0.69
    "utherford"  -0.69
    "OPLE"  -0.69
    Positive Logits
    "kens"  0.94
    "wana"  0.77
    "bles"  0.76
    "kered"  0.75
    "wash"  0.74
    "bled"  0.74
    "yip"  0.74
    "virt"  0.71
    "pload"  0.71
    "bah"  0.71
    Activation Density: 0.031%

    No Known Activations