INDEX
    Explanations

    names of organizations

    proper nouns, particularly names and organizations

    Negative Logits
    Token       Logit
    orate       -0.79
    izons       -0.71
    arding      -0.70
    urate       -0.69
    arded       -0.68
    iard        -0.67
    acting      -0.65
    oard        -0.63
    raising     -0.62
    uminati     -0.62

    Positive Logits
    Token       Logit
    plings      0.82
    atchewan    0.77
    eways       0.77
    ustain      0.76
    ority       0.73
    utra        0.73
    earcher     0.73
    arin        0.73
     Rough      0.72
    ETH         0.71
    Activation Density: 0.169%

    No Known Activations