    Explanations

    words related to specific entities and actions in a diverse range of contexts

    Negative Logits
    Token        Logit
    "theless"    -0.94
    "atility"    -0.82
    "hower"      -0.79
    "ijah"       -0.69
    " Bron"      -0.69
    "atile"      -0.69
    "nesday"     -0.68
    "ternity"    -0.65
    " Klux"      -0.64
    " Kimber"    -0.63
    Positive Logits
    Token    Decoded          Logit
    "ãĤĮ"    れ               1.29
    "ãģĻ"    す               1.28
    "ãģĹ"    し               1.26
    "ãģ"     (bytes E3 81)    1.10
    "ãĤĵ"    ん               1.02
    "ãĤĭ"    る               1.01
    "ãģª"    な               0.99
    "çĶ"     (bytes E7 94)    0.93
    "ãģ§"    で               0.93
    "ãģ£"    っ               0.93
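
    The positive-logit tokens above (for example ãĤĮ) look like the printable stand-ins that GPT-2-style byte-level BPE vocabularies use for raw UTF-8 bytes. The sketch below assumes that is the tokenizer family in use here, which the page itself does not state. It rebuilds the standard byte-to-character table and inverts it to recover the underlying text. The names `bytes_to_unicode` and `decode_token` are illustrative helpers, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style map from byte values (0-255) to printable characters."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte gets a stand-in code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into text (assumes GPT-2-style encoding)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens that end mid-character decode to U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤĮ", "ãģĻ", "ãģĹ", "ãĤĵ", "ãĤĭ", "ãģª", "ãģ§", "ãģ£", "ãģ", "çĶ"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

    Under that assumption, most of the positive-logit tokens decode to single hiragana characters, while ãģ and çĶ are two-byte prefixes of three-byte characters and cannot be decoded on their own.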
    Act Density 0.009%

    No Known Activations