INDEX
    Explanations

    names of people or entities, particularly with a focus on political figures and institutions

    names of individuals or groups associated with political or social contexts

    Negative Logits
    er      -1.03
    eros    -0.97
    uras    -0.92
    eric    -0.87
    shire   -0.87
    urus    -0.86
    oise    -0.84
    eur     -0.82
    ersen   -0.81
    uran    -0.80
    Positive Logits
    ãģĨ     0.61
     à¨     0.57
    å½      0.56
    大      0.55
    ãģ£     0.54
     Ùħ     0.52
    åį      0.52
    èĥ      0.51
    ÙĴ      0.51
    ä¼      0.51
    Act Density 0.237%

    No Known Activations