INDEX
    Explanations

    phrases related to political figures, quotes, and political parties

    Negative Logits
    ngth      -0.83
    grades    -0.76
    graded    -0.75
    sed       -0.74
    Ĥª        -0.73
    LOAD      -0.71
    jriwal    -0.69
    bors      -0.68
    sie       -0.66
    llah      -0.65
    Positive Logits
     Luther   1.48
    ique      1.07
    ucci      0.89
    endez     0.89
    ias       0.89
     Sheen    0.89
     McGu     0.82
    etta      0.80
    otti      0.79
    etti      0.77
    Activation Density: 0.062%

    No Known Activations