INDEX
Explanations
phrases related to political figures, quotes, and political parties
Negative Logits
Token     Logit
ngth      -0.83
grades    -0.76
graded    -0.75
sed       -0.74
Ĥª        -0.73
LOAD      -0.71
jriwal    -0.69
bors      -0.68
sie       -0.66
llah      -0.65
Positive Logits
Token     Logit
Luther    1.48
ique      1.07
ucci      0.89
endez     0.89
ias       0.89
Sheen     0.89
McGu      0.82
etta      0.80
otti      0.79
etti      0.77
Activations Density: 0.062%