INDEX
Explanations
references to political candidates and their party affiliations
Negative Logits
rece       -0.17
ocities    -0.15
tw         -0.15
inkle      -0.15
Fat        -0.15
ienne      -0.15
Fat        -0.15
Trace      -0.14
Trace      -0.14
تÙĪ        -0.14
Positive Logits
rita       0.16
žel        0.15
ween       0.15
585        0.14
lient      0.14
_nt        0.14
Strength   0.14
asso       0.14
itemap     0.14
bject      0.14
Activations Density 0.085%