Explanations
names related to a specific political party
references to political parties and affiliations
Negative Logits
Token      Logit
crow       -0.71
things     -0.70
scene      -0.68
erick      -0.68
stocks     -0.67
Icar       -0.67
lasses     -0.67
bearing    -0.66
where      -0.66
theless    -0.66
Positive Logits
Token      Logit
apolis     1.01
ya         0.98
hedral     0.95
endi       0.94
achment    0.94
elson      0.94
ña         0.91
opsis      0.89
qa         0.89
ached      0.86
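The negative and positive logit lists above are the tokens whose output logits this feature pushes down and up the most. The sketch below assumes every vocabulary token already has a single logit-effect score. This page does not say how that score is computed; one common approach projects the feature's decoder direction through the model's unembedding. Under that assumption, producing both lists is just a sort. The function name `top_and_bottom_tokens` is hypothetical, and the example uses a few values copied from the lists.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def top_and_bottom_tokens(logit_effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return the k most positive and k most negative (token, score) pairs.

    Assumes logit_effects maps each token to a precomputed logit-effect score.
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

# A small subset of the values listed above.
scores = {"apolis": 1.01, "ya": 0.98, "crow": -0.71, "things": -0.70}
top, bottom = top_and_bottom_tokens(scores, 2)
print(top)     # [('apolis', 1.01), ('ya', 0.98)]
print(bottom)  # [('crow', -0.71), ('things', -0.7)]
```

A real dashboard would run this over the full vocabulary and typically show k = 10, which matches the ten entries in each list here.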
Activation Density: 0.018%
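A density of 0.018% is a fraction of 0.00018, or about 18 activating tokens per 100,000. The sketch below assumes density means the fraction of tokens whose feature activation exceeds zero. The exact threshold and token sample behind this figure are not stated here, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`.

    Assumes density is measured as a simple count of active tokens over all tokens.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Synthetic example: 18 active tokens out of 100,000.
acts = [0.0] * 99_982 + [1.0] * 18
d = activation_density(acts)
print(f"{d:.3%}")  # 0.018%
```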