INDEX
Explanations
No Explanations Found
Negative Logits
ources       -0.73
olars        -0.65
fal          -0.65
auga         -0.64
disbanded    -0.62
Stard        -0.61
plex         -0.61
accur        -0.60
torch        -0.60
BST          -0.59
Positive Logits
Sanders       0.87
Pref          0.74
Mubarak       0.71
龍            0.71
Democratic    0.70
Connector     0.68
wear          0.66
ーテ          0.65
Teach         0.65
vana          0.64
Activations Density: 0.000%
No Known Activations
This feature has no known activations.