Explanations
No Explanations Found
Negative Logits
Token        Logit
usted        -0.93
agos         -0.88
henko        -0.79
riot         -0.78
isons        -0.74
atform       -0.74
ropolitan    -0.74
rals         -0.73
wegian       -0.72
cedented     -0.72
Positive Logits
Token        Logit
Streamer     0.78
ãĤ¶          0.69
Herrera      0.66
Clinton      0.65
ATE          0.65
970          0.64
å·           0.64
bias         0.62
debates      0.62
à¥           0.60
Activations Density: 0.000%
No Known Activations
This feature has no known activations.