Explanations
No Explanations Found
Negative Logits
DonaldTrump   -0.80
sterling      -0.71
ISC           -0.70
oulos         -0.69
virt          -0.68
iscons        -0.65
ambassador    -0.64
ollower       -0.64
utherford     -0.63
icy           -0.62
Positive Logits
zeb           0.75
uda           0.72
qi            0.72
lined         0.69
NetMessage    0.69
aby           0.68
lines         0.68
mint          0.67
bee           0.66
directions    0.65
Activations Density: 0.000%
This feature has no known activations.