Explanations
mentions of Donald Trump
Donald Trump
Negative Logits
=".                 -0.42
__::                -0.41
(::                 -0.40
ThroughAttribute    -0.39
CCS                 -0.39
擅                  -0.39
__":                -0.37
|')                 -0.37
.');                -0.37
°.                  -0.36
Positive Logits
Trump               2.44
Trump               2.28
trump               1.73
trump               1.66
特朗普              1.38
ترام                1.26
Donald              1.05
Trumpet             1.01
Biden               0.98
trumpet             0.97
Activation density: 0.004%