Explanation
Phrases related to socioeconomic and political topics
Negative Logits
yss            -0.54
oret           -0.53
looph          -0.49
yip            -0.48
DonaldTrump    -0.48
©¶æ            -0.46
CLS            -0.46
lict           -0.45
Aval           -0.44
tiss           -0.44
Positive Logits
accordingly     1.12
thereafter      1.05
thereof         1.01
.[              0.98
respectively    0.95
.(              0.93
attRot          0.92
therein         0.91
.               0.90
afterwards      0.88
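The two lists above are the tokens whose logits this feature pushes down most (negative) and up most (positive). A minimal sketch of how such lists can be pulled from a full table of per-token logit effects, assuming that table is already available as a token-to-value mapping (here filled with a few of the values listed above; the function name `top_logits` is hypothetical):

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_logits(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, value) pairs.

    Hypothetical helper: assumes `effects` maps each token to this
    feature's logit effect on it.
    """
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return negative, positive


# A few entries copied from the lists above.
effects = {
    "yss": -0.54,
    "oret": -0.53,
    "looph": -0.49,
    "accordingly": 1.12,
    "thereafter": 1.05,
    "thereof": 1.01,
}
neg, pos = top_logits(effects, 2)
print(neg)  # [('yss', -0.54), ('oret', -0.53)]
print(pos)  # [('accordingly', 1.12), ('thereafter', 1.05)]
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection (e.g. `heapq.nsmallest` / `heapq.nlargest`) would avoid a full sort.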
Activation density: 1.299%
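Activation density is the share of tokens on which the feature fires; 1.299% works out to roughly 13 of every 1,000 tokens. A minimal sketch of the calculation, assuming "fires" means an activation strictly greater than zero (the threshold and the helper name `activation_density` are assumptions, not taken from the dashboard):

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens on which the feature's activation is strictly positive.

    Hypothetical helper: the > 0 firing threshold is an assumption.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 2 of 8 tokens.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.7, 0.0]
print(f"{activation_density(acts):.3f}%")  # 25.000%
```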