INDEX
Explanations
names of politicians or political figures and locations or news agencies
Negative Logits
iasco      -0.72
udging     -0.70
footed     -0.66
minecraft  -0.64
agy        -0.63
Cyn        -0.61
enhagen    -0.61
aba        -0.60
pread      -0.59
iesta      -0.59
Positive Logits
士         0.87
roth       0.69
·          0.69
grad       0.68
³          0.68
£          0.68
¶          0.67
Ĥİ         0.67
bilt       0.65
Higher     0.62
Activations Density 0.094%