Explanations
names of political figures or entities
entities and roles related to individuals in a news context
Negative Logits
respect -0.73
DragonMagazine -0.62
craving -0.59
persecut -0.57
aim -0.57
Issue -0.57
privile -0.57
disproportion -0.56
counterfeit -0.56
.): -0.56
Positive Logits
told 1.38
tells 1.19
told 1.19
wrote 1.17
said 1.08
wrote 1.04
said 0.99
explained 0.97
tweeted 0.96
remarked 0.91
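The two logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The sketch below is minimal and rests on assumptions: it uses the common "logit lens" convention, in which the feature's decoder direction is projected through the model's unembedding matrix. The viewer's actual pipeline may differ. The names `feature_logits`, `top_and_bottom`, and the toy vectors are hypothetical illustrations and do not come from the source.

```python
def feature_logits(decoder_direction, unembed, vocab):
    """Project one feature's decoder direction through an unembedding matrix.

    decoder_direction: d_model floats (hypothetical row of an SAE decoder).
    unembed: d_model rows, each holding len(vocab) floats (hypothetical W_U).
    Returns (token, logit) pairs, one per vocabulary entry.
    """
    if len(unembed) != len(decoder_direction):
        raise ValueError("unembed must have one row per model dimension")
    logits = [0.0] * len(vocab)
    for d, weight in enumerate(decoder_direction):
        for t, value in enumerate(unembed[d]):
            logits[t] += weight * value  # accumulate direction @ W_U
    return list(zip(vocab, logits))


def top_and_bottom(pairs, k):
    """Return the k highest-logit pairs (descending) and k lowest (ascending)."""
    ranked = sorted(pairs, key=lambda p: p[1])
    return ranked[-k:][::-1], ranked[:k]


# Toy usage with made-up numbers (d_model = 3, four-token vocabulary).
vocab = ["told", "said", "respect", "aim"]
direction = [1.0, 0.5, -0.2]
w_u = [
    [0.9, 0.8, -0.3, -0.1],
    [0.4, 0.2, -0.5, -0.6],
    [0.0, 0.1, 0.5, 0.2],
]
positive, negative = top_and_bottom(feature_logits(direction, w_u, vocab), 2)
print([tok for tok, _ in positive], [tok for tok, _ in negative])
```

Sorting once and slicing both ends keeps the example short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.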
Activation Density: 0.159%
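The density figure reports how rarely the feature fires. The sketch below assumes that density is the percentage of token positions where the feature's activation is strictly above zero. That is a common convention for sparse autoencoder features, but the source does not define it. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activates above threshold.

    activations: one float per token position (hypothetical input format).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy usage: the feature fires on 1 of 1,000 tokens.
print(activation_density([0.0] * 999 + [2.3]))
```

Under this assumption, a density of 0.159% means the feature is active on roughly 1.6 of every 1,000 tokens in the sampled text.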