Explanations
references to countries, states, political figures, and government-related topics
mentions of political entities and related organizations
Negative Logits
Token        Logit
sed          -0.71
urated       -0.66
named        -0.66
attr         -0.65
£ı           -0.64
lished       -0.64
è£           -0.62
_>           -0.62
mentioned    -0.61
>(           -0.61

Positive Logits
Token        Logit
needs        1.40
should       1.32
shouldn      1.29
cannot       1.28
lacks        1.28
deserves     1.27
ought        1.26
intends      1.25
must         1.21
needs        1.16
Activations Density 0.450%