Explanations
terms related to politics, law, and international affairs
references to regulations and political structures in society
Negative Logits
Token         Logit
'';           -0.74
'?            -0.73
olulu         -0.73
!:            -0.73
!'            -0.69
':            -0.66
!",           -0.61
raq           -0.61
spokeswoman   -0.59
Toledo        -0.58
Positive Logits
Token         Logit
outweigh      1.19
outwe         1.01
amounted      0.84
exceeds       0.81
constitutes   0.80
would         0.75
violates      0.74
ought         0.74
should        0.72
could         0.71
Activations Density 0.816%