INDEX
Explanations
specific names or terms related to current events or political figures
references to specific individuals or events within a narrative context
Negative Logits
surprisingly    -0.52
aback           -0.50
ensibly         -0.50
prisingly       -0.49
partName        -0.49
inarily         -0.47
surprised       -0.47
cautiously      -0.47
spokeswoman     -0.47
surveyed        -0.46
Positive Logits
)."     0.88
.'"     0.81
.).     0.81
)).     0.81
.''.    0.80
.�      0.79
.」      0.76
!".     0.76
'."     0.75
."[     0.74
Activations Density 1.480%