INDEX
Explanations
references to legislative measures or financial terms
parentheses in the text
Negative Logits
inward      -0.59
guard       -0.55
confront    -0.55
nic         -0.55
zoo         -0.55
caution     -0.54
cooker      -0.54
outweigh    -0.54
tid         -0.54
retreat     -0.53
Positive Logits
which       1.54
including   1.52
both        1.41
such        1.33
whose       1.32
excluding   1.31
?),         1.28
perhaps     1.28
they        1.27
especially  1.27
Activations Density: 0.135%