Explanations
verbs referring to following steps or directions
references to instructions or guidance
Negative Logits
olson     -0.72
ILCS      -0.71
xious     -0.69
itch      -0.69
martyr    -0.62
rowd      -0.59
riv       -0.59
itton     -0.59
olls      -0.59
vill      -0.58
Positive Logits
instructions    0.98
booklet         0.98
manuals         0.96
eering          0.95
Instruct        0.86
auri            0.85
instruction     0.85
Instructions    0.83
confir          0.83
directions      0.82
Activations Density: 0.035%