Explanations
phrases related to instructions and guidance
Negative Logits
Token       Logit
bish        -0.61
itals       -0.60
secution    -0.60
itol        -0.58
olson       -0.58
buck        -0.58
bomber      -0.57
76561       -0.56
martyr      -0.56
Spectre     -0.56
Positive Logits
Token           Logit
manuals         1.12
manual          1.03
booklet         1.02
Manual          0.87
instructed      0.84
instruct        0.82
instructions    0.82
Instruct        0.79
book            0.79
eering          0.77
Activations Density: 0.052%