Explanations
mentions of asking questions or seeking information
repeated directives or questions
Negative Logits
restraint   -0.70
supported   -0.66
onents      -0.66
dissip      -0.65
velop       -0.65
gross       -0.64
comple      -0.64
spl         -0.63
multiplier  -0.63
weapon      -0.63
Positive Logits
Ask        3.48
Ask        3.28
AMA        1.44
ask        1.41
Answers    1.24
meet       1.16
Speak      1.16
Questions  1.15
Tell       1.10
Answer     1.05
Activations Density: 0.010%