Explanations
mentions of instructions and guidance in the text
Negative Logits
Token       Logit
pleaf       -0.63
awtextra    -0.61
difp        -0.60
occaf       -0.59
بيها        -0.58
neceff      -0.57
fuper       -0.56
LCCN        -0.56
كومونز      -0.56
Controllo   -0.54
Positive Logits
Token          Logit
instructions   1.86
directions     1.66
Instructions   1.66
instructions   1.53
INSTRUCTIONS   1.46
Instructions   1.46
Directions     1.42
DIRECTIONS     1.38
directions     1.36
Directions     1.33
Activations Density: 0.116%