INDEX
Explanations
references to instructions and guidelines
Negative Logits
Kuz
-0.82
harem
-0.80
$("<
-0.78
Nemesis
-0.78
Nema
-0.78
كومونز
-0.77
Neve
-0.77
Kuz
-0.77
_("
-0.76
HAV
-0.75
Positive Logits
instructions
2.28
Instructions
2.03
instruction
2.03
instructions
1.87
Instructions
1.85
Instruction
1.83
instructed
1.73
Instruction
1.73
INSTRUCTION
1.72
instruct
1.68
Activations Density 0.047%