    Explanations

    Occurrences of the word "instructions" and its variations.

    Negative Logits
    (tokens are quoted so that a leading space is visible)

    Token          Logit
    " harem"       -0.81
    " Neve"        -0.80
    " Nema"        -0.79
    " Gaps"        -0.78
    " GAP"         -0.77
    " Nemesis"     -0.76
    " Kuz"         -0.76
    " كومونز"      -0.75
    "Kuz"          -0.74
    "ponses"       -0.74

    Positive Logits
    (tokens are quoted so that a leading space is visible)

    Token               Logit
    " instructions"     2.65
    " Instructions"     2.37
    " instruction"      2.33
    "instructions"      2.17
    "Instructions"      2.13
    " Instruction"      2.08
    " instruct"         1.95
    " instructed"       1.95
    " INSTRUCTIONS"     1.93
    "Instruction"       1.91

    Activation Density: 0.075%

    No Known Activations