    Explanations

    references to instructions and guidelines

    Negative Logits
     Kuz            -0.82
     harem          -0.80
     $("<           -0.78
     Nemesis        -0.78
     Nema           -0.78
     كومونز         -0.77
     Neve           -0.77
    Kuz             -0.77
     _("            -0.76
     HAV            -0.75
    Positive Logits
     instructions   2.28
     Instructions   2.03
     instruction    2.03
    instructions    1.87
    Instructions    1.85
     Instruction    1.83
     instructed     1.73
    Instruction     1.73
     INSTRUCTION    1.72
     instruct       1.68
    Activation Density: 0.047%

    No Known Activations