INDEX
    Explanations

    mentions of instructions and guidance in the text

    Negative Logits
    Token           Logit
    " pleaf"        -0.63
    "awtextra"      -0.61
    " difp"         -0.60
    " occaf"        -0.59
    " بيها"         -0.58
    " neceff"       -0.57
    " fuper"        -0.56
    " LCCN"         -0.56
    " كومونز"       -0.56
    "Controllo"     -0.54
    Positive Logits
    Token             Logit
    " instructions"   1.86
    " directions"     1.66
    " Instructions"   1.66
    "instructions"    1.53
    " INSTRUCTIONS"   1.46
    "Instructions"    1.46
    " Directions"     1.42
    " DIRECTIONS"     1.38
    "directions"      1.36
    "Directions"      1.33
    Activation Density: 0.116%

    No Known Activations