INDEX
    Explanations

    Instances of the word "write" and its variants, indicating a focus on writing actions or commands.

    Negative Logits
    Token                Logit
    " Trop"              -0.58
    "ganggu"             -0.53
    (token not shown)    -0.50
    " Cougars"           -0.50
    " terecht"           -0.49
    " Kang"              -0.48
    ")++;"               -0.48
    "orghini"            -0.48
    "cope"               -0.47
    " nemico"            -0.47
    Positive Logits
    Token                Logit
    " write"             1.74
    "write"              1.66
    "Write"              1.56
    " Write"             1.55
    " writing"           1.50
    " Writing"           1.36
    "Writing"            1.34
    " WRITE"             1.33
    "writing"            1.33
    " writes"            1.32
    Act Density 0.126%

    No Known Activations