INDEX
    Explanations

    elements related to writing and communication

    Negative Logits
    Token          Logit
    " Physical"    -0.16
    "insky"        -0.16
    " physical"    -0.16
    "Physical"     -0.15
    " phys"        -0.15
    "_physical"    -0.15
    "oved"         -0.15
    "phys"         -0.14
    "physical"     -0.14
    "xn"           -0.14
    Positive Logits
    Token          Logit
    " writing"     0.31
    " Writing"     0.30
    "Writing"      0.27
    "writing"      0.25
    "-writing"     0.23
    " writers"     0.23
    " Writer"      0.22
    " Writers"     0.21
    " writer"      0.20
    " Write"       0.19
    Activation Density: 0.189%

    No Known Activations