    Negative Logits
    Token             Logit
    " neuron"         -0.07
    (not shown)       -0.06
    "(lr"             -0.06
    " canonical"      -0.06
    ".tag"            -0.06
    " öğ"             -0.06
    (not shown)       -0.06
    " transporter"    -0.06
    " Couple"         -0.06
    " mitigate"       -0.06
    Positive Logits
    Token             Logit
    " Windows"        0.08
    " WINDOWS"        0.08
    "DES"             0.07
    " Gates"          0.07
    "UREMENT"         0.07
    " winds"          0.07
    "ат"              0.07
    ".windows"        0.07
    "pain"            0.07
    "Windows"         0.07
    Activation Density: 0.012%

    No Known Activations