INDEX
    Explanations
    NEGATIVE LOGITS
    " messed"      -0.07
    " loading"     -0.07
    " kernels"     -0.07
    " disgust"     -0.06
    "collapsed"    -0.06
    " intens"      -0.06
    " coron"       -0.06
    " бор"         -0.06
    " bars"        -0.06
    " BX"          -0.06
    POSITIVE LOGITS
    "rious"        0.07
    "steller"      0.06
    "ใน"           0.06
    " hairs"       0.06
    " اهم"         0.06
                   0.06
    "ос"           0.06
    "ován"         0.05
    " stalk"       0.05
                   0.05
    Activation Density: 0.001%

    No Known Activations