    Negative Logits
    Token           Logit
    " tight"        -0.08
                    -0.08
    "uzo"           -0.07
    "पुर"           -0.07
    " आस"           -0.07
    " жүргіз"       -0.07
    " pizza"        -0.07
    " scaling"      -0.07
    " glazing"      -0.07
    " crisp"        -0.07
    Positive Logits
    Token           Logit
    "QUERY"         0.08
    " harmless"     0.08
    " extremist"    0.08
    " spos"         0.08
    " dolg"         0.07
    "STR"           0.07
    " Strange"      0.07
    "VEVENT"        0.07
    " сущ"          0.07
    " معنى"         0.07
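    The two lists above give the vocabulary tokens whose logits this feature most raises and most lowers. A common way SAE feature dashboards derive such lists (an assumption here, not something this page states) is to multiply the feature's decoder direction by the model's unembedding matrix and keep the k highest and lowest entries. The sketch below does that in pure Python; `logit_effects`, `top_and_bottom_k`, and every toy number are hypothetical, and it skips any normalization a real pipeline might apply.

```python
def logit_effects(decoder_dir, w_unembed):
    """Project a (hypothetical) feature decoder direction through the unembedding.

    decoder_dir: list of d_model floats.
    w_unembed: d_model rows, each a list of vocab_size floats.
    Returns one logit contribution per vocabulary token.
    """
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom_k(effects, vocab, k):
    """Return the k highest and k lowest (token, logit) pairs."""
    pairs = list(zip(vocab, effects))
    highest = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    lowest = sorted(pairs, key=lambda p: p[1])[:k]
    return highest, lowest


# Toy example with made-up weights: d_model = 2, vocabulary of 4 tokens.
vocab = [" tight", " harmless", " pizza", "QUERY"]
w_unembed = [
    [0.5, -0.1, 0.2, -0.3],
    [-0.4, 0.6, -0.3, 0.1],
]
decoder_dir = [-0.1, 0.1]

effects = logit_effects(decoder_dir, w_unembed)
positive, negative = top_and_bottom_k(effects, vocab, k=2)
print(positive)
print(negative)
```

    On this toy data " harmless" and "QUERY" rank highest and " tight" and " pizza" rank lowest; the values are invented and unrelated to the dashboard numbers.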
    Activation Density: 0.003%

    No Known Activations
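    Activation density reads as the share of tokens on which the feature is active. Assuming "active" means an activation strictly above zero (the threshold and the hypothetical `activation_density` helper are both assumptions), 0.003% corresponds to roughly 3 active tokens per 100,000:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 3 nonzero activations among 100,000 tokens.
acts = [0.0] * 100_000
for pos in (10, 5_000, 99_999):
    acts[pos] = 1.2
print(f"{activation_density(acts):.3f}%")  # 0.003%
```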