INDEX
    Explanations
    Negative Logits
    Token            Logit
    " "              -0.07
    " motorcycle"    -0.07
    " zda"           -0.07
    " comfy"         -0.07
    "648"            -0.06
    "FIX"            -0.06
    "prices"         -0.06
    " Corm"          -0.06
    "یم"             -0.06
    " Prototype"     -0.06
    Positive Logits
    Token            Logit
    "bins"           0.07
    "ّت"             0.07
    "ucht"           0.07
    " člán"          0.07
    "brıs"           0.07
    "achelor"        0.06
    "eson"           0.06
    "nell"           0.06
    "ряд"            0.06
    (blank)          0.06
    Act Density 0.040%

    No Known Activations