    Negative Logits
    ीफ  -0.07
     Syntax  -0.07
    sex  -0.07
     hue  -0.07
    ltk  -0.07
     ورد  -0.06
    header  -0.06
    <table  -0.06
    plash  -0.06
     wann  -0.06
    Positive Logits
     inaccurate  0.07
    [token not shown]  0.07
    _cost  0.06
    جب  0.06
     گفت  0.06
     họp  0.06
    iences  0.06
    [token not shown]  0.06
    [token not shown]  0.06
    [token not shown]  0.06
    Activation Density 0.001%

    No Known Activations