Negative Logits
    Token                   Logit
    "("                     0.76
    [token not rendered]    0.68
    " diuretic"             0.68
    "ס"                     0.67
    " Stairs"               0.64
    " سید"                  0.64
    " cenderung"            0.63
    " abortions"            0.62
    " دی"                   0.61
    [token not rendered]    0.61
Positive Logits
    Token                   Logit
    "il"                    0.89
    "shirts"                0.79
    "t"                     0.78
    "ير"                    0.77
    "shirt"                 0.74
    " camiseta"             0.74
    "na"                    0.73
    "ij"                    0.73
    "ko"                    0.73
    " shirt"                0.73
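The positive and negative logit lists appear to rank vocabulary tokens by how strongly this feature raises or lowers their output logits. The sketch below shows one common way such a ranking can be produced. It assumes, and this is not stated on the page, that each token's score is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. The helpers `logit_effects` and `top_tokens` and all weights are hypothetical toy values, not this feature's actual parameters.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Score each token by the dot product of the decoder direction with its unembedding column.

    Assumption: unembed[i][j] is the weight from residual dimension i to token vocab[j].
    """
    return {
        tok: sum(d * unembed[i][j] for i, d in enumerate(decoder_dir))
        for j, tok in enumerate(vocab)
    }


def top_tokens(scores: Dict[str, float], k: int, positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k highest-scoring tokens (positive=True) or the k lowest-scoring ones."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy example with made-up weights: 2 residual dimensions, 3 tokens.
scores = logit_effects(
    decoder_dir=[1.0, 0.5],
    unembed=[[0.6, 0.2, -0.4], [0.4, 0.8, -0.2]],
    vocab=["shirt", "il", "diuretic"],
)
print(top_tokens(scores, k=2))                  # highest two: shirt, then il
print(top_tokens(scores, k=1, positive=False))  # lowest: diuretic
```

Sorting the full score dictionary is fine for a toy vocabulary. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids a full sort.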
Activation density: 0.006% (about 6 of every 100,000 tokens)
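Activation density is the share of tokens on which the feature is active. The sketch below shows the arithmetic. It assumes that "active" means an activation strictly above zero; the page does not state the threshold, so that choice is an assumption. The helper `activation_density` and the sample data are hypothetical. The sample is constructed to match a 0.006% density.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of activations strictly above `threshold` (assumed definition of "active")."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 6 active tokens out of 100,000.
sample = [0.0] * 100_000
for i in range(6):
    sample[i * 10_000] = 1.0
print(activation_density(sample))  # ≈ 0.006 (percent)
```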

    No Known Activations