    Negative Logits
    Token     Value
    ane       0.93
    ها        0.79
    isting    0.74
    le        0.70
    irable    0.70
    anka      0.70
     άλλ      0.67
    قف        0.66
    ishes     0.65
    ق         0.65
    Positive Logits
    Token     Value
    ר         1.09
    IS        1.00
    GO        0.97
    PE        0.97
    MS        0.94
    M         0.92
    CO        0.92
    CH        0.91
    ㅋㅋ      0.91
              0.89
    Activation Density: 0.007%

    No Known Activations