    Negative Logits
    "ی"          1.06
    "ie"         1.05
    "ной"        0.98
    "ي"          0.98
    (not shown)  0.97
    "ו"          0.97
    (not shown)  0.95
    "iend"       0.93
    "nice"       0.89
    (not shown)  0.86
    Positive Logits
    "ية"     1.29
    " ("     1.20
    " fan"   1.13
    "↵↵"     1.05
    "ير"     1.01
    " fans"  1.00
    " Fans"  1.00
    "ud"     0.97
    "ский"   0.94
    "-"      0.94
    Activation Density: 0.007%

    No Known Activations