INDEX
    NEGATIVE LOGITS
    "นิด"  0.44
    " הרב"  0.40
    "整体"  0.39
    "overall"  0.39
    "整體"  0.38
    " совсем"  0.38
    "ሙሉ"  0.38
    " 약간"  0.37
    " overall"  0.36
    "じる"  0.36
    POSITIVE LOGITS
    " better"  0.98
    "更好的"  0.88
    " bättre"  0.87
    " worse"  0.83
    " lepiej"  0.82
    "better"  0.82
    " bessere"  0.82
    " BETTER"  0.79
    " leps"  0.79
    "更好"  0.78
    Activation Density: 0.005%

    No Known Activations