    NEGATIVE LOGITS
     raczej (Polish, "rather")
    0.41
    uad
    0.40
    Global
    0.40
     скорее (Russian, "rather")
    0.38
    😦
    0.38
    0.38
    嬉しい (Japanese, "happy")
    0.37
    Greetings
    0.36
    主な (Japanese, "main")
    0.36
    Accordingly
    0.36
    POSITIVE LOGITS
     👌
    0.55
     overall
    0.54
     👍
    0.50
     good
    0.47
    整體 (Chinese, "overall")
    0.46
     considering
    0.46
     хороший (Russian, "good")
    0.46
    overall
    0.45
     melhor (Portuguese, "better")
    0.44
    很好的 (Chinese, "very good")
    0.44
    Act Density 0.015%

    No Known Activations