INDEX
    Explanations

    name followed by punctuation

    Negative Logits
     hello
    0.78
     Hello
    0.70
    hello
    0.67
    Hello
    0.63
     greeting
    0.52
     안녕하세요
    0.52
     пожалуйста
    0.52
     lütfen
    0.51
     bonjour
    0.50
     please
    0.50
    Positive Logits
    !(
    0.47
    }!
    0.44
    !।
    0.44
    0.43
    ![
    0.43
    .!
    0.43
    !!.
    0.42
    0.42
    !.
    0.41
    '!
    0.41
    Act Density 0.010%

    No Known Activations