INDEX
    Explanations

    intentionality and deliberate action

    Negative Logits
    Token                Logit
    (token not shown)    0.89
    (token not shown)    0.84
    (token not shown)    0.84
    " Григо"             0.83
    "もら"               0.82
    "στη"                0.82
    (token not shown)    0.81
    "ရာ"                 0.80
    "در"                 0.79
    "ست"                 0.78
    Positive Logits
    Token                Logit
    "↵↵"                 1.10
    "j"                  1.04
    "ни"                 0.92
    "ä"                  0.91
    "ne"                 0.88
    "ati"                0.82
    "е"                  0.81
    "б"                  0.81
    (token not shown)    0.80
    " a"                 0.80
    Act Density 0.010%

    No Known Activations