INDEX
    Explanations

    phrases indicating intentions or desires related to actions

    NEGATIVE LOGITS
    jk
    -0.15
     Butler
    -0.14
    raid
    -0.14
    if
    -0.14
    acht
    -0.14
    vido
    -0.13
    /kernel
    -0.13
    fter
    -0.13
    ife
    -0.13
     Gonzalez
    -0.13
    POSITIVE LOGITS
     know
    0.18
    лит
    0.15
    @class
    0.15
    orz
    0.15
    ektor
    0.14
    ondo
    0.14
    omu
    0.14
     Sto
    0.14
    jac
    0.14
     Ziel
    0.14
    Act Density 0.087%

    No Known Activations