INDEX
    Explanations

    nuclear, button, picture, delivery, good, time, important

    Negative Logits
    ́         0.91
    ckiego    0.73
    ğunu      0.73
    ռ         0.70
    лке       0.69
    rk        0.69
    вшего     0.69
    llo       0.68
    wner      0.68
    js        0.68
    Positive Logits
    ные       1.55
    ный       1.45
    ным       1.42
    ная       1.38
    ное       1.35
    ъ         1.34
    ность     1.31
    ья        1.31
    ной       1.30
    ными      1.29
    Activation Density: 0.072%

    No Known Activations