INDEX
    Explanations

    performing actions or commands

    Negative Logits
    Logit   Token
    -0.07   "Controls"
    -0.07   " destroy"
    -0.07   " consultations"
    -0.07   "acro"
    -0.06   "awe"
    -0.06   "شناسی"
    -0.06   " относится"
    -0.06   "macro"
    -0.06   "ΟΥΣ"
    -0.06   " condo"
    Positive Logits
    Logit   Token
    0.06    " quả"
    0.06    "ephy"
    0.06    "атків"
    0.06    (blank)
    0.06    "153"
    0.06    " VLC"
    0.06    " privileges"
    0.06    " پزشکی"
    0.06    "ModelIndex"
    0.06    " 계속"
    Act Density 0.003%
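
    The page does not define Act Density. Assuming it is the fraction of tokens on which the feature has a nonzero activation, 0.003% is roughly 3 tokens in every 100,000. A minimal Python sketch of that reading follows; the function name, the threshold, and the sample data are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: the source page does not say how density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 3 active tokens out of 100,000.
acts = [0.0] * 100_000
acts[10] = acts[500] = acts[99_999] = 1.2

density = activation_density(acts)
print(f"{density:.3%}")  # 0.003%
```

    A density this low fits the "No Known Activations" note: a feature that fires this rarely may not appear at all in a limited sample of text.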

    No Known Activations