INDEX
    Explanations

    questions starting with "Who" that inquire about identity or roles

    Negative Logits
    Token          Logit
    "iali"         -0.17
    "oyer"         -0.17
    "owan"         -0.17
    "gles"         -0.16
    "ibu"          -0.15
    "inqu"         -0.15
    "adora"        -0.14
    "inox"         -0.14
    "?action"      -0.14
    " اÙĦرÙħ"       -0.14
    Positive Logits
    Token            Logit
    " else"          0.16
    "åĢ«"            0.16
    " appointment"   0.14
    ".pair"          0.13
    "agem"           0.13
    "ops"            0.13
    "hold"           0.13
    " kdo"           0.13
    "osh"            0.13
    " appointments"  0.13
    Activation Density: 0.083%

    No Known Activations