INDEX
    Explanations

    terms indicating exceptions or alternative conditions

    Negative Logits
    "oa"          -0.16
    " Fare"       -0.15
    "elian"       -0.15
    "isoner"      -0.15
    "важа"        -0.14
    "abee"        -0.14
    "itto"        -0.13
    "agara"       -0.13
    "たし"        -0.13
    "edo"         -0.13

    Positive Logits
    "uder"         0.14
    " instead"     0.14
    "ewise"        0.14
    "gra"          0.14
    "ugh"          0.14
    "aggi"         0.13
    " Gale"        0.13
    "Anc"          0.13
    " Baghd"       0.13
    "瓜"           0.13

    Activation Density: 0.029%

    No Known Activations