INDEX
    Explanations

    Articles (a, the)

    Negative Logits
    Token            Logit
    jd               -0.07
    िर               -0.07
     concentration   -0.07
    kh               -0.07
     Schmidt         -0.07
    ica              -0.07
    OWL              -0.07
    atk              -0.07
     suburbs         -0.06
    htar             -0.06

    Positive Logits
    Token            Logit
    EB               0.06
     Παρα            0.06
    اورزی            0.06
     DONE            0.06
    SAN              0.06
     грав            0.06
    GINE             0.06
     이용             0.06
    .teacher         0.06
    eně              0.06

    Activation Density: 0.033%

    No Known Activations