    Explanations

    words that indicate actions or states of being

    Negative Logits
    Token           Logit
    " indebted"     -0.16
    "shal"          -0.15
    "арам"          -0.15
    "paged"         -0.14
    " нав"          -0.14
    "hl"            -0.14
    "дал"           -0.14
    " Đông"         -0.14
    "exampleInput"  -0.14
    " fac"          -0.14
    Positive Logits
    Token           Logit
    "ibi"           0.14
    " Cul"          0.14
    " nackte"       0.14
    " Samp"         0.14
    ".uni"          0.14
    "atoi"          0.14
    " capped"       0.14
    "ovit"          0.14
    "enda"          0.13
    " *(*"          0.13
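On feature dashboards of this kind, the negative and positive logit lists are usually produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below shows that ranking step only, as an assumption about how such lists are typically built rather than this page's exact pipeline. `top_logits`, `w_dec`, `W_U`, and the toy vocabulary are hypothetical names and values, not taken from this page.

```python
import numpy as np

def top_logits(w_dec, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    w_dec: (d_model,) decoder direction of the feature (hypothetical input).
    W_U:   (d_model, vocab_size) unembedding matrix (hypothetical input).
    Returns (negative, positive): the k lowest and k highest (token, logit) pairs.
    """
    logits = w_dec @ W_U                     # (vocab_size,) logit contribution per token
    order = np.argsort(logits)               # token indices, lowest logit first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
w_dec = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.2, 0.1, -0.5],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logits(w_dec, W_U, ["a", "b", "c", "d"], k=2)
print(neg, pos)
```

Real pipelines may also apply the final layer norm or other model-specific scaling before ranking. Logits as small and as uniform as the roughly ±0.14 values here suggest the feature has little direct, token-specific effect on the output.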
    Act Density 0.002%
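Activation density is commonly defined as the percentage of sampled tokens on which the feature's activation is above zero. Under that assumption, 0.002% means roughly 2 firing tokens per 100,000. The following minimal sketch uses a hypothetical helper `activation_density` and synthetic activations, not data from this feature:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Synthetic activations: 100,000 tokens, of which only two fire.
acts = np.zeros(100_000)
acts[[10, 50_000]] = [1.3, 0.7]
print(f"{activation_density(acts):.3f}%")  # 0.002%
```

The threshold is a parameter because some tools count only activations above a small positive cutoff.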

    No Known Activations