INDEX
    Explanations

    modal verbs indicating possibility or future actions

    Negative Logits
    Token         Logit
    " favor"      -0.15
    "eger"        -0.15
    "ennie"       -0.14
    "ect"         -0.14
    ".sigmoid"    -0.14
    " Fitz"       -0.14
    "foy"         -0.14
    " Hutch"      -0.14
    "æºĸ"         -0.13
    "pes"         -0.13
    Positive Logits
    Token         Logit
    " bare"       0.15
    "è¼Ŀ"         0.14
    "ture"        0.14
    "ç¿°"         0.14
    " hei"        0.14
    "akan"        0.14
    "vise"        0.14
    " cuent"      0.14
    "heim"        0.14
    "iface"       0.13
    Activation Density: 0.000%

    No Known Activations