INDEX
    Explanations

    verbs and phrases related to decision-making and actions taken

    Negative Logits
    Token       Logit
    "anners"    -0.17
    " LENG"     -0.17
    "lav"       -0.15
    " Mane"     -0.15
    "TU"        -0.14
    "urm"       -0.14
    "anner"     -0.13
    " pac"      -0.13
    "lation"    -0.13
    " Laur"     -0.13
    Positive Logits
    Token       Logit
    "åīĽ"       0.15
    "295"       0.15
    " instead"  0.15
    "iline"     0.14
    "wise"      0.14
    "instead"   0.14
    "uka"       0.14
    "IR"        0.14
    " rather"   0.14
    "SCO"       0.13
    Activation Density: 0.136%

    No Known Activations