INDEX
    Explanations

    phrases expressing reluctance or inability in the context of responsibility or action

    Negative Logits
    achuset     -0.16
    ulton       -0.15
     Dame       -0.14
     astr       -0.14
    ovice       -0.14
    istle       -0.14
    ictim       -0.14
    ifle        -0.13
     Begin      -0.13
     tome       -0.13
    Positive Logits
     get        0.23
     figure     0.19
     gets       0.18
    get         0.17
     done       0.17
    doing       0.16
     getting    0.16
    çłĤ         0.16
    said        0.16
     Get        0.16
    Act Density 1.593%

    No Known Activations