    Explanations

    verbs indicating actions or occurrences that impact outcomes

    Negative Logits
    "unner"       -0.15
    "ime"         -0.15
    "-part"       -0.15
    "BOR"         -0.15
    "bor"         -0.15
    " bor"        -0.15
    " partner"    -0.14
    "ropdown"     -0.14
    "plier"       -0.14
    " Matters"    -0.14
    Positive Logits
    "ycastle"     0.16
    "Ģ"           0.15
    "YD"          0.15
    "storybook"   0.14
    "ctp"         0.14
    "elly"        0.13
    "ingt"        0.13
    "moth"        0.13
    " LOSS"       0.13
    "awaii"       0.13
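    Logit lists like these are often produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the k largest and k smallest entries. The sketch below assumes exactly that setup. The names `decoder_vec`, `W_U`, and `logit_attribution` are hypothetical, and the tool that generated this card may add normalization or a final layer norm that is not modeled here.

```python
import numpy as np


def logit_attribution(decoder_vec: np.ndarray, W_U: np.ndarray, k: int = 10):
    """Return the top-k positive and negative direct logit effects of a feature.

    Assumed setup (hypothetical, not confirmed by the source):
      decoder_vec: shape (d_model,), the feature's decoder direction.
      W_U:         shape (d_model, vocab_size), the unembedding matrix.
    Returns two lists of (token_id, logit) pairs.
    """
    logits = decoder_vec @ W_U              # one value per vocabulary token
    order = np.argsort(logits)              # ascending order of logit value
    top_negative = [(int(i), float(logits[i])) for i in order[:k]]
    top_positive = [(int(i), float(logits[i])) for i in order[::-1][:k]]
    return top_positive, top_negative
```

    Token ids would then be decoded with the model's tokenizer; that is how byte-level fragments such as "Ģ" can appear alongside whole words.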
    Activation density: 0.365%
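    Activation density is commonly read as the percentage of sampled token positions at which the feature fires, so 0.365% corresponds to roughly 3.65 activations per 1,000 tokens. The sketch below assumes a strict `> 0` firing threshold. The function name and the threshold parameter are hypothetical and may differ from how this card's value was computed.

```python
import numpy as np


def activation_density(acts, threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature activation exceeds threshold.

    acts: 1-D sequence of the feature's activations over sampled tokens (assumed input).
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))
```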

    No known activations.