INDEX
    Explanations

    modal verbs indicating ability or permission

    Negative Logits
    Token           Logit
    "atively"       -0.18
    "ically"        -0.17
    "ought"         -0.16
    "ucz"           -0.15
    " themselves"   -0.15
    "unday"         -0.14
    " itself"       -0.14
    " bilin"        -0.14
    "ék"            -0.14
    "ughters"       -0.14
    Positive Logits
    Token           Logit
    " expect"       0.27
    " always"       0.26
    " bet"          0.25
    "expect"        0.22
    " Bet"          0.21
    " either"       0.21
    "always"        0.21
    " Always"       0.20
    " imagine"      0.20
    "Always"        0.19
    Activation Density: 0.119%

    No Known Activations