INDEX
    Explanations

    instances where an action or modification can be made

    phrases indicating capability or possibility

    Negative Logits
    Token             Logit
    "pires"           -0.68
    " Hits"           -0.68
    "burgh"           -0.66
    " Mant"           -0.65
    " Strikes"        -0.65
    " Appears"        -0.63
    "arthed"          -0.62
    " favors"         -0.62
    " forthcoming"    -0.62
    " Attention"      -0.62
    Positive Logits
    Token             Logit
    "'t"              1.55
    "NOT"             1.16
    " choose"         1.00
    " customize"      0.99
    " find"           0.96
    " optionally"     0.95
    " expect"         0.94
    " learn"          0.91
    "berra"           0.91
    "ister"           0.91
    Activation Density: 0.104%

    No Known Activations